{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The RMVA Interface: TMVA and R"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Required headers"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#include \"TRInterface.h\"\n",
"#include \"TMVA/MethodC50.h\"\n",
"#include \"TMVA/MethodRSNNS.h\"\n",
"#include \"TMVA/MethodRXGB.h\""
]
},
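{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three TMVA headers above declare methods backed by the R packages `C50`, `RSNNS` and `xgboost`, so those packages need to be installed in the R library that ROOT uses. The following is a minimal sketch for checking this from C++. It assumes that `ROOT::R::TRInterface::IsInstalled` is available in your ROOT build; the loop and the printed messages are illustrative only.\n",
"\n",
"```cpp\n",
"#include \"TRInterface.h\"\n",
"#include <iostream>\n",
"\n",
"// Obtain the R interpreter instance managed by ROOT-R.\n",
"ROOT::R::TRInterface &r = ROOT::R::TRInterface::Instance();\n",
"\n",
"// R packages assumed to back the C50, RSNNS and RXGB methods.\n",
"for (const char *pkg : {\"C50\", \"RSNNS\", \"xgboost\"}) {\n",
"    std::cout << pkg << (r.IsInstalled(pkg) ? \": installed\" : \": missing\") << std::endl;\n",
"}\n",
"```\n",
"\n",
"A missing package can usually be installed from an R session with `install.packages`."
]
},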
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Declare Factory"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--- Factory : You are running ROOT Version: 6.07/07, Apr 1, 2016\n",
"--- Factory : \n",
"--- Factory : _/_/_/_/_/ _| _| _| _| _|_| \n",
"--- Factory : _/ _|_| _|_| _| _| _| _| \n",
"--- Factory : _/ _| _| _| _| _| _|_|_|_| \n",
"--- Factory : _/ _| _| _| _| _| _| \n",
"--- Factory : _/ _| _| _| _| _| \n",
"--- Factory : \n",
"--- Factory : ___________TMVA Version 4.2.1, Feb 5, 2015\n",
"--- Factory : \n"
]
}
],
"source": [
"// Load the TMVA library\n",
"TMVA::Tools::Instance();\n",
"\n",
"// Input file with the signal and background trees; output file for the TMVA results\n",
"auto inputFile = TFile::Open(\"https://raw.githubusercontent.com/iml-wg/tmvatutorials/master/inputdata.root\");\n",
"auto outputFile = TFile::Open(\"TMVAOutputCV.root\", \"RECREATE\");\n",
"\n",
"TMVA::Factory factory(\"TMVAClassification\", outputFile,\n",
"                      \"!V:ROC:!Correlations:!Silent:Color:!DrawProgressBar:AnalysisType=Classification\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Declare DataLoader"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"TMVA::DataLoader loader(\"dataset\");\n",
"\n",
"//adding variables to dataset\n",
"loader.AddVariable(\"var1\");\n",
"loader.AddVariable(\"var2\");\n",
"loader.AddVariable(\"var3\");\n",
"loader.AddVariable(\"var4\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting up Dataset"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--- DataSetInfo : Dataset[dataset] : Added class \"Signal\"\t with internal class number 0\n",
"--- dataset : Add Tree Sig of type Signal with 6000 events\n",
"--- DataSetInfo : Dataset[dataset] : Added class \"Background\"\t with internal class number 1\n",
"--- dataset : Add Tree Bkg of type Background with 6000 events\n",
"--- dataset : Preparing trees for training and testing...\n"
]
}
],
"source": [
"// Retrieve the signal and background trees from the input file\n",
"TTree *tsignal = nullptr, *tbackground = nullptr;\n",
"inputFile->GetObject(\"Sig\", tsignal);\n",
"inputFile->GetObject(\"Bkg\", tbackground);\n",
"\n",
"// Empty cuts: no preselection is applied to either class\n",
"TCut mycuts, mycutb;\n",
"\n",
"loader.AddSignalTree(tsignal, 1);         // signal weight = 1\n",
"loader.AddBackgroundTree(tbackground, 1); // background weight = 1\n",
"\n",
"// 1000 training events per class; the remaining events are used for testing\n",
"loader.PrepareTrainingAndTestTree(mycuts, mycutb,\n",
"    \"nTrain_Signal=1000:nTrain_Background=1000:SplitMode=Random:NormMode=NumEvents:!V\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Booking methods\n",
"RMVA provides the following methods. Each link documents the booking options for that method:\n",
"\n",
"- C50 boosted decision trees: http://oproject.org/tiki-index.php?page=RMVA#C50Booking\n",
"- RMLP neural networks (RSNNS package): http://oproject.org/tiki-index.php?page=RMVA#RSNNSMLP\n",
"- RXGB extreme gradient boosted decision trees: http://oproject.org/tiki-index.php?page=RMVA#RXGBBooking\n",
"\n",
"The cell below also books a native TMVA BDT alongside them."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--- Factory : Booking method: \u001b[1mC50\u001b[0m DataSet Name: \u001b[1mdataset\u001b[0m\n",
"--- DataSetFactory : Dataset[dataset] : Splitmode is: \"RANDOM\" the mixmode is: \"SAMEASSPLITMODE\"\n",
"--- DataSetFactory : Dataset[dataset] : Create training and testing trees -- looping over class \"Signal\" ...\n",
"--- DataSetFactory : Dataset[dataset] : Weight expression for class 'Signal': \"\"\n",
"--- DataSetFactory : Dataset[dataset] : Create training and testing trees -- looping over class \"Background\" ...\n",
"--- DataSetFactory : Dataset[dataset] : Weight expression for class 'Background': \"\"\n",
"--- DataSetFactory : Dataset[dataset] : Number of events in input trees (after possible flattening of arrays):\n",
"--- DataSetFactory : Dataset[dataset] : Signal -- number of events : 6000 / sum of weights: 6000 \n",
"--- DataSetFactory : Dataset[dataset] : Background -- number of events : 6000 / sum of weights: 6000 \n",
"--- DataSetFactory : Dataset[dataset] : Signal tree -- total number of entries: 6000 \n",
"--- DataSetFactory : Dataset[dataset] : Background tree -- total number of entries: 6000 \n",
"--- DataSetFactory : Dataset[dataset] : Preselection: (will NOT affect number of requested training and testing events)\n",
"--- DataSetFactory : Dataset[dataset] : No preselection cuts applied on event classes\n",
"--- DataSetFactory : Dataset[dataset] : Weight renormalisation mode: \"NumEvents\": renormalises all event classes \n",
"--- DataSetFactory : Dataset[dataset] : such that the effective (weighted) number of events in each class equals the respective \n",
"--- DataSetFactory : Dataset[dataset] : number of events (entries) that you demanded in PrepareTrainingAndTestTree(\"\",\"nTrain_Signal=.. )\n",
"--- DataSetFactory : Dataset[dataset] : ... i.e. such that Sum[i=1..N_j]{w_i} = N_j, j=0,1,2...\n",
"--- DataSetFactory : Dataset[dataset] : ... (note that N_j is the sum of TRAINING events (nTrain_j...with j=Signal,Background..\n",
"--- DataSetFactory : Dataset[dataset] : ..... Testing events are not renormalised nor included in the renormalisation factor! )\n",
"--- DataSetFactory : Dataset[dataset] : --> Rescale Signal event weights by factor: 1\n",
"--- DataSetFactory : Dataset[dataset] : --> Rescale Background event weights by factor: 1\n",
"--- DataSetFactory : Dataset[dataset] : Number of training and testing events after rescaling:\n",
"--- DataSetFactory : Dataset[dataset] : ---------------------------------------------------------------------------\n",
"--- DataSetFactory : Dataset[dataset] : Signal -- training events : 1000 (sum of weights: 1000) - requested were 1000 events\n",
"--- DataSetFactory : Dataset[dataset] : Signal -- testing events : 5000 (sum of weights: 5000) - requested were 0 events\n",
"--- DataSetFactory : Dataset[dataset] : Signal -- training and testing events: 6000 (sum of weights: 6000)\n",
"--- DataSetFactory : Dataset[dataset] : Background -- training events : 1000 (sum of weights: 1000) - requested were 1000 events\n",
"--- DataSetFactory : Dataset[dataset] : Background -- testing events : 5000 (sum of weights: 5000) - requested were 0 events\n",
"--- DataSetFactory : Dataset[dataset] : Background -- training and testing events: 6000 (sum of weights: 6000)\n",
"--- DataSetFactory : Dataset[dataset] : Create internal training tree\n",
"--- DataSetFactory : Dataset[dataset] : Create internal testing tree\n",
"--- DataSetInfo : Dataset[dataset] : Correlation matrix (Signal):\n",
"--- DataSetInfo : ----------------------------------------\n",
"--- DataSetInfo : var1 var2 var3 var4\n",
"--- DataSetInfo : var1: +1.000 +0.386 +0.597 +0.808\n",
"--- DataSetInfo : var2: +0.386 +1.000 +0.696 +0.743\n",
"--- DataSetInfo : var3: +0.597 +0.696 +1.000 +0.860\n",
"--- DataSetInfo : var4: +0.808 +0.743 +0.860 +1.000\n",
"--- DataSetInfo : ----------------------------------------\n",
"--- DataSetInfo : Dataset[dataset] : Correlation matrix (Background):\n",
"--- DataSetInfo : ----------------------------------------\n",
"--- DataSetInfo : var1 var2 var3 var4\n",
"--- DataSetInfo : var1: +1.000 +0.856 +0.914 +0.964\n",
"--- DataSetInfo : var2: +0.856 +1.000 +0.927 +0.937\n",
"--- DataSetInfo : var3: +0.914 +0.927 +1.000 +0.971\n",
"--- DataSetInfo : var4: +0.964 +0.937 +0.971 +1.000\n",
"--- DataSetInfo : ----------------------------------------\n",
"--- DataSetFactory : Dataset[dataset] : \n",
"--- Factory : Booking method: \u001b[1mRMLP\u001b[0m DataSet Name: \u001b[1mdataset\u001b[0m\n",
"--- RMLP : Dataset[dataset] : Create Transformation \"N\" with events from all classes.\n",
"--- Norm : Transformation, Variable selection : \n",
"--- Norm : Input : variable 'var1' (index=0). <---> Output : variable 'var1' (index=0).\n",
"--- Norm : Input : variable 'var2' (index=1). <---> Output : variable 'var2' (index=1).\n",
"--- Norm : Input : variable 'var3' (index=2). <---> Output : variable 'var3' (index=2).\n",
"--- Norm : Input : variable 'var4' (index=3). <---> Output : variable 'var4' (index=3).\n",
"--- Factory : Booking method: \u001b[1mRXGB\u001b[0m DataSet Name: \u001b[1mdataset\u001b[0m\n",
"--- Factory : Booking method: \u001b[1mBDT\u001b[0m DataSet Name: \u001b[1mdataset\u001b[0m\n"
]
}
],
"source": [
"//C50 Boosted Decision Trees (BDTs)\n",
"factory.BookMethod(&loader, TMVA::Types::kC50, \"C50\",\n",
" \"!H:NTrials=5:Rules=kTRUE:ControlSubSet=kFALSE:ControlBands=10:ControlWinnow=kFALSE:ControlNoGlobalPruning=kTRUE:ControlCF=0.25:ControlMinCases=2:ControlFuzzyThreshold=kTRUE:ControlSample=0:ControlEarlyStopping=kTRUE:!V\" );\n",
" \n",
"//Neural Networks using RSNNS package\n",
"factory.BookMethod(&loader, TMVA::Types::kRSNNS, \"RMLP\",\n",
" \"!H:VarTransform=N:Size=c(5):Maxit=10:InitFunc=Randomize_Weights:LearnFunc=Std_Backpropagation:LearnFuncParams=c(0.2,0):!V\" );\n",
"\n",
"//eXtreme Gradient Boosted XGB Decision Trees\n",
"factory.BookMethod(&loader, TMVA::Types::kRXGB, \"RXGB\",\"!V:NRounds=20:MaxDepth=2:Eta=1\" );\n",
"\n",
"//TMVA BDTs\n",
"factory.BookMethod(&loader,TMVA::Types::kBDT, \"BDT\",\n",
" \"!V:NTrees=50:MinNodeSize=2.5%:MaxDepth=2:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20\" );"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the Methods"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--- Factory : \n",
"--- Factory : Train all methods for Classification ...\n",
"--- Factory : \n",
"--- Factory : current transformation string: 'I'\n",
"--- Factory : Dataset[dataset] : Create Transformation \"I\" with events from all classes.\n",
"--- Id : Transformation, Variable selection : \n",
"--- Id : Input : variable 'var1' (index=0). <---> Output : variable 'var1' (index=0).\n",
"--- Id : Input : variable 'var2' (index=1). <---> Output : variable 'var2' (index=1).\n",
"--- Id : Input : variable 'var3' (index=2). <---> Output : variable 'var3' (index=2).\n",
"--- Id : Input : variable 'var4' (index=3). <---> Output : variable 'var4' (index=3).\n",
"--- Id : Preparing the Identity transformation...\n",
"--- TFHandler_Factory : -----------------------------------------------------------\n",
"--- TFHandler_Factory : Variable Mean RMS [ Min Max ]\n",
"--- TFHandler_Factory : -----------------------------------------------------------\n",
"--- TFHandler_Factory : var1: 0.00077102 1.6695 [ -5.8991 4.7639 ]\n",
"--- TFHandler_Factory : var2: -0.0063164 1.5765 [ -5.2454 4.8300 ]\n",
"--- TFHandler_Factory : var3: -0.010870 1.7365 [ -5.3563 4.6430 ]\n",
"--- TFHandler_Factory : var4: 0.14557 2.1608 [ -6.9675 5.0307 ]\n",
"--- TFHandler_Factory : -----------------------------------------------------------\n",
"--- TFHandler_Factory : Plot event variables for Id\n",
"--- TFHandler_Factory : Create scatter and profile plots in target-file directory: \n",
"--- TFHandler_Factory : TMVAOutputCV.root:/dataset/InputVariables_Id/CorrelationPlots\n",
"--- TFHandler_Factory : \n",
"--- TFHandler_Factory : Ranking input variables (method unspecific)...\n",
"--- IdTransformation : Ranking result (top variable is best ranked)\n",
"--- IdTransformation : -----------------------------\n",
"--- IdTransformation : Rank : Variable : Separation\n",
"--- IdTransformation : -----------------------------\n",
"--- IdTransformation : 1 : var4 : 3.458e-01\n",
"--- IdTransformation : 2 : var3 : 2.817e-01\n",
"--- IdTransformation : 3 : var1 : 2.640e-01\n",
"--- IdTransformation : 4 : var2 : 2.173e-01\n",
"--- IdTransformation : -----------------------------\n",
"--- Factory : Train method: C50 for Classification\n",
"--- C50 : Dataset[dataset] : Begin training\n",
"--- C50 : \n",
"--- C50 : \u001b[1m--- Saving State File In:\u001b[0mweights/C50Model.RData\n",
"--- C50 : \n",
"--- C50 : Dataset[dataset] : End of training \n",
"--- C50 : Dataset[dataset] : Elapsed time for training with 2000 events: \u001b[1;31m0.873 sec\u001b[0m \n",
"--- C50 : Dataset[dataset] : Create MVA output for Dataset[dataset] : classification on training sample\n",
"--- C50 : Dataset[dataset] : Evaluation of C50 on training sample (2000 events)\n",
"--- C50 : Dataset[dataset] : Elapsed time for evaluation of 2000 events: \u001b[1;31m0.0748 sec\u001b[0m \n",
"--- C50 : Dataset[dataset] : Creating weight file in xml format: \u001b[0;36mweights/TMVAClassification_C50.weights.xml\u001b[0m\n",
"--- Factory : Training finished\n",
"--- Factory : Train method: RMLP for Classification\n",
"--- Norm : Preparing the transformation.\n",
"--- TFHandler_RMLP : -----------------------------------------------------------\n",
"--- TFHandler_RMLP : Variable Mean RMS [ Min Max ]\n",
"--- TFHandler_RMLP : -----------------------------------------------------------\n",
"--- TFHandler_RMLP : var1: 0.085752 0.35735 [ -1.0000 1.0000 ]\n",
"--- TFHandler_RMLP : var2: 0.10321 0.36589 [ -1.0000 1.0000 ]\n",
"--- TFHandler_RMLP : var3: 0.10411 0.37276 [ -1.0000 1.0000 ]\n",
"--- TFHandler_RMLP : var4: 0.17623 0.40650 [ -1.0000 1.0000 ]\n",
"--- TFHandler_RMLP : -----------------------------------------------------------\n",
"--- RMLP : Dataset[dataset] : Begin training\n",
"--- RMLP : \n",
"--- RMLP : \u001b[1m--- Saving State File In:\u001b[0mweights/RMLPModel.RData\n",
"--- RMLP : \n",
"--- RMLP : Dataset[dataset] : End of training \n",
"--- RMLP : Dataset[dataset] : Elapsed time for training with 2000 events: \u001b[1;31m1.07 sec\u001b[0m \n",
"--- RMLP : Dataset[dataset] : Create MVA output for Dataset[dataset] : classification on training sample\n",
"--- RMLP : Dataset[dataset] : Evaluation of RMLP on training sample (2000 events)\n",
"--- RMLP : Dataset[dataset] : Elapsed time for evaluation of 2000 events: \u001b[1;31m0.739 sec\u001b[0m \n",
"--- RMLP : Dataset[dataset] : Creating weight file in xml format: \u001b[0;36mweights/TMVAClassification_RMLP.weights.xml\u001b[0m\n",
"--- RMLP : Dataset[dataset] : Creating standalone response class: \u001b[0;36mweights/TMVAClassification_RMLP.class.C\u001b[0m\n",
"--- Factory : Training finished\n",
"--- Factory : Train method: RXGB for Classification\n",
"--- RXGB : Dataset[dataset] : Begin training\n",
"[0]\ttrain-rmse:0.393636\n",
"[1]\ttrain-rmse:0.383795\n",
"[2]\ttrain-rmse:0.371803\n",
"[3]\ttrain-rmse:0.362196\n",
"[4]\ttrain-rmse:0.357333\n",
"[5]\ttrain-rmse:0.351344\n",
"[6]\ttrain-rmse:0.346084\n",
"[7]\ttrain-rmse:0.340683\n",
"[8]\ttrain-rmse:0.332887\n",
"[9]\ttrain-rmse:0.328419\n",
"[10]\ttrain-rmse:0.326714\n",
"[11]\ttrain-rmse:0.324244\n",
"[12]\ttrain-rmse:0.321295\n",
"[13]\ttrain-rmse:0.318570\n",
"[14]\ttrain-rmse:0.315405\n",
"[15]\ttrain-rmse:0.312603\n",
"[16]\ttrain-rmse:0.310186\n",
"[17]\ttrain-rmse:0.309424\n",
"[18]\ttrain-rmse:0.306099\n",
"[19]\ttrain-rmse:0.304363\n",
"--- RXGB : \n",
"--- RXGB : \u001b[1m--- Saving State File In:\u001b[0mweights/RXGBModel.RData\n",
"--- RXGB : \n",
"--- RXGB : Dataset[dataset] : End of training \n",
"--- RXGB : Dataset[dataset] : Elapsed time for training with 2000 events: \u001b[1;31m0.234 sec\u001b[0m \n",
"--- RXGB : Dataset[dataset] : Create MVA output for Dataset[dataset] : classification on training sample\n",
"--- RXGB : Dataset[dataset] : Evaluation of RXGB on training sample (2000 events)\n",
"--- RXGB : Dataset[dataset] : Elapsed time for evaluation of 2000 events: \u001b[1;31m0.00485 sec\u001b[0m \n",
"--- RXGB : Dataset[dataset] : Creating weight file in xml format: \u001b[0;36mweights/TMVAClassification_RXGB.weights.xml\u001b[0m\n",
"--- Factory : Training finished\n",
"--- Factory : Train method: BDT for Classification\n",
"--- BDT : Dataset[dataset] : Begin training\n",
"--- BDT : found and suggest the following possible pre-selection cuts \n",
"--- BDT : as option DoPreselection was not used, these cuts however will not be performed, but the training will see the full sample\n",
"--- BDT : found cut: Bkg if var 0 < -2.99038\n",
"--- BDT : found cut: Bkg if var 2 < -2.88493\n",
"--- BDT : found cut: Bkg if var 3 < -2.54088\n",
"--- BDT : For classification trees, \n",
"--- BDT : the effective number of backgrounds is scaled to match \n",
"--- BDT : the signal. Othersise the first boosting step would do 'just that'!\n",
"--- BDT : re-normlise events such that Sig and Bkg have respective sum of weights = 1\n",
"--- BDT : sig->sig*1ev. bkg->bkg*1ev.\n",
"--- BDT : #events: (reweighted) sig: 1000 bkg: 1000\n",
"--- BDT : #events: (unweighted) sig: 1000 bkg: 1000\n",
"--- BDT : Training 50 Decision Trees ... patience please\n",
"--- BinaryTree : The minimal node size MinNodeSize=2.5 fMinNodeSize=2.5% is translated to an actual number of events = 25.7 for the training sample size of 1028\n",
"--- BinaryTree : Note: This number will be taken as absolute minimum in the node, \n",
"--- BinaryTree : in terms of 'weighted events' and unweighted ones !! \n",
"--- BDT : elapsed time: \u001b[1;31m0.0508 sec\u001b[0m \n",
"--- BDT : average number of nodes (w/o pruning) : 4\n",
"--- BDT : Dataset[dataset] : End of training \n",
"--- BDT : Dataset[dataset] : Elapsed time for training with 2000 events: \u001b[1;31m0.061 sec\u001b[0m \n",
"--- BDT : Dataset[dataset] : Create MVA output for Dataset[dataset] : classification on training sample\n",
"--- BDT : Dataset[dataset] : Evaluation of BDT on training sample (2000 events)\n",
"--- BDT : Dataset[dataset] : Elapsed time for evaluation of 2000 events: \u001b[1;31m0.00845 sec\u001b[0m \n",
"--- BDT : Dataset[dataset] : Creating weight file in xml format: \u001b[0;36mdataset/weights/TMVAClassification_BDT.weights.xml\u001b[0m\n",
"--- BDT : Dataset[dataset] : Creating standalone response class: \u001b[0;36mdataset/weights/TMVAClassification_BDT.class.C\u001b[0m\n",
"--- BDT : Write monitoring histograms to file: TMVAOutputCV.root:/dataset/Method_BDT/BDT\n",
"--- Factory : Training finished\n",
"--- Factory : \n",
"--- Factory : Ranking input variables (method specific)...\n",
"--- Factory : No variable ranking supplied by classifier: C50\n",
"--- Factory : No variable ranking supplied by classifier: RMLP\n",
"--- Factory : No variable ranking supplied by classifier: RXGB\n",
"--- BDT : Ranking result (top variable is best ranked)\n",
"--- BDT : --------------------------------------\n",
"--- BDT : Rank : Variable : Variable Importance\n",
"--- BDT : --------------------------------------\n",
"--- BDT : 1 : var4 : 4.608e-01\n",
"--- BDT : 2 : var1 : 2.923e-01\n",
"--- BDT : 3 : var2 : 1.457e-01\n",
"--- BDT : 4 : var3 : 1.012e-01\n",
"--- BDT : --------------------------------------\n",
"--- Factory : \n",
"--- Factory : === Destroy and recreate all methods via weight files for testing ===\n",
"--- Factory : \n",
"--- MethodBase : Dataset[dataset] : Reading weight file: \u001b[0;36mweights/TMVAClassification_C50.weights.xml\u001b[0m\n",
"--- C50 : Dataset[dataset] : Read method \"C50\" of type \"C50\"\n",
"--- C50 : Dataset[dataset] : MVA method was trained with TMVA Version: 4.2.1\n",
"--- C50 : Dataset[dataset] : MVA method was trained with ROOT Version: 6.07/07\n",
"--- MethodBase : Dataset[dataset] : Reading weight file: \u001b[0;36mweights/TMVAClassification_RMLP.weights.xml\u001b[0m\n",
"--- RMLP : Dataset[dataset] : Read method \"RMLP\" of type \"RSNNS\"\n",
"--- RMLP : Dataset[dataset] : MVA method was trained with TMVA Version: 4.2.1\n",
"--- RMLP : Dataset[dataset] : MVA method was trained with ROOT Version: 6.07/07\n",
"--- MethodBase : Dataset[dataset] : Reading weight file: \u001b[0;36mweights/TMVAClassification_RXGB.weights.xml\u001b[0m\n",
"--- RXGB : Dataset[dataset] : Read method \"RXGB\" of type \"RXGB\"\n",
"--- RXGB : Dataset[dataset] : MVA method was trained with TMVA Version: 4.2.1\n",
"--- RXGB : Dataset[dataset] : MVA method was trained with ROOT Version: 6.07/07\n",
"--- MethodBase : Dataset[dataset] : Reading weight file: \u001b[0;36mdataset/weights/TMVAClassification_BDT.weights.xml\u001b[0m\n",
"--- BDT : Dataset[dataset] : Read method \"BDT\" of type \"BDT\"\n",
"--- BDT : Dataset[dataset] : MVA method was trained with TMVA Version: 4.2.1\n",
"--- BDT : Dataset[dataset] : MVA method was trained with ROOT Version: 6.07/07\n"
]
}
],
"source": [
"factory.TrainAllMethods();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Testing and Evaluating the Methods"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--- Factory : Test all methods...\n",
"--- Factory : Test method: C50 for Classification performance\n",
"--- C50 : Dataset[dataset] : Evaluation of C50 on testing sample (10000 events)\n",
"--- C50 : \n",
"--- C50 : \u001b[1m--- Loading State File From:\u001b[0mweights/C50Model.RData\n",
"--- C50 : \n",
"--- C50 : Dataset[dataset] : Elapsed time for evaluation of 10000 events: \u001b[1;31m0.546 sec\u001b[0m \n",
"--- Factory : Test method: RMLP for Classification performance\n",
"--- RMLP : Dataset[dataset] : Evaluation of RMLP on testing sample (10000 events)\n",
"--- RMLP : \n",
"--- RMLP : \u001b[1m--- Loading State File From:\u001b[0mweights/RMLPModel.RData\n",
"--- RMLP : \n",
"--- RMLP : Dataset[dataset] : Elapsed time for evaluation of 10000 events: \u001b[1;31m3.49 sec\u001b[0m \n",
"--- Factory : Test method: RXGB for Classification performance\n",
"--- RXGB : Dataset[dataset] : Evaluation of RXGB on testing sample (10000 events)\n",
"--- RXGB : \n",
"--- RXGB : \u001b[1m--- Loading State File From:\u001b[0mweights/RXGBModel.RData\n",
"--- RXGB : \n",
"--- RXGB : Dataset[dataset] : Elapsed time for evaluation of 10000 events: \u001b[1;31m0.0286 sec\u001b[0m \n",
"--- Factory : Test method: BDT for Classification performance\n",
"--- BDT : Dataset[dataset] : Evaluation of BDT on testing sample (10000 events)\n",
"--- BDT : Dataset[dataset] : Elapsed time for evaluation of 10000 events: \u001b[1;31m0.0235 sec\u001b[0m \n",
"--- Factory : Evaluate all methods...\n",
"--- Factory : Evaluate classifier: C50\n",
"--- C50 : Testing Classification C50 METHOD \n",
"--- C50 : Dataset[dataset] : Loop over test events and fill histograms with classifier response...\n",
"--- Factory : Write evaluation histograms to file\n",
"--- TFHandler_C50 : Plot event variables for C50\n",
"--- TFHandler_C50 : -----------------------------------------------------------\n",
"--- TFHandler_C50 : Variable Mean RMS [ Min Max ]\n",
"--- TFHandler_C50 : -----------------------------------------------------------\n",
"--- TFHandler_C50 : var1: 0.00077102 1.6695 [ -5.8991 4.7639 ]\n",
"--- TFHandler_C50 : var2: -0.0063164 1.5765 [ -5.2454 4.8300 ]\n",
"--- TFHandler_C50 : var3: -0.010870 1.7365 [ -5.3563 4.6430 ]\n",
"--- TFHandler_C50 : var4: 0.14557 2.1608 [ -6.9675 5.0307 ]\n",
"--- TFHandler_C50 : -----------------------------------------------------------\n",
"--- TFHandler_C50 : Create scatter and profile plots in target-file directory: \n",
"--- TFHandler_C50 : TMVAOutputCV.root:/dataset/Method_C50/C50/CorrelationPlots\n",
"--- Factory : Evaluate classifier: RMLP\n",
"--- RMLP : Testing Classification RMLP METHOD \n",
"--- TFHandler_RMLP : -----------------------------------------------------------\n",
"--- TFHandler_RMLP : Variable Mean RMS [ Min Max ]\n",
"--- TFHandler_RMLP : -----------------------------------------------------------\n",
"--- TFHandler_RMLP : var1: 0.066774 0.35913 [ -1.2024 1.0914 ]\n",
"--- TFHandler_RMLP : var2: 0.079492 0.36669 [ -1.1391 1.2044 ]\n",
"--- TFHandler_RMLP : var3: 0.079125 0.37282 [ -1.0685 1.0783 ]\n",
"--- TFHandler_RMLP : var4: 0.15120 0.40805 [ -1.1921 1.0737 ]\n",
"--- TFHandler_RMLP : -----------------------------------------------------------\n",
"--- RMLP : Dataset[dataset] : Loop over test events and fill histograms with classifier response...\n",
"--- Factory : Write evaluation histograms to file\n",
"--- TFHandler_RMLP : Plot event variables for RMLP\n",
"--- TFHandler_RMLP : -----------------------------------------------------------\n",
"--- TFHandler_RMLP : Variable Mean RMS [ Min Max ]\n",
"--- TFHandler_RMLP : -----------------------------------------------------------\n",
"--- TFHandler_RMLP : var1: 0.066774 0.35913 [ -1.2024 1.0914 ]\n",
"--- TFHandler_RMLP : var2: 0.079492 0.36669 [ -1.1391 1.2044 ]\n",
"--- TFHandler_RMLP : var3: 0.079125 0.37282 [ -1.0685 1.0783 ]\n",
"--- TFHandler_RMLP : var4: 0.15120 0.40805 [ -1.1921 1.0737 ]\n",
"--- TFHandler_RMLP : -----------------------------------------------------------\n",
"--- TFHandler_RMLP : Create scatter and profile plots in target-file directory: \n",
"--- TFHandler_RMLP : TMVAOutputCV.root:/dataset/Method_RMLP/RMLP/CorrelationPlots\n",
"--- Factory : Evaluate classifier: RXGB\n",
"--- RXGB : Testing Classification RXGB METHOD \n",
"--- RXGB : Dataset[dataset] : Loop over test events and fill histograms with classifier response...\n",
"--- Factory : Write evaluation histograms to file\n",
"--- TFHandler_RXGB : Plot event variables for RXGB\n",
"--- TFHandler_RXGB : -----------------------------------------------------------\n",
"--- TFHandler_RXGB : Variable Mean RMS [ Min Max ]\n",
"--- TFHandler_RXGB : -----------------------------------------------------------\n",
"--- TFHandler_RXGB : var1: 0.00077102 1.6695 [ -5.8991 4.7639 ]\n",
"--- TFHandler_RXGB : var2: -0.0063164 1.5765 [ -5.2454 4.8300 ]\n",
"--- TFHandler_RXGB : var3: -0.010870 1.7365 [ -5.3563 4.6430 ]\n",
"--- TFHandler_RXGB : var4: 0.14557 2.1608 [ -6.9675 5.0307 ]\n",
"--- TFHandler_RXGB : -----------------------------------------------------------\n",
"--- TFHandler_RXGB : Create scatter and profile plots in target-file directory: \n",
"--- TFHandler_RXGB : TMVAOutputCV.root:/dataset/Method_RXGB/RXGB/CorrelationPlots\n",
"--- Factory : Evaluate classifier: BDT\n",
"--- BDT : Dataset[dataset] : Loop over test events and fill histograms with classifier response...\n",
"--- Factory : Write evaluation histograms to file\n",
"--- TFHandler_BDT : Plot event variables for BDT\n",
"--- TFHandler_BDT : -----------------------------------------------------------\n",
"--- TFHandler_BDT : Variable Mean RMS [ Min Max ]\n",
"--- TFHandler_BDT : -----------------------------------------------------------\n",
"--- TFHandler_BDT : var1: 0.00077102 1.6695 [ -5.8991 4.7639 ]\n",
"--- TFHandler_BDT : var2: -0.0063164 1.5765 [ -5.2454 4.8300 ]\n",
"--- TFHandler_BDT : var3: -0.010870 1.7365 [ -5.3563 4.6430 ]\n",
"--- TFHandler_BDT : var4: 0.14557 2.1608 [ -6.9675 5.0307 ]\n",
"--- TFHandler_BDT : -----------------------------------------------------------\n",
"--- TFHandler_BDT : Create scatter and profile plots in target-file directory: \n",
"--- TFHandler_BDT : TMVAOutputCV.root:/dataset/Method_BDT/BDT/CorrelationPlots\n",
"--- Factory : \n",
"--- Factory : Evaluation results ranked by best signal efficiency and purity (area)\n",
"--- Factory : -------------------------------------------------------------------------------------------------------------------\n",
"--- Factory : DataSet MVA Signal efficiency at bkg eff.(error): | Sepa- Signifi- \n",
"--- Factory : Name: Method: @B=0.01 @B=0.10 @B=0.30 ROC-integ ROCCurve| ration: cance: \n",
"--- Factory : -------------------------------------------------------------------------------------------------------------------\n",
"--- Factory : dataset RMLP : 0.343(06) 0.764(06) 0.954(02) 0.930 0.929 | 0.578 1.647\n",
"--- Factory : dataset RXGB : 0.208(05) 0.693(06) 0.921(03) 0.901 0.902 | 0.507 1.387\n",
"--- Factory : dataset BDT : 0.263(06) 0.642(06) 0.900(04) 0.894 0.894 | 0.483 1.248\n",
"--- Factory : dataset C50 : 0.000(00) 0.689(06) 0.926(03) 0.892 0.898 | 0.521 1.411\n",
"--- Factory : -------------------------------------------------------------------------------------------------------------------\n",
"--- Factory : \n",
"--- Factory : Testing efficiency compared to training efficiency (overtraining check)\n",
"--- Factory : -------------------------------------------------------------------------------------------------------------------\n",
"--- Factory : DataSet MVA Signal efficiency: from test sample (from training sample) \n",
"--- Factory : Name: Method: @B=0.01 @B=0.10 @B=0.30 \n",
"--- Factory : -------------------------------------------------------------------------------------------------------------------\n",
"--- Factory : dataset RMLP : 0.343 (0.365) 0.764 (0.784) 0.954 (0.955)\n",
"--- Factory : dataset RXGB : 0.208 (0.375) 0.693 (0.802) 0.921 (0.944)\n",
"--- Factory : dataset BDT : 0.263 (0.225) 0.642 (0.671) 0.900 (0.902)\n",
"--- Factory : dataset C50 : 0.000 (0.474) 0.689 (0.848) 0.926 (0.943)\n",
"--- Factory : -------------------------------------------------------------------------------------------------------------------\n",
"--- Factory : \n",
"--- Dataset:dataset : Dataset[dataset] : Created tree 'TestTree' with 10000 events\n",
"--- Dataset:dataset : Dataset[dataset] : Created tree 'TrainTree' with 2000 events\n",
"--- Factory : \n",
"--- Factory : \u001b[1mThank you for using TMVA!\u001b[0m\n",
"--- Factory : \u001b[1mFor citation information, please visit: http://tmva.sf.net/citeTMVA.html\u001b[0m\n"
]
}
],
"source": [
"factory.TestAllMethods();\n",
"factory.EvaluateAllMethods(); "
]
},
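{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ROC integrals reported in the evaluation table can also be queried from the factory. The snippet below is a minimal sketch rather than part of the original run: it assumes that `factory` and `loader` from the cells above are still in scope, that `EvaluateAllMethods()` has finished, and that `TMVA::Factory::GetROCIntegral` is available in your ROOT version.\n",
"\n",
"```cpp\n",
"#include <iostream>\n",
"\n",
"// Method titles as passed to BookMethod above.\n",
"for (const char *name : {\"C50\", \"RMLP\", \"RXGB\", \"BDT\"}) {\n",
"    // Area under the ROC curve for this method.\n",
"    std::cout << name << \": \" << factory.GetROCIntegral(&loader, name) << std::endl;\n",
"}\n",
"```\n",
"\n",
"The printed values should be close to the ROC columns of the table above."
]
},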
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting the ROC Curve\n",
"We enable the ROOT JavaScript (JSROOT) interactive visualisation with `%jsroot on`, then draw the ROC curves of the booked methods."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%jsroot on\n",
"auto c = factory.GetROCCurve(&loader);\n",
"c->Draw();"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "ROOT C++",
"language": "c++",
"name": "root"
},
"language_info": {
"codemirror_mode": "text/x-c++src",
"file_extension": ".C",
"mimetype": "text/x-c++src",
"name": "c++"
}
},
"nbformat": 4,
"nbformat_minor": 1
}