{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# User interface" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "application/javascript": [ "\n", "require(['notebook'],\n", " function() {\n", " IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-c++src'] = {'reg':[/^%%cpp/]};\n", " console.log(\"JupyROOT - %%cpp magic configured\");\n", " }\n", ");\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Welcome to JupyROOT 6.07/07\n" ] } ], "source": [ "import ROOT\n", "from ROOT import TFile, TMVA, TCut" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Enable JS visualization" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "To use the new interactive features in the notebook we have to enable a module called JsMVA. This can be done with the IPython magic command %jsmva." ] },
{ "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%jsmva on" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Declaration of Factory" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let's start with the classical version of the declaration. If you know how to use TMVA in C++ then you can use the same style here in Python: the first argument is a string with the job name, the second argument is an open output TFile (this is optional; if it is present, it will be used to store the output histograms), and the last argument is a string containing all the settings related to the Factory, separated by the ':' character." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### C++ like declaration" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "SysError in : could not delete TMVA.root (errno: 26) (Text file busy)\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "outputFile = TFile( \"TMVA.root\", 'RECREATE' )\n", "TMVA.Tools.Instance();\n", "\n", "factory = TMVA.Factory( \"TMVAClassification\", outputFile #this is optional\n", " ,\"!V:Color:DrawProgressBar:Transformations=I;D;P;G,D:AnalysisType=Classification\" )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The options string can contain the following options:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
| Option | Default | Predefined values | Description |\n",
"| --- | --- | --- | --- |\n",
"| V | False | - | Verbose flag |\n",
"| Color | True | - | Flag for colored output |\n",
"| Transformations | \"\" | - | List of transformations to test. For example, with the string \"I;D;P;U;G\" the identity, decorrelation, PCA, uniform and Gaussian transformations will be applied |\n",
"| Silent | False | - | Batch mode: boolean silent flag inhibiting any output from TMVA after the creation of the Factory class object |\n",
"| DrawProgressBar | True | - | Draw progress bar to display training, testing and evaluation schedule (default: True) |\n",
"| AnalysisType | Auto | Classification, Regression, Multiclass, Auto | Set the analysis type |" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Pythonic version" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "By enabling JsMVA we get new, more readable ways to write this declaration." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### First version" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Instead of passing the options as one long string, we can pass them separately as named arguments:" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [], "source": [ "factory = TMVA.Factory(\"TMVAClassification\", outputFile,\n", " V=False, Color=True, Silent=True, DrawProgressBar=True, Transformations=\"I;D;P;G,D\", AnalysisType=\"Classification\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "You can see that the Transformations option is set to the string \"I;D;P;G,D\". Instead of this, we can also pass these options as a list: [\"I\", \"D\", \"P\", \"G\", \"D\"]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Second version" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "In the first version we only changed the way the options are passed; the first two arguments were still positional. These parameters can also be passed as named arguments: the name of the first parameter is JobName and the name of the second is TargetFile." ] },
{ "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "factory = TMVA.Factory(JobName=\"TMVAClassification\", TargetFile=outputFile,\n", " V=False, Color=True, DrawProgressBar=True, Transformations=[\"I\", \"D\", \"P\", \"G\", \"D\"],\n", " AnalysisType=\"Classification\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Arguments of the constructor:\n", "\n", "
| Keyword | Can be used as positional argument | Default | Predefined values | Description |\n",
"| --- | --- | --- | --- | --- |\n",
"| JobName | yes, 1. | not optional | - | Name of the job |\n",
"| TargetFile | yes, 2. | if not passed, histograms won't be saved | - | File to write control and performance histograms |\n",
"| V | no | False | - | Verbose flag |\n",
"| Color | no | True | - | Flag for colored output |\n",
"| Transformations | no | \"\" | - | List of transformations to test. For example, with the string \"I;D;P;U;G\" the identity, decorrelation, PCA, uniform and Gaussian transformations will be applied |\n",
"| Silent | no | False | - | Batch mode: boolean silent flag inhibiting any output from TMVA after the creation of the Factory class object |\n",
"| DrawProgressBar | no | True | - | Draw progress bar to display training, testing and evaluation schedule (default: True) |\n",
"| AnalysisType | no | Auto | Classification, Regression, Multiclass, Auto | Set the analysis type |" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Declaring the DataLoader, adding variables and setting up the dataset" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "First we need to declare a DataLoader and add the variables (passing the variable names used in the training and test trees of the input dataset). To add variable names to the DataLoader we use the AddVariable function. Its arguments are:\n", "\n", "1. A string containing the variable name. Using \":=\" we can also add a definition.\n", "\n", "2. A string (label for the variable; if not present, the variable name will be used) or a character (defining the type of the data points).\n", "\n", "3. If we gave a label for the variable, the data point type can still be passed as the third argument." ] },
{ "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "dataset = \"tmva_class_example\" # the dataset name\n", "loader = TMVA.DataLoader(dataset)\n", "\n", "loader.AddVariable( \"myvar1 := var1+var2\", 'F' )\n", "loader.AddVariable( \"myvar2 := var1-var2\", \"Expression 2\", 'F' )\n", "loader.AddVariable( \"var3\", \"Variable 3\", 'F' )\n", "loader.AddVariable( \"var4\", \"Variable 4\", 'F' )" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "It is possible to define spectator variables, which are part of the input data set but are not used in the MVA training, testing or evaluation; they can be used, for example, for correlation tests.\n", "Parameters:\n", "\n", "1. String containing the definition of the spectator variable.\n", "2. Label for the spectator variable.\n", "3. Data type" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "loader.AddSpectator( \"spec1:=var1*2\", \"Spectator 1\", 'F' )\n", "loader.AddSpectator( \"spec2:=var1*3\", \"Spectator 2\", 'F' )" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "After adding the variables we have to add the data to the DataLoader. To do this we first check whether the dataset file exists in the working directory; if it doesn't, we download it from CERN's server. When we have the ROOT file we open it and get the signal and background trees." ] },
{ "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "if ROOT.gSystem.AccessPathName( \"tmva_class_example.root\" ) != 0: \n", " ROOT.gSystem.Exec( \"wget https://root.cern.ch/files/tmva_class_example.root\")\n", " \n", "input = TFile.Open( \"tmva_class_example.root\" )\n", "\n", "# Get the signal and background trees for training\n", "signal = input.Get( \"TreeS\" )\n", "background = input.Get( \"TreeB\" )" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "To pass the signal and background trees to the DataLoader we use the AddSignalTree and AddBackgroundTree functions, and we also set the corresponding DataLoader attributes. Arguments of these functions:\n", "\n", "1. Signal/Background tree\n", "2. Global weight used for all events in the tree."
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Global event weights (see below for setting event-wise weights)\n", "signalWeight = 1.0\n", "backgroundWeight = 1.0\n", "\n", "loader.AddSignalTree(signal, signalWeight)\n", "loader.AddBackgroundTree(background, backgroundWeight)\n", "\n", "loader.fSignalWeight = signalWeight\n", "loader.fBackgroundWeight = backgroundWeight\n", "loader.fTreeS = signal\n", "loader.fTreeB = background" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Using the DataLoader.PrepareTrainingAndTestTree function we apply cuts on the input events. In C++ this function also expects the options as a single string (as we saw for the Factory constructor); with JsMVA they can be passed as keyword arguments, just as for the Factory constructor.\n", "\n", "Arguments of PrepareTrainingAndTestTree:\n", "\n", "
| Keyword | Can be used as positional argument | Default | Predefined values | Description |\n",
"| --- | --- | --- | --- | --- |\n",
"| SigCut | yes, 1. | - | - | TCut object for the signal cut |\n",
"| BkgCut | yes, 2. | - | - | TCut object for the background cut |\n",
"| SplitMode | no | Random | Random, Alternate, Block | Method of picking training and testing events |\n",
"| MixMode | no | SameAsSplitMode | SameAsSplitMode, Random, Alternate, Block | Method of mixing events of different classes into one dataset |\n",
"| SplitSeed | no | 100 | - | Seed for random event shuffling |\n",
"| NormMode | no | EqualNumEvents | None, NumEvents, EqualNumEvents | Overall renormalisation of event-by-event weights used in the training (NumEvents: average weight of 1 per event, independently for signal and background; EqualNumEvents: average weight of 1 per event for signal, and sum of background weights equal to the sum of signal weights) |\n",
"| nTrain_Signal | no | 0 (all) | - | Number of training events of class Signal |\n",
"| nTest_Signal | no | 0 (all) | - | Number of test events of class Signal |\n",
"| nTrain_Background | no | 0 (all) | - | Number of training events of class Background |\n",
"| nTest_Background | no | 0 (all) | - | Number of test events of class Background |\n",
"| V | no | False | - | Verbosity |\n",
"| VerboseLevel | no | Info | Debug, Verbose, Info | Verbosity level |
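\n",
"\n",
"As an illustration, cuts and explicit sample sizes could be passed like this (a sketch with made-up cut expressions and event counts, not the settings used in the next cell):\n",
"\n",
"```python\n",
"# hypothetical cuts and sample sizes, for illustration only\n",
"mycuts_ill = TCut(\"var1 > 0.0\")\n",
"mycutb_ill = TCut(\"var1 <= 0.0\")\n",
"loader.PrepareTrainingAndTestTree(SigCut=mycuts_ill, BkgCut=mycutb_ill,\n",
"                                  nTrain_Signal=1000, nTrain_Background=1000,\n",
"                                  SplitMode=\"Random\", NormMode=\"EqualNumEvents\", V=False)\n",
"```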
" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mycuts = TCut(\"\")\n", "mycutb = TCut(\"\")\n", "\n", "loader.PrepareTrainingAndTestTree(SigCut=mycuts, BkgCut=mycutb,\n", " nTrain_Signal=0, nTrain_Background=0, SplitMode=\"Random\", NormMode=\"NumEvents\", V=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Booking methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To add which we want to train on dataset we have to use the Factory.BookMethod function. This method will add a method and it's options to Factory.\n", "\n", "Arguments:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "
| Keyword | Can be used as positional argument | Default | Predefined values | Description |\n",
"| --- | --- | --- | --- | --- |\n",
"| DataLoader | yes, 1. | - | - | Pointer to DataLoader object |\n",
"| Method | yes, 2. | - | kVariable, kCuts, kLikelihood, kPDERS, kHMatrix, kFisher, kKNN, kCFMlpANN, kTMlpANN, kBDT, kDT, kRuleFit, kSVM, kMLP, kBayesClassifier, kFDA, kBoost, kPDEFoam, kLD, kPlugins, kCategory, kDNN, kPyRandomForest, kPyAdaBoost, kPyGTB, kC50, kRSNNS, kRSVM, kRXGB, kMaxMethod | Selected method number, method numbers defined in TMVA.Types |\n",
"| MethodTitle | yes, 3. | - | - | Label for method |\n",
"| * | no | - | - | Other named arguments, which are the options for the selected method |
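\n",
"\n",
"Several methods can be booked on the same DataLoader before training. As an illustrative sketch (with made-up option values; only the MLP is booked explicitly in this notebook), a BDT could be booked like this:\n",
"\n",
"```python\n",
"# illustrative: book a BDT next to the MLP; the option values here are only examples\n",
"factory.BookMethod(DataLoader=loader, Method=TMVA.Types.kBDT, MethodTitle=\"BDT\",\n",
"                   NTrees=200, MinNodeSize=\"2.5%\", MaxDepth=3,\n",
"                   BoostType=\"AdaBoost\", AdaBoostBeta=0.5, nCuts=20)\n",
"```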
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "factory.BookMethod( DataLoader=loader, Method=TMVA.Types.kMLP, MethodTitle=\"MLP\", \n", " H=False, V=False, NeuronType=\"tanh\", VarTransform=\"N\", NCycles=600, HiddenLayers=\"N+5\",\n", " TestRate=5, UseRegulator=False )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate importance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To calculate variable importance we can use Factory.EvaluateImportance function. The parameters of this function are the following:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "
| Keyword | Can be used as positional argument | Default | Predefined values | Description |\n",
"| --- | --- | --- | --- | --- |\n",
"| DataLoader | yes, 1. | - | - | Pointer to DataLoader object |\n",
"| VIType | yes, 2. | - | - | Variable Importance type |\n",
"| Method | yes, 3. | - | same values as the Method argument of BookMethod | Selected method number, method numbers defined in TMVA.Types |\n",
"| MethodTitle | yes, 4. | - | - | Label for method |\n",
"| V | no | False | - | Verbose flag |\n",
"| NTrees | no | - | - | Number of trees (option forwarded to the booked BDT) |\n",
"| MinNodeSize | no | - | - | Minimum node size, in % of training events (BDT option) |\n",
"| MaxDepth | no | - | - | Maximum depth of the decision trees (BDT option) |\n",
"| BoostType | no | - | - | Boosting algorithm, e.g. AdaBoost (BDT option) |\n",
"| AdaBoostBeta | no | - | - | Learning rate of the AdaBoost algorithm (BDT option) |\n",
"| UseBaggedBoost | no | - | - | Combine boosting with bagging (BDT option) |\n",
"| BaggedSampleFraction | no | - | - | Relative size of the bagged event sample (BDT option) |\n",
"| SeparationType | no | - | - | Node separation criterion, e.g. GiniIndex (BDT option) |\n",
"| nCuts | no | - | - | Number of grid points used when scanning a variable for the best cut (BDT option) |
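\n",
"\n",
"According to the table above, the first four parameters can also be passed positionally, so the call in the next cell could equivalently be written as in the sketch below (VIType=0 corresponds to the \"Short\" variable-importance results printed in the output):\n",
"\n",
"```python\n",
"# equivalent form of the call below, with the first four arguments passed positionally\n",
"factory.EvaluateImportance(loader, 0, TMVA.Types.kBDT, \"BDT\",\n",
"                           V=False, NTrees=5, MinNodeSize=\"2.5%\", MaxDepth=2,\n",
"                           BoostType=\"AdaBoost\", AdaBoostBeta=0.5,\n",
"                           UseBaggedBoost=True, BaggedSampleFraction=0.5,\n",
"                           SeparationType=\"GiniIndex\", nCuts=20)\n",
"```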
" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Evaluation results ranked by best signal efficiency and purity (area)
DataSet MVA
Name: Method: ROC-integ
00000000000000000000000000001111 BDT : 0.830
Testing efficiency compared to training efficiency (overtraining check)
DataSet MVA Signal efficiency: from test sample (from training sample)
Name: Method: @B=0.01 @B=0.10 @B=0.30
00000000000000000000000000001111 BDT : 0.000 (0.000) 0.000 (0.000) 0.866 (0.871)
Factory Evaluate classifier: MLP
TFHandler_MLP
Variable        Mean          RMS          Min          Max
myvar1        0.075113      0.36776      -1.1074       1.0251
myvar2        0.0075595     0.27349      -0.90663      1.0008
var3          0.070228      0.37106      -1.0649       1.0602
var4          0.12090       0.39854      -1.1871       1.0199
MLP
Dataset: tmva_class_example : Loop over test events and fill histograms with classifier response...
Evaluation results ranked by best signal efficiency and purity (area)
DataSet MVA
Name: Method: ROC-integ
tmva_class_example MLP : 0.939
Testing efficiency compared to training efficiency (overtraining check)
DataSet MVA Signal efficiency: from test sample (from training sample)
Name: Method: @B=0.01 @B=0.10 @B=0.30
tmva_class_example MLP : 0.382 (0.349) 0.802 (0.794) 0.964 (0.966)
Factory Thank you for using TMVA!
For citation information, please visit: http://tmva.sf.net/citeTMVA.html
DataSetInfo
Dataset: 00000000000000000000000000001110 : Added class \"Signal\"
Add Tree TreeS of type Signal with 6000 events
DataSetInfo
Dataset: 00000000000000000000000000001110 : Added class \"Background\"
Add Tree TreeB of type Background with 6000 events
Factory Booking method: BDT
DataSetFactory
Dataset: 00000000000000000000000000001110 : Number of events in input trees
Weight renormalisation mode: \"EqualNumEvents\": renormalises all event classes ...
such that the effective (weighted) number of events in each class is the same
(and equals the number of events (entries) given for class=0 )
... i.e. such that Sum[i=1..N_j]{w_i} = N_classA, j=classA, classB, ...
... (note that N_j is the sum of TRAINING events
..... Testing events are not renormalised nor included in the renormalisation factor!)
Number of training and testing events
Signal     -- training events             : 3000
Signal     -- testing events              : 3000
Signal     -- training and testing events : 6000
Background -- training events             : 3000
Background -- testing events              : 3000
Background -- training and testing events : 6000
DataSetInfo Correlation matrix (Signal)
DataSetInfo Correlation matrix (Background)
DataSetFactory
Dataset: 00000000000000000000000000001110
Factory Train method: BDT for Classification
BDT #events: (reweighted) sig: 3000 bkg: 3000
#events: (unweighted) sig: 3000 bkg: 3000
Training 5 Decision Trees ... patience please
Elapsed time for training with 6000 events : 0.018 sec
BDT
Dataset: 00000000000000000000000000001110 : Evaluation of BDT on training sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00555 sec
Factory Training finished
Ranking input variables (method specific)...
BDT Ranking result (top variable is best ranked)
Rank : Variable : Variable Importance
1 : var4 : 7.847e-01
2 : var3 : 2.153e-01
3 : var1-var2 : 0.000e+00
Factory Test method: BDT for Classification performance
BDT
Dataset: 00000000000000000000000000001110 : Evaluation of BDT on testing sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00515 sec
Factory Evaluate classifier: BDT
BDT
Dataset: 00000000000000000000000000001110 : Loop over test events and fill histograms with classifier response...
Evaluation results ranked by best signal efficiency and purity (area)
DataSet MVA
Name: Method: ROC-integ
00000000000000000000000000001110 BDT : 0.790
Testing efficiency compared to training efficiency (overtraining check)
DataSet MVA Signal efficiency: from test sample (from training sample)
Name: Method: @B=0.01 @B=0.10 @B=0.30
00000000000000000000000000001110 BDT : 0.000 (0.000) 0.000 (0.000) 0.769 (0.786)
Factory Thank you for using TMVA!
For citation information, please visit: http://tmva.sf.net/citeTMVA.html
DataSetInfo
Dataset: 00000000000000000000000000001101 : Added class \"Signal\"
Add Tree TreeS of type Signal with 6000 events
DataSetInfo
Dataset: 00000000000000000000000000001101 : Added class \"Background\"
Add Tree TreeB of type Background with 6000 events
Factory Booking method: BDT
DataSetFactory
Dataset: 00000000000000000000000000001101 : Number of events in input trees
Weight renormalisation mode: \"EqualNumEvents\": renormalises all event classes ...
such that the effective (weighted) number of events in each class is the same
(and equals the number of events (entries) given for class=0 )
... i.e. such that Sum[i=1..N_j]{w_i} = N_classA, j=classA, classB, ...
... (note that N_j is the sum of TRAINING events
..... Testing events are not renormalised nor included in the renormalisation factor!)
Number of training and testing events
Signal     -- training events             : 3000
Signal     -- testing events              : 3000
Signal     -- training and testing events : 6000
Background -- training events             : 3000
Background -- testing events              : 3000
Background -- training and testing events : 6000
DataSetInfo Correlation matrix (Signal)
DataSetInfo Correlation matrix (Background)
DataSetFactory
Dataset: 00000000000000000000000000001101
Factory Train method: BDT for Classification
BDT #events: (reweighted) sig: 3000 bkg: 3000
#events: (unweighted) sig: 3000 bkg: 3000
Training 5 Decision Trees ... patience please
Elapsed time for training with 6000 events : 0.0175 sec
BDT
Dataset: 00000000000000000000000000001101 : Evaluation of BDT on training sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00503 sec
Factory Training finished
Ranking input variables (method specific)...
BDT Ranking result (top variable is best ranked)
Rank : Variable : Variable Importance
1 : var4 : 6.324e-01
2 : var1+var2 : 3.662e-01
3 : var3 : 1.458e-03
Factory Test method: BDT for Classification performance
BDT
Dataset: 00000000000000000000000000001101 : Evaluation of BDT on testing sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00549 sec
Factory Evaluate classifier: BDT
BDT
Dataset: 00000000000000000000000000001101 : Loop over test events and fill histograms with classifier response...
Evaluation results ranked by best signal efficiency and purity (area)
DataSet MVA
Name: Method: ROC-integ
00000000000000000000000000001101 BDT : 0.830
Testing efficiency compared to training efficiency (overtraining check)
DataSet MVA Signal efficiency: from test sample (from training sample)
Name: Method: @B=0.01 @B=0.10 @B=0.30
00000000000000000000000000001101 BDT : 0.000 (0.000) 0.000 (0.000) 0.866 (0.871)
Factory Thank you for using TMVA!
For citation information, please visit: http://tmva.sf.net/citeTMVA.html
DataSetInfo
Dataset: 00000000000000000000000000001011 : Added class \"Signal\"
Add Tree TreeS of type Signal with 6000 events
DataSetInfo
Dataset: 00000000000000000000000000001011 : Added class \"Background\"
Add Tree TreeB of type Background with 6000 events
Factory Booking method: BDT
DataSetFactory
Dataset: 00000000000000000000000000001011 : Number of events in input trees
Weight renormalisation mode: \"EqualNumEvents\": renormalises all event classes ...
such that the effective (weighted) number of events in each class is the same
(and equals the number of events (entries) given for class=0 )
... i.e. such that Sum[i=1..N_j]{w_i} = N_classA, j=classA, classB, ...
... (note that N_j is the sum of TRAINING events
..... Testing events are not renormalised nor included in the renormalisation factor!)
Number of training and testing events
Signal     -- training events             : 3000
Signal     -- testing events              : 3000
Signal     -- training and testing events : 6000
Background -- training events             : 3000
Background -- testing events              : 3000
Background -- training and testing events : 6000
DataSetInfo Correlation matrix (Signal)
DataSetInfo Correlation matrix (Background)
DataSetFactory
Dataset: 00000000000000000000000000001011
Factory Train method: BDT for Classification
BDT #events: (reweighted) sig: 3000 bkg: 3000
#events: (unweighted) sig: 3000 bkg: 3000
Training 5 Decision Trees ... patience please
Elapsed time for training with 6000 events : 0.0191 sec
BDT
Dataset: 00000000000000000000000000001011 : Evaluation of BDT on training sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00532 sec
Factory Training finished
Ranking input variables (method specific)...
BDT Ranking result (top variable is best ranked)
Rank : Variable : Variable Importance
1 : var4 : 6.333e-01
2 : var1+var2 : 3.667e-01
3 : var1-var2 : 0.000e+00
Factory Test method: BDT for Classification performance
BDT
Dataset: 00000000000000000000000000001011 : Evaluation of BDT on testing sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00483 sec
Factory Evaluate classifier: BDT
BDT
Dataset: 00000000000000000000000000001011 : Loop over test events and fill histograms with classifier response...
Evaluation results ranked by best signal efficiency and purity (area)
DataSet MVA
Name: Method: ROC-integ
00000000000000000000000000001011 BDT : 0.830
Testing efficiency compared to training efficiency (overtraining check)
DataSet MVA Signal efficiency: from test sample (from training sample)
Name: Method: @B=0.01 @B=0.10 @B=0.30
00000000000000000000000000001011 BDT : 0.000 (0.000) 0.000 (0.000) 0.866 (0.871)
Factory Thank you for using TMVA!
For citation information, please visit: http://tmva.sf.net/citeTMVA.html
DataSetInfo
Dataset: 00000000000000000000000000000111 : Added class \"Signal\"
Add Tree TreeS of type Signal with 6000 events
DataSetInfo
Dataset: 00000000000000000000000000000111 : Added class \"Background\"
Add Tree TreeB of type Background with 6000 events
Factory Booking method: BDT
DataSetFactory
Dataset: 00000000000000000000000000000111 : Number of events in input trees
Weight renormalisation mode: \"EqualNumEvents\": renormalises all event classes ...
such that the effective (weighted) number of events in each class is the same
(and equals the number of events (entries) given for class=0 )
... i.e. such that Sum[i=1..N_j]{w_i} = N_classA, j=classA, classB, ...
... (note that N_j is the sum of TRAINING events
..... Testing events are not renormalised nor included in the renormalisation factor!)
Number of training and testing events
Signal     -- training events             : 3000
Signal     -- testing events              : 3000
Signal     -- training and testing events : 6000
Background -- training events             : 3000
Background -- testing events              : 3000
Background -- training and testing events : 6000
DataSetInfo Correlation matrix (Signal)
DataSetInfo Correlation matrix (Background)
DataSetFactory
Dataset: 00000000000000000000000000000111
Factory Train method: BDT for Classification
BDT #events: (reweighted) sig: 3000 bkg: 3000
#events: (unweighted) sig: 3000 bkg: 3000
Training 5 Decision Trees ... patience please
Elapsed time for training with 6000 events : 0.0205 sec
BDT
Dataset: 00000000000000000000000000000111 : Evaluation of BDT on training sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00517 sec
Factory Training finished
Ranking input variables (method specific)...
BDT Ranking result (top variable is best ranked)
Rank : Variable : Variable Importance
1 : var1+var2 : 6.781e-01
2 : var3 : 3.219e-01
3 : var1-var2 : 0.000e+00
Factory Test method: BDT for Classification performance
BDT
Dataset: 00000000000000000000000000000111 : Evaluation of BDT on testing sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00503 sec
Factory Evaluate classifier: BDT
BDT
Dataset: 00000000000000000000000000000111 : Loop over test events and fill histograms with classifier response...
Evaluation results ranked by best signal efficiency and purity (area)
DataSet MVA
Name: Method: ROC-integ
00000000000000000000000000000111 BDT : 0.780
Testing efficiency compared to training efficiency (overtraining check)
DataSet MVA Signal efficiency: from test sample (from training sample)
Name: Method: @B=0.01 @B=0.10 @B=0.30
00000000000000000000000000000111 BDT : 0.000 (0.000) 0.000 (0.000) 0.000 (0.000)
Factory Thank you for using TMVA!
For citation information, please visit: http://tmva.sf.net/citeTMVA.html
--- Variable Importance Results (Short)
--- var1+var2 = 43.9596 %
--- var1-var2 = 0 %
--- var3 = 0 %
--- var4 = 56.0404 %
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.EvaluateImportance(DataLoader=loader,VIType=0, Method=TMVA.Types.kBDT, MethodTitle=\"BDT\",\n", " V=False,NTrees=5, MinNodeSize=\"2.5%\",MaxDepth=2, BoostType=\"AdaBoost\", AdaBoostBeta=0.5, \n", " UseBaggedBoost=True, BaggedSampleFraction=0.5, SeparationType=\"GiniIndex\", nCuts=20 );" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [Root]", "language": "python", "name": "Python [Root]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }