<div>
    <div style="float:left;">
        <img src="http://oproject.org/tiki-download_file.php?fileId=8&display&x=450&y=128" width="50%" />
    </div>
    <div style="float:left;">
        <img src="http://gfif.udea.edu.co/root/tmva/img/tmva_logo.gif" width="50%"/>
    </div>
</div>

# User interface
<hr style="border-top-width: 4px; border-top-color: #34609b;">


In [1]:
import ROOT
from ROOT import TFile, TMVA, TCut

Welcome to JupyROOT 6.07/07


## Enable JS visualization

To use the new interactive features in the notebook, we have to enable a module called JsMVA. This is done with the IPython magic %jsmva.

In [2]:
%jsmva on

## Declaration of Factory

Let's start with the classical form of the declaration. If you know how to use TMVA in C++, you can use the same form here in Python. The first argument is a string called the job name. The second argument is an open output TFile; it is optional and, if present, is used to store the output histograms. The third argument (or the second, if no file is passed) is a string containing all the Factory settings, separated by the ':' character.

### C++ like declaration

In [3]:
outputFile = TFile( "TMVA.root", 'RECREATE' )
TMVA.Tools.Instance();

factory = TMVA.Factory( "TMVAClassification",
                        outputFile,  # this is optional
                        "!V:Color:DrawProgressBar:Transformations=I;D;P;G,D:AnalysisType=Classification" )

The options string can contain the following options:
<table>
<tr><th>Option</th><th>Default</th><th>Predefined values</th><th>Description</th></tr>
<tr>
 <td>V</td>
 <td>False</td>
 <td>-</td>
 <td>Verbose flag</td>
</tr>
<tr>
 <td>Color</td>
 <td>True</td>
 <td>-</td>
 <td>Flag for colored output</td>
</tr>
<tr>
 <td>Transformations</td>
 <td>""</td>
 <td>-</td>
 <td>List of transformations to test. For example, the string "I;D;P;U;G" applies the identity, decorrelation, PCA, uniform and Gaussian transformations</td>
</tr>
<tr>
 <td>Silent</td>
 <td>False</td>
 <td>-</td>
 <td>Batch mode: boolean silent flag inhibiting
any output from TMVA after
the creation of the factory class object</td>
</tr>
<tr>
 <td>DrawProgressBar</td>
 <td>True</td>
 <td>-</td>
 <td>Draw progress bar to display training,
testing and evaluation schedule (default:
True)</td>
</tr>
<tr>
 <td>AnalysisType</td>
 <td>Auto</td>
 <td>Classification,
Regression,
Multiclass, Auto</td>
 <td>Set the analysis type</td>
</tr>
</table>
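
To make the syntax of the options string concrete, here is a small parsing sketch in plain Python. The helper name `parse_factory_options` is hypothetical and is not part of TMVA or JsMVA. It only assumes what the text and table above imply: options are separated by ':', a bare name switches a flag on, a leading '!' switches it off, and `name=value` sets a value.

```python
def parse_factory_options(option_string):
    """Split a TMVA-style option string into a dict (illustrative helper only)."""
    options = {}
    for token in option_string.split(":"):
        token = token.strip()
        if not token:
            continue  # ignore empty segments such as a trailing ':'
        if "=" in token:
            # name=value option, e.g. AnalysisType=Classification
            key, value = token.split("=", 1)
            options[key] = value
        elif token.startswith("!"):
            # negated boolean flag, e.g. !V
            options[token[1:]] = False
        else:
            # plain boolean flag, e.g. Color
            options[token] = True
    return options


parsed = parse_factory_options(
    "!V:Color:DrawProgressBar:Transformations=I;D;P;G,D:AnalysisType=Classification"
)
for name, value in parsed.items():
    print(name, "->", value)
```

Read this way, the string used in the cell above switches V off, switches Color and DrawProgressBar on, and sets Transformations and AnalysisType explicitly.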

### Pythonic version

By enabling JsMVA we have new, more readable ways to do the declaration.

#### First version

Instead of passing options as a long string we can pass them separately as named arguments:

In [4]:
factory = TMVA.Factory("TMVAClassification", outputFile,
                       V=False, Color=True, Silent=True, DrawProgressBar=True,
                       Transformations="I;D;P;G,D", AnalysisType="Classification")

Here the Transformations option is set to the string "I;D;P;G,D". Instead of a string, we can also pass these options as a list, for example ["I", "D", "P", "G", "D"].
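
The mapping from named arguments back to a C++ style options string can be sketched in a few lines. This is not JsMVA's actual implementation; `to_option_string` is a hypothetical helper, and joining list elements with ';' is an assumption made for this illustration.

```python
def to_option_string(**options):
    """Build a ':'-separated option string from keyword arguments (illustrative only)."""
    tokens = []
    for key, value in options.items():
        if isinstance(value, bool):
            # True -> "Flag", False -> "!Flag"
            tokens.append(key if value else "!" + key)
        elif isinstance(value, (list, tuple)):
            # Assumption: list elements are joined with ';'
            tokens.append(key + "=" + ";".join(str(item) for item in value))
        else:
            tokens.append("{}={}".format(key, value))
    return ":".join(tokens)


option_string = to_option_string(
    V=False,
    Color=True,
    Transformations=["I", "D", "P", "G", "D"],
    AnalysisType="Classification",
)
print(option_string)
# !V:Color:Transformations=I;D;P;G;D:AnalysisType=Classification
```

Note that under this assumption a list yields "G;D" (two separate entries), whereas the string form in the cell above contains "G,D"; if the comma-separated form is wanted, passing the string directly keeps it unchanged.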

#### Second version

In the first version we only changed the way we pass the options; the first two arguments were still positional. These parameters can also be passed as named arguments: the first parameter is named <b>JobName</b> and the second <b>TargetFile</b>.

In [5]:
factory = TMVA.Factory(JobName="TMVAClassification", TargetFile=outputFile,
                       V=False, Color=True, DrawProgressBar=True, Transformations=["I", "D", "P", "G", "D"],
                       AnalysisType="Classification")

Arguments of the constructor:
<table>
<tr><th>Keyword</th><th>Can be used as positional argument</th><th>Default</th><th>Predefined values</th><th>Description</th></tr>
<tr>
 <td>JobName</td>
 <td>yes, 1.</td>
 <td>not optional</td>
 <td>-</td>
 <td>Name of job</td>
</tr>
<tr>
 <td>TargetFile</td>
 <td>yes, 2.</td>
 <td>if not passed, histograms won't be saved</td>
 <td>-</td>
 <td>File to write control and performance histograms to</td>
</tr>
<tr>
 <td>V</td>
 <td>no</td>
 <td>False</td>
 <td>-</td>
 <td>Verbose flag</td>
</tr>
<tr>
 <td>Color</td>
 <td>no</td>
 <td>True</td>
 <td>-</td>
 <td>Flag for colored output</td>
</tr>
<tr>
 <td>Transformations</td>
 <td>no</td>
 <td>""</td>
 <td>-</td>
 <td>List of transformations to test. For example, the string "I;D;P;U;G" applies the identity, decorrelation, PCA, uniform and Gaussian transformations</td>
</tr>
<tr>
 <td>Silent</td>
 <td>no</td>
 <td>False</td>
 <td>-</td>
 <td>Batch mode: boolean silent flag inhibiting any output from TMVA after the creation of the factory class object</td>
</tr>
<tr>
 <td>DrawProgressBar</td>
 <td>no</td>
 <td>True</td>
 <td>-</td>
 <td>Draw progress bar to display training, testing and evaluation schedule</td>
</tr>
<tr>
 <td>AnalysisType</td>
 <td>no</td>
 <td>Auto</td>
 <td>Classification, Regression, Multiclass, Auto</td>
 <td>Set the analysis type</td>
</tr>
</table>
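
The "can be used as positional argument" column can be read as follows: a few leading parameters may be given either by position or by name, and everything else is treated as an option. The sketch below shows one way to resolve such a call in plain Python. `resolve_arguments` is a hypothetical helper written for this illustration, not JsMVA code, and the file name string stands in for an open TFile.

```python
def resolve_arguments(positional_names, *args, **kwargs):
    """Separate the leading (positional-or-named) parameters from the option keywords."""
    if len(args) > len(positional_names):
        raise TypeError("too many positional arguments")
    # Positional values are matched to names in order.
    resolved = dict(zip(positional_names, args))
    options = {}
    for key, value in kwargs.items():
        if key in positional_names:
            if key in resolved:
                raise TypeError("argument {!r} given twice".format(key))
            resolved[key] = value
        else:
            options[key] = value
    return resolved, options


resolved, options = resolve_arguments(
    ["JobName", "TargetFile"],
    "TMVAClassification",          # JobName given by position
    TargetFile="TMVA.root",        # TargetFile given by name (placeholder for a TFile)
    V=False,
    AnalysisType="Classification",
)
print(resolved)
print(options)
```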

## Declaring the DataLoader, adding variables and setting up the dataset

First we need to declare a DataLoader and add the variables, passing the variable names used in the training and test trees of the input dataset. To add variable names to the DataLoader we use the AddVariable function. Its arguments are:

1. A string containing the variable name. A definition can be added with ":=".

2. Either a string (a label for the variable; if absent, the variable name is used) or a character (defining the type of the data points).

3. If the variable has a label, the data point type can still be passed as the third argument.
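
To illustrate the ":=" syntax from the first argument, the following plain Python sketch splits such a string into the new variable name and the expression that defines it. `split_definition` is a hypothetical helper for this illustration; TMVA parses these strings internally.

```python
def split_definition(variable):
    """Return (name, expression) for strings like 'myvar1 := var1+var2'."""
    if ":=" in variable:
        name, expression = variable.split(":=", 1)
        return name.strip(), expression.strip()
    # Without ':=' the variable is simply read from the tree under its own name.
    name = variable.strip()
    return name, name


for text in ["myvar1 := var1+var2", "myvar2 := var1-var2", "var3"]:
    print(split_definition(text))
```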

In [6]:
dataset = "tmva_class_example" #the dataset name
loader  = TMVA.DataLoader(dataset)

loader.AddVariable( "myvar1 := var1+var2", 'F' )
loader.AddVariable( "myvar2 := var1-var2", "Expression 2", 'F' )
loader.AddVariable( "var3",                "Variable 3", 'F' )
loader.AddVariable( "var4",                "Variable 4", 'F' )

It is possible to define spectator variables. They are part of the input dataset but are not used in the MVA training, testing or evaluation; they can be used for correlation tests and similar checks.
Parameters:

1. String containing the definition of spectator variable.
2. Label for spectator variable.
3. Data type

In [7]:
loader.AddSpectator( "spec1:=var1*2",  "Spectator 1",  'F' )
loader.AddSpectator( "spec2:=var1*3",  "Spectator 2",  'F' )

After adding the variables we have to add the data to the DataLoader. To do this, we check whether the dataset file exists locally; if it does not, we download it from CERN's server. Once we have the ROOT file, we open it and get the signal and background trees.

In [8]:
if ROOT.gSystem.AccessPathName( "tmva_class_example.root" ) != 0: 
    ROOT.gSystem.Exec( "wget https://root.cern.ch/files/tmva_class_example.root")
    
input = TFile.Open( "tmva_class_example.root" )

# Get the signal and background trees for training
signal      = input.Get( "TreeS" )
background  = input.Get( "TreeB" )
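
The cell above relies on `wget` being installed. Note also that `AccessPathName` returns a non-zero value when the path is not accessible, which is why the download runs when the result is `!= 0`. A portable alternative using only the Python standard library could look like the sketch below. `ensure_file` and its `fetch` parameter are hypothetical names introduced for this illustration; the demonstration uses a stand-in fetch function so that it runs without network access.

```python
import os
import tempfile
import urllib.request


def ensure_file(path, url, fetch=urllib.request.urlretrieve):
    """Download url to path unless the file already exists.

    Returns True if a download was triggered, False otherwise.
    """
    if os.path.exists(path):
        return False
    fetch(url, path)
    return True


# Demonstration with a stand-in fetch function (no real download happens here).
def fake_fetch(url, path):
    with open(path, "wb") as handle:
        handle.write(b"placeholder")


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "tmva_class_example.root")
    url = "https://root.cern.ch/files/tmva_class_example.root"
    first = ensure_file(target, url, fetch=fake_fetch)   # file missing: fetch is called
    second = ensure_file(target, url, fetch=fake_fetch)  # file present: nothing to do

print(first, second)
```

In the notebook one would call `ensure_file("tmva_class_example.root", "https://root.cern.ch/files/tmva_class_example.root")` with the default `fetch` before opening the file with `TFile.Open`.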

To pass the signal and background trees to the DataLoader we use the AddSignalTree and AddBackgroundTree functions, and we also set the corresponding DataLoader attributes.
Arguments of these functions:

1. Signal/background tree
2. Global weight applied to all events in the tree.

In [9]:
# Global event weights (see below for setting event-wise weights)
signalWeight     = 1.0
backgroundWeight = 1.0

loader.AddSignalTree(signal, signalWeight)
loader.AddBackgroundTree(background, backgroundWeight)

loader.fSignalWeight = signalWeight
loader.fBackgroundWeight = backgroundWeight
loader.fTreeS = signal
loader.fTreeB = background

With the DataLoader.PrepareTrainingAndTestTree function we apply cuts to the input events. In C++ this function also takes its options as a string (as we saw for the Factory constructor); with JsMVA they can be passed as keyword arguments, as in the Factory constructor case.

Arguments of PrepareTrainingAndTestTree:
<table>

<tr>
    <th>Keyword</th>
    <th>Can be used as positional argument</th>
    <th>Default</th>
    <th>Predefined values</th>
    <th>Description</th>
</tr>

<tr>
    <td>SigCut</td>
    <td>yes, 1.</td>
    <td>-</td>
    <td>-</td>
    <td>TCut object for signal cut</td>
</tr>
<tr>
    <td>BkgCut</td>
    <td>yes, 2.</td>
    <td>-</td>
    <td>-</td>
    <td>TCut object for background cut</td>
</tr>

<tr>
    <td>SplitMode</td>
    <td>no</td>
    <td>Random</td>
    <td>Random,
Alternate,
Block</td>
    <td>Method of picking training and testing
events</td>
</tr>
<tr>
    <td>MixMode</td>
    <td>no</td>
    <td>SameAsSplitMode</td>
    <td>SameAsSplitMode,
Random,
Alternate,
Block</td>
    <td>Method of mixing events of different
classes into one dataset</td>
</tr>
<tr>
    <td>SplitSeed</td>
    <td>no</td>
    <td>100</td>
    <td>-</td>
    <td>Seed for random event shuffling</td>
</tr>
<tr>
    <td>NormMode</td>
    <td>no</td>
    <td>EqualNumEvents</td>
    <td>None, NumEvents,
EqualNumEvents</td>
    <td>Overall renormalisation of event-by-event
weights used in the training (NumEvents:
average weight of 1 per
event, independently for signal and
background; EqualNumEvents: average
weight of 1 per event for signal,
and sum of weights for background
equal to sum of weights for signal)</td>
</tr>

<tr>
    <td>nTrain_Signal</td>
    <td>no</td>
    <td>0 (all)</td>
    <td>-</td>
    <td>Number of training events of class Signal</td>
</tr>

<tr>
    <td>nTest_Signal</td>
    <td>no</td>
    <td>0 (all)</td>
    <td>-</td>
    <td>Number of test events of class Signal</td>
</tr>

<tr>
    <td>nTrain_Background</td>
    <td>no</td>
    <td>0 (all)</td>
    <td>-</td>
    <td>Number of training events of class
Background</td>
</tr>

<tr>
    <td>nTest_Background</td>
    <td>no</td>
    <td>0 (all)</td>
    <td>-</td>
    <td>Number of test events of class Background</td>
</tr>
<tr>
    <td>V</td>
    <td>no</td>
    <td>False</td>
    <td>-</td>
    <td>Verbosity</td>
</tr>
<tr>
    <td>VerboseLevel</td>
    <td>no</td>
    <td>Info</td>
    <td>Debug, Verbose,
Info</td>
    <td>Verbosity level</td>
</tr>

</table>
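
As a rough illustration of the three SplitMode values in the table, the sketch below splits a list of events into equal training and test halves. It shows one plausible reading of the mode names (Random: shuffle with a seed, then cut in half; Alternate: every other event; Block: first half versus second half). It is not TMVA's implementation, and `split_events` is a hypothetical helper.

```python
import random


def split_events(events, mode="Random", seed=100):
    """Split events into (training, testing) halves according to the chosen mode."""
    events = list(events)
    half = len(events) // 2
    if mode == "Random":
        # Shuffle reproducibly with the given seed, then cut in half.
        random.Random(seed).shuffle(events)
        return events[:half], events[half:]
    if mode == "Block":
        # Keep the input order: first block for training, the rest for testing.
        return events[:half], events[half:]
    if mode == "Alternate":
        # Alternate between training and testing, event by event.
        return events[0::2], events[1::2]
    raise ValueError("unknown mode: {!r}".format(mode))


event_ids = range(6)
print(split_events(event_ids, mode="Block"))
print(split_events(event_ids, mode="Alternate"))
print(split_events(event_ids, mode="Random", seed=100))
```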

In [10]:
mycuts = TCut("")
mycutb = TCut("")

loader.PrepareTrainingAndTestTree(SigCut=mycuts, BkgCut=mycutb,
                                  nTrain_Signal=0, nTrain_Background=0,
                                  SplitMode="Random", NormMode="NumEvents", V=False)
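
The NormMode description in the table above can be checked with a few numbers. The sketch below applies the two rules as they are worded there: NumEvents gives an average weight of 1 per event separately for signal and background, while EqualNumEvents gives an average weight of 1 per signal event and scales the background so that its weight sum equals that of the signal. The function `renormalise` and the example weights are made up for this illustration; this is not TMVA code.

```python
def renormalise(signal_weights, background_weights, mode="EqualNumEvents"):
    """Rescale event weights following the NormMode descriptions (illustrative only)."""
    n_sig = len(signal_weights)
    n_bkg = len(background_weights)
    sum_sig = sum(signal_weights)
    sum_bkg = sum(background_weights)
    if mode == "None":
        sig_scale, bkg_scale = 1.0, 1.0
    elif mode == "NumEvents":
        # Average weight of 1 per event, independently for each class.
        sig_scale = n_sig / sum_sig
        bkg_scale = n_bkg / sum_bkg
    elif mode == "EqualNumEvents":
        # Average weight of 1 per signal event; background sum matches signal sum.
        sig_scale = n_sig / sum_sig
        bkg_scale = n_sig / sum_bkg
    else:
        raise ValueError("unknown mode: {!r}".format(mode))
    return ([w * sig_scale for w in signal_weights],
            [w * bkg_scale for w in background_weights])


sig = [2.0, 2.0]            # two signal events with weight 2
bkg = [1.0, 1.0, 1.0, 1.0]  # four background events with weight 1

num_sig, num_bkg = renormalise(sig, bkg, mode="NumEvents")
eq_sig, eq_bkg = renormalise(sig, bkg, mode="EqualNumEvents")
print(num_sig, num_bkg)
print(eq_sig, eq_bkg)
```

With these inputs, NumEvents leaves the background weights at 1 each (sum 4), while EqualNumEvents halves them so that both classes sum to 2.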

## Booking methods

To choose which methods to train on the dataset we use the Factory.BookMethod function. It adds a method and its options to the Factory.

Arguments:
<table>

<tr>
    <th>Keyword</th>
    <th>Can be used as positional argument</th>
    <th>Default</th>
    <th>Predefined values</th>
    <th>Description</th>
</tr>

<tr>
    <td>DataLoader</td>
    <td>yes, 1.</td>
    <td>-</td>
    <td>-</td>
    <td>Pointer to DataLoader object</td>
</tr>

<tr>
    <td>Method</td>
    <td>yes, 2.</td>
    <td>-</td>
    <td>     kVariable       ,
         kCuts           ,
         kLikelihood     ,
         kPDERS          ,
         kHMatrix        ,
         kFisher         ,
         kKNN            ,
         kCFMlpANN       ,
         kTMlpANN        ,
         kBDT            ,
         kDT             ,
         kRuleFit        ,
         kSVM            ,
         kMLP            ,
         kBayesClassifier,
         kFDA            ,
         kBoost          ,
         kPDEFoam        ,
         kLD             ,
         kPlugins        ,
         kCategory       ,
         kDNN            ,
         kPyRandomForest ,
         kPyAdaBoost     ,
         kPyGTB          ,
         kC50            ,
         kRSNNS          ,
         kRSVM           ,
         kRXGB           ,
         kMaxMethod</td>
    <td>Selected method number, method numbers defined in TMVA.Types</td>
</tr>
<tr>
    <td>MethodTitle</td>
    <td>yes, 3.</td>
    <td>-</td>
    <td>-</td>
    <td>Label for method</td>
</tr>
<tr>
    <td> * </td>
    <td> no </td>
    <td>-</td>
    <td>-</td>
    <td> Other named arguments which are the options for selected method. </td>
</tr>
</table>

In [11]:
factory.BookMethod( DataLoader=loader, Method=TMVA.Types.kMLP, MethodTitle="MLP",
                    H=False, V=False, NeuronType="tanh", VarTransform="N", NCycles=600,
                    HiddenLayers="N+5", TestRate=5, UseRegulator=False )

<ROOT.TMVA::MethodMLP object ("MLP") at 0x4dbd0b0>
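
The `HiddenLayers="N+5"` option in the cell above uses TMVA's shorthand in which N stands for the number of input variables, with layers separated by commas. The sketch below expands such a specification into explicit layer sizes. `expand_hidden_layers` is a hypothetical helper that handles only the simple forms `N`, `N+k`, `N-k` and plain integers; TMVA does the real parsing itself.

```python
import re


def expand_hidden_layers(spec, n_variables):
    """Turn a spec such as 'N+5' or 'N,N-1' into a list of layer sizes."""
    sizes = []
    for term in spec.split(","):
        term = term.replace(" ", "")
        match = re.fullmatch(r"N([+-]\d+)?", term)
        if match:
            # 'N' alone or 'N' with a signed integer offset
            sizes.append(n_variables + int(match.group(1) or 0))
        elif term.isdigit():
            # a plain integer gives the layer size directly
            sizes.append(int(term))
        else:
            raise ValueError("unsupported layer specification: {!r}".format(term))
    return sizes


# The DataLoader in this notebook has four input variables.
print(expand_hidden_layers("N+5", 4))
print(expand_hidden_layers("N,N-1", 4))
```

For the four variables added earlier, "N+5" therefore describes a single hidden layer with nine neurons.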

## Evaluate importance

To calculate variable importance we can use the Factory.EvaluateImportance function. Its parameters are the following:


<table>

<tr>
    <th>Keyword</th>
    <th>Can be used as positional argument</th>
    <th>Default</th>
    <th>Predefined values</th>
    <th>Description</th>
</tr>


<tr>
    <td>DataLoader</td>
    <td>yes, 1.</td>
    <td>-</td>
    <td>-</td>
    <td>Pointer to DataLoader object</td>
</tr>
<tr>
    <td>VIType</td>
    <td>yes, 2.</td>
    <td>-</td>
    <td>-</td>
    <td>Variable Importance type</td>
</tr>


<tr>
    <td>Method</td>
    <td>yes, 3.</td>
    <td>-</td>
    <td>     kVariable       ,
         kCuts           ,
         kLikelihood     ,
         kPDERS          ,
         kHMatrix        ,
         kFisher         ,
         kKNN            ,
         kCFMlpANN       ,
         kTMlpANN        ,
         kBDT            ,
         kDT             ,
         kRuleFit        ,
         kSVM            ,
         kMLP            ,
         kBayesClassifier,
         kFDA            ,
         kBoost          ,
         kPDEFoam        ,
         kLD             ,
         kPlugins        ,
         kCategory       ,
         kDNN            ,
         kPyRandomForest ,
         kPyAdaBoost     ,
         kPyGTB          ,
         kC50            ,
         kRSNNS          ,
         kRSVM           ,
         kRXGB           ,
         kMaxMethod</td>
    <td>Selected method number, method numbers defined in TMVA.Types</td>
</tr>
<tr>
    <td>MethodTitle</td>
    <td>yes, 4.</td>
    <td>-</td>
    <td>-</td>
    <td>Label for method</td>
</tr>

<tr>
    <td>V</td>
    <td>no</td>
    <td>False</td>
    <td>-</td>
    <td>Verbose</td>
</tr>

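<tr>
    <td>NTrees</td>
    <td>no</td>
    <td></td>
    <td></td>
    <td>Number of trees in the forest</td>
</tr>

<tr>
    <td>MinNodeSize</td>
    <td>no</td>
    <td></td>
    <td></td>
    <td>Minimum percentage of training events required in a leaf node</td>
</tr>
<tr>
    <td>MaxDepth</td>
    <td>no</td>
    <td></td>
    <td></td>
    <td>Maximum allowed depth of the decision tree</td>
</tr>
<tr>
    <td>BoostType</td>
    <td>no</td>
    <td></td>
    <td></td>
    <td>Boosting type for the trees in the forest</td>
</tr>
<tr>
    <td>AdaBoostBeta</td>
    <td>no</td>
    <td></td>
    <td></td>
    <td>Learning rate for the AdaBoost algorithm</td>
</tr>
<tr>
    <td>UseBaggedBoost</td>
    <td>no</td>
    <td></td>
    <td></td>
    <td>Use only a random subsample of all events for growing the trees in each boost iteration</td>
</tr>
<tr>
    <td>BaggedSampleFraction</td>
    <td>no</td>
    <td></td>
    <td></td>
    <td>Relative size of the bagged event sample to the original size of the data sample</td>
</tr>
<tr>
    <td>SeparationType</td>
    <td>no</td>
    <td></td>
    <td></td>
    <td>Separation criterion for node splitting</td>
</tr>
<tr>
    <td>nCuts</td>
    <td>no</td>
    <td></td>
    <td></td>
    <td>Number of grid points in the variable range used in finding the optimal cut in node splitting</td>
</tr>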
</table>

In [12]:
factory.EvaluateImportance(DataLoader=loader, VIType=0, Method=TMVA.Types.kBDT, MethodTitle="BDT",
                           V=False, NTrees=5, MinNodeSize="2.5%", MaxDepth=2, BoostType="AdaBoost",
                           AdaBoostBeta=0.5, UseBaggedBoost=True, BaggedSampleFraction=0.5,
                           SeparationType="GiniIndex", nCuts=20);

Evaluation results ranked by best signal efficiency and purity (area):
<table>
<tr><th>DataSet name</th><th>MVA method</th><th>ROC-integ</th></tr>
<tr><td>00000000000000000000000000001111</td><td>BDT</td><td>0.830</td></tr>
</table>

Testing efficiency compared to training efficiency (overtraining check). Signal efficiency: from test sample (from training sample):
<table>
<tr><th>DataSet name</th><th>MVA method</th><th>@B=0.01</th><th>@B=0.10</th><th>@B=0.30</th></tr>
<tr><td>00000000000000000000000000001111</td><td>BDT</td><td>0.000 (0.000)</td><td>0.000 (0.000)</td><td>0.866 (0.871)</td></tr>
</table>

<table>
<tr><td>Factory</td><td>Evaluate classifier: MLP</td></tr>
</table>

<table>
<tr><th>Variable</th><th>Mean</th><th>RMS</th><th>Min</th><th>Max</th></tr>
<tr><td>myvar1</td><td>0.075113</td><td>0.36776</td><td>-1.1074</td><td>1.0251</td></tr>
<tr><td>myvar2</td><td>0.0075595</td><td>0.27349</td><td>-0.90663</td><td>1.0008</td></tr>
<tr><td>var3</td><td>0.070228</td><td>0.37106</td><td>-1.0649</td><td>1.0602</td></tr>
<tr><td>var4</td><td>0.12090</td><td>0.39854</td><td>-1.1871</td><td>1.0199</td></tr>
</table>

<table>
<tr><td>Dataset: tmva_class_example</td><td>Loop over test events and fill histograms with classifier response...</td></tr>
</table>

<table>
<tr><td>Dataset: 00000000000000000000000000001110</td><td>Added class "Signal"</td></tr>
<tr><td>Dataset: 00000000000000000000000000001110</td><td>Added class "Background"</td></tr>
<tr><td>Dataset: 00000000000000000000000000001110</td><td>Number of events in input trees</td></tr>
<tr><td>Dataset: 00000000000000000000000000001110</td><td>Weight renormalisation mode: "EqualNumEvents": renormalises all event classes ...</td></tr>
<tr><td>Dataset: 00000000000000000000000000001110</td><td>such that the effective (weighted) number of events in each class is the same</td></tr>
<tr><td>Dataset: 00000000000000000000000000001110</td><td>(and equals the number of events (entries) given for class=0 )</td></tr>
<tr><td>Dataset: 00000000000000000000000000001110</td><td>... i.e. such that Sum[i=1..N_j]{w_i} = N_classA, j=classA, classB, ...</td></tr>
<tr><td>Dataset: 00000000000000000000000000001110</td><td>... (note that N_j is the sum of TRAINING events</td></tr>
<tr><td>Dataset: 00000000000000000000000000001110</td><td>..... Testing events are not renormalised nor included in the renormalisation factor!)</td></tr>
</table>

<table>
<tr><th colspan="3">Number of training and testing events</th></tr>
<tr><td>Signal</td><td>training events</td><td>3000</td></tr>
<tr><td>Signal</td><td>testing events</td><td>3000</td></tr>
<tr><td>Signal</td><td>training and testing events</td><td>6000</td></tr>
<tr><td>Background</td><td>training events</td><td>3000</td></tr>
<tr><td>Background</td><td>testing events</td><td>3000</td></tr>
<tr><td>Background</td><td>training and testing events</td><td>6000</td></tr>
</table>

<table>
<tr><td>Dataset: 00000000000000000000000000001110</td><td>Evaluation of BDT on training sample (6000 events)</td></tr>
<tr><td>Dataset: 00000000000000000000000000001110</td><td>Evaluation of BDT on testing sample (6000 events)</td></tr>
<tr><td>Dataset: 00000000000000000000000000001110</td><td>Loop over test events and fill histograms with classifier response...</td></tr>
</table>

<table>
<tr><td>Dataset: 00000000000000000000000000001101</td><td>Added class "Signal"</td></tr>
<tr><td>Dataset: 00000000000000000000000000001101</td><td>Added class "Background"</td></tr>
<tr><td>Dataset: 00000000000000000000000000001101</td><td>Number of events in input trees</td></tr>
<tr><td>Dataset: 00000000000000000000000000001101</td><td>Weight renormalisation mode: "EqualNumEvents": renormalises all event classes ...</td></tr>
<tr><td>Dataset: 00000000000000000000000000001101</td><td>such that the effective (weighted) number of events in each class is the same</td></tr>
<tr><td>Dataset: 00000000000000000000000000001101</td><td>(and equals the number of events (entries) given for class=0 )</td></tr>
<tr><td>Dataset: 00000000000000000000000000001101</td><td>... i.e. such that Sum[i=1..N_j]{w_i} = N_classA, j=classA, classB, ...</td></tr>
<tr><td>Dataset: 00000000000000000000000000001101</td><td>... (note that N_j is the sum of TRAINING events</td></tr>
<tr><td>Dataset: 00000000000000000000000000001101</td><td>..... Testing events are not renormalised nor included in the renormalisation factor!)</td></tr>
</table>

<table>
<tr><th colspan="3">Number of training and testing events</th></tr>
<tr><td>Signal</td><td>training events</td><td>3000</td></tr>
<tr><td>Signal</td><td>testing events</td><td>3000</td></tr>
<tr><td>Signal</td><td>training and testing events</td><td>6000</td></tr>
<tr><td>Background</td><td>training events</td><td>3000</td></tr>
<tr><td>Background</td><td>testing events</td><td>3000</td></tr>
<tr><td>Background</td><td>training and testing events</td><td>6000</td></tr>
</table>

<table>
<tr><td>Dataset: 00000000000000000000000000001101</td><td>Evaluation of BDT on training sample (6000 events)</td></tr>
<tr><td>Dataset: 00000000000000000000000000001101</td><td>Evaluation of BDT on testing sample (6000 events)</td></tr>
<tr><td>Dataset: 00000000000000000000000000001101</td><td>Loop over test events and fill histograms with classifier response...</td></tr>
</table>

<table>
<tr><td>Dataset: 00000000000000000000000000001011</td><td>Added class "Signal"</td></tr>
<tr><td>Dataset: 00000000000000000000000000001011</td><td>Added class "Background"</td></tr>
<tr><td>Dataset: 00000000000000000000000000001011</td><td>Number of events in input trees</td></tr>
<tr><td>Dataset: 00000000000000000000000000001011</td><td>Weight renormalisation mode: "EqualNumEvents": renormalises all event classes ...</td></tr>
<tr><td>Dataset: 00000000000000000000000000001011</td><td>such that the effective (weighted) number of events in each class is the same</td></tr>
<tr><td>Dataset: 00000000000000000000000000001011</td><td>(and equals the number of events (entries) given for class=0 )</td></tr>
<tr><td>Dataset: 00000000000000000000000000001011</td><td>... i.e. such that Sum[i=1..N_j]{w_i} = N_classA, j=classA, classB, ...</td></tr>
<tr><td>Dataset: 00000000000000000000000000001011</td><td>... (note that N_j is the sum of TRAINING events</td></tr>
<tr><td>Dataset: 00000000000000000000000000001011</td><td>..... Testing events are not renormalised nor included in the renormalisation factor!)</td></tr>
</table>

<table>
<tr><th colspan="3">Number of training and testing events</th></tr>
<tr><td>Signal</td><td>training events</td><td>3000</td></tr>
<tr><td>Signal</td><td>testing events</td><td>3000</td></tr>
<tr><td>Signal</td><td>training and testing events</td><td>6000</td></tr>
<tr><td>Background</td><td>training events</td><td>3000</td></tr>
<tr><td>Background</td><td>testing events</td><td>3000</td></tr>
<tr><td>Background</td><td>training and testing events</td><td>6000</td></tr>
</table>

<table>
<tr><td>Dataset: 00000000000000000000000000001011</td><td>Evaluation of BDT on training sample (6000 events)</td></tr>
<tr><td>Dataset: 00000000000000000000000000001011</td><td>Evaluation of BDT on testing sample (6000 events)</td></tr>
<tr><td>Dataset: 00000000000000000000000000001011</td><td>Loop over test events and fill histograms with classifier response...</td></tr>
</table>

<table>
<tr><td>Dataset: 00000000000000000000000000000111</td><td>Added class "Signal"</td></tr>
<tr><td>Dataset: 00000000000000000000000000000111</td><td>Added class "Background"</td></tr>
<tr><td>Dataset: 00000000000000000000000000000111</td><td>Number of events in input trees</td></tr>
<tr><td>Dataset: 00000000000000000000000000000111</td><td>Weight renormalisation mode: "EqualNumEvents": renormalises all event classes ...</td></tr>
<tr><td>Dataset: 00000000000000000000000000000111</td><td>such that the effective (weighted) number of events in each class is the same</td></tr>
<tr><td>Dataset: 00000000000000000000000000000111</td><td>(and equals the number of events (entries) given for class=0 )</td></tr>
<tr><td>Dataset: 00000000000000000000000000000111</td><td>... i.e. such that Sum[i=1..N_j]{w_i} = N_classA, j=classA, classB, ...</td></tr>
<tr><td>Dataset: 00000000000000000000000000000111</td><td>... (note that N_j is the sum of TRAINING events</td></tr>
<tr><td>Dataset: 00000000000000000000000000000111</td><td>..... Testing events are not renormalised nor included in the renormalisation factor!)</td></tr>
</table>

<table>
<tr><th colspan="3">Number of training and testing events</th></tr>
<tr><td>Signal</td><td>training events</td><td>3000</td></tr>
<tr><td>Signal</td><td>testing events</td><td>3000</td></tr>
<tr><td>Signal</td><td>training and testing events</td><td>6000</td></tr>
<tr><td>Background</td><td>training events</td><td>3000</td></tr>
<tr><td>Background</td><td>testing events</td><td>3000</td></tr>
<tr><td>Background</td><td>training and testing events</td><td>6000</td></tr>
</table>

<table>
<tr><td>Dataset: 00000000000000000000000000000111</td><td>Evaluation of BDT on training sample (6000 events)</td></tr>
<tr><td>Dataset: 00000000000000000000000000000111</td><td>Evaluation of BDT on testing sample (6000 events)</td></tr>
<tr><td>Dataset: 00000000000000000000000000000111</td><td>Loop over test events and fill histograms with classifier response...</td></tr>
</table>
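
The dataset labels in the output above look like bit patterns: the full run is labelled with a pattern ending in 1111, and the later runs end in 1110, 1101, 1011 and 0111. With four input variables, a natural reading is that each bit marks whether a variable is included, so each later run leaves one variable out. The sketch below decodes the labels under that reading. Both the interpretation and the bit order (rightmost bit for the first variable added to the DataLoader) are assumptions made for this illustration, and `variables_in_subset` is a hypothetical helper.

```python
def variables_in_subset(bit_pattern, variable_names):
    """List the variables whose bit is set in a dataset label such as '...1110'."""
    # Assumption: the rightmost character refers to the first variable.
    bits = bit_pattern[::-1]
    return [name for index, name in enumerate(variable_names)
            if index < len(bits) and bits[index] == "1"]


variables = ["myvar1", "myvar2", "var3", "var4"]
labels = ["0" * 28 + tail for tail in ["1111", "1110", "1101", "1011", "0111"]]

for label in labels:
    included = variables_in_subset(label, variables)
    dropped = [name for name in variables if name not in included]
    print(label[-4:], "uses", included, "drops", dropped)
```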
