<img src="http://oproject.org/img/PYMVA.png" height="30%" width="30%">

<hr style="border-top-width: 4px; border-top-color: #34609b;">

# Binary classification with PyMVA

In [1]:
import ROOT

Welcome to JupyROOT 6.09/01


In [2]:
# Select Theano as backend for Keras
from os import environ
environ['KERAS_BACKEND'] = 'theano'

# Set the target CPU architecture for compilation (the AVX instruction set is not supported on SWAN)
environ['THEANO_FLAGS'] = 'gcc.cxxflags=-march=corei7'

from keras.models import Sequential
from keras.layers.core import Dense, Dropout
from keras.optimizers import Adam

Using Theano backend.
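Keras selects its backend, and Theano reads its flags, when they are first imported, so both environment variables must be set before the `from keras...` imports. `THEANO_FLAGS` holds a comma-separated list of `key=value` pairs, and assigning to it as above replaces any flags that were already set. The helper below is a hypothetical sketch (not part of Keras or Theano) that adds or replaces a single flag while keeping the others:

```python
import os


def set_theano_flag(key, value, env=None):
    """Add or replace one key=value entry in THEANO_FLAGS (hypothetical helper).

    `env` defaults to os.environ; pass a plain dict to experiment without
    touching the real process environment.
    """
    if env is None:
        env = os.environ
    # Keep every existing flag except an older setting of the same key
    flags = [f for f in env.get('THEANO_FLAGS', '').split(',')
             if f and f.split('=', 1)[0] != key]
    flags.append('{}={}'.format(key, value))
    env['THEANO_FLAGS'] = ','.join(flags)
    return env['THEANO_FLAGS']


# Usage on a throwaway dict; in real code call this before importing keras/theano
env = {'THEANO_FLAGS': 'floatX=float32'}
set_theano_flag('gcc.cxxflags', '-march=corei7', env)
print(env['THEANO_FLAGS'])  # floatX=float32,gcc.cxxflags=-march=corei7
```

Splitting each entry only at its first `=` keeps values such as `-march=corei7`, which contain `=` themselves, intact.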


## Load data

In [3]:
# Open file
data = ROOT.TFile.Open('https://raw.githubusercontent.com/iml-wg/tmvatutorials/master/inputdata.root')

# Get signal and background trees from file
signal = data.Get('TreeS')
background = data.Get('TreeB')

# Add variables to dataloader
dataloader = ROOT.TMVA.DataLoader('dataset_pymva')
numVariables = len(signal.GetListOfBranches())
for branch in signal.GetListOfBranches():
    dataloader.AddVariable(branch.GetName())

# Add trees to dataloader
dataloader.AddSignalTree(signal, 1.0)
dataloader.AddBackgroundTree(background, 1.0)
trainTestSplit = 0.8
dataloader.PrepareTrainingAndTestTree(ROOT.TCut(''),
        'TrainTestSplit_Signal={}:'.format(trainTestSplit)+\
        'TrainTestSplit_Background={}:'.format(trainTestSplit)+\
        'SplitMode=Random')

DataSetInfo              : [dataset_pymva] : Added class "Signal"
                         : Add Tree TreeS of type Signal with 6000 events
DataSetInfo              : [dataset_pymva] : Added class "Background"
                         : Add Tree TreeB of type Background with 6000 events
                         : Dataset[dataset_pymva] : Class index : 0  name : Signal
                         : Dataset[dataset_pymva] : Class index : 1  name : Background
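With `SplitMode=Random` and a training fraction of 0.8 for both classes, the 6000 signal and 6000 background events are divided into 4800 training and 1200 testing events per class, i.e. 2400 testing events in total, which is the number reported when GTB is evaluated further down. A small sketch of that arithmetic (the function name is hypothetical, and TMVA's own rounding for fractions that do not divide the event count evenly may differ):

```python
def split_counts(n_events, train_fraction):
    """Return (n_train, n_test) for one class given the training fraction."""
    n_train = int(round(n_events * train_fraction))
    return n_train, n_events - n_train


# Event counts taken from the DataSetInfo output above
n_per_class = {'Signal': 6000, 'Background': 6000}
splits = {name: split_counts(n, 0.8) for name, n in n_per_class.items()}
total_test = sum(n_test for _, n_test in splits.values())
print(splits)      # {'Signal': (4800, 1200), 'Background': (4800, 1200)}
print(total_test)  # 2400
```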


## Set up TMVA

In [4]:
# Set up TMVA
ROOT.TMVA.Tools.Instance()
ROOT.TMVA.PyMethodBase.PyInitialize()

outputFile = ROOT.TFile.Open('TMVAOutputPyMVA.root', 'RECREATE')
factory = ROOT.TMVA.Factory('TMVAClassification', outputFile,
        '!V:!Silent:Color:DrawProgressBar:Transformations=I,G:'+\
        'AnalysisType=Classification')
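The factory, the data split and every booked method are configured through colon-separated option strings: a bare name switches a boolean option on, a leading `!` switches it off, and `key=value` sets a value. The parser below is only an illustrative sketch of that convention, not TMVA's own parser, which additionally checks option names and types:

```python
def parse_tmva_options(option_string):
    """Split a TMVA-style option string into a dict (illustrative sketch)."""
    options = {}
    for token in option_string.split(':'):
        if not token:
            continue  # tolerate empty fields, e.g. a trailing ':'
        if '=' in token:
            key, value = token.split('=', 1)
            options[key] = value          # value kept as a string
        elif token.startswith('!'):
            options[token[1:]] = False    # '!V' -> V is off
        else:
            options[token] = True         # 'Color' -> Color is on
    return options


factory_opts = parse_tmva_options(
    '!V:!Silent:Color:DrawProgressBar:Transformations=I,G:'
    'AnalysisType=Classification')
print(factory_opts['V'], factory_opts['Transformations'])  # False I,G
```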

## Define model for Keras

In [5]:
# Define model
model = Sequential()
model.add(Dense(32, init='glorot_normal', activation='relu',
        input_dim=numVariables))
model.add(Dropout(0.5))
model.add(Dense(32, init='glorot_normal', activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, init='glorot_uniform', activation='softmax'))

# Set loss and optimizer
model.compile(loss='categorical_crossentropy', optimizer=Adam(),
        metrics=['categorical_accuracy',])

# Save the compiled model to file; TMVA loads it through the FilenameModel option when booking PyKeras
model.save('model.h5')

# Print summary of model
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
dense_1 (Dense)                  (None, 32)            160         dense_input_1[0][0]              
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 32)            0           dense_1[0][0]                    
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 32)            1056        dropout_1[0][0]                  
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 32)            0           dense_2[0][0]                    
___________________________________________________________________________________________

         It is better to let Theano/g++ find it automatically, but we don't do it now
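The parameter counts in the summary follow from the layer sizes: a `Dense` layer with n_in inputs and n_out units has n_in*n_out weights plus n_out biases, and `Dropout` has none. The summary output above is cut off before the last layer; the same formula gives 66 parameters for the final two-unit layer. The plain-Python sketch below (illustration only, not Keras code) redoes that arithmetic and, for one event, evaluates the softmax output and the categorical cross-entropy loss chosen in `model.compile`:

```python
import math


def dense_params(n_in, n_out):
    """Weights (n_in x n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs):
    """Cross-entropy for one event: -sum_k y_k * log(p_k)."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)


# Dense layer sizes of the model defined above (4 input variables)
sizes = [(4, 32), (32, 32), (32, 2)]
print([dense_params(n_in, n_out) for n_in, n_out in sizes])  # [160, 1056, 66]

# Two output scores -> class probabilities that sum to one
probs = softmax([2.0, 0.0])
# One-hot label for class index 0, which TMVA assigned to "Signal"
print(categorical_crossentropy([1, 0], probs))
```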


## Book methods

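Run only the cells of the classifiers you want to compare. Each `BookMethod` call registers one method with the factory, and every booked method is then trained, tested and evaluated together in the cells below.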

In [6]:
# Keras interface with previously defined model
factory.BookMethod(dataloader, ROOT.TMVA.Types.kPyKeras, 'PyKeras',
        'H:!V:VarTransform=G:FilenameModel=model.h5:'+\
        'NumEpochs=10:BatchSize=32:'+\
        'TriesEarlyStopping=3')

<ROOT.TMVA::MethodPyKeras object ("PyKeras") at 0x77e48b0>

Factory                  : Booking method: PyKeras
                         : 
PyKeras                  : [dataset_pymva] : Create Transformation "G" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'var1' <---> Output : variable 'var1'
                         : Input : variable 'var2' <---> Output : variable 'var2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'
                         : Load model from file: model.h5
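`NumEpochs=10` and `BatchSize=32` set the number of passes over the training data and the mini-batch size. `TriesEarlyStopping=3` is assumed here to behave like the `patience` argument of Keras's `EarlyStopping` callback: training stops once the validation loss has failed to improve for that many consecutive epochs. The sketch below simulates that rule on made-up loss values; it illustrates the idea and is not the code PyKeras runs:

```python
def stopping_epoch(val_losses, patience, max_epochs):
    """Return the epoch (1-based) after which training would stop.

    Training stops early once the validation loss has not improved for
    `patience` consecutive epochs; otherwise it runs for max_epochs
    (or as many epochs as there are losses, whichever is smaller).
    """
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return min(len(val_losses), max_epochs)


# Made-up validation losses, for illustration only
losses = [0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.50, 0.45, 0.44, 0.43]
print(stopping_epoch(losses, patience=3, max_epochs=10))  # 6
```

In this made-up run the improvement at epoch 7 is never reached, which is the trade-off of a small patience value: less wasted training, at the risk of stopping during a temporary plateau.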


In [7]:
# Gradient tree boosting from scikit-learn package
factory.BookMethod(dataloader, ROOT.TMVA.Types.kPyGTB, 'GTB',
        'H:!V:VarTransform=None:'+\
        'NEstimators=100:LearningRate=0.1:MaxDepth=3')

<ROOT.TMVA::MethodPyGTB object ("GTB") at 0x77c0a30>

Factory                  : Booking method: GTB
                         : 
DataSetFactory           : [dataset_pymva] : Number of events in input trees
                         : 
                         : 
                         : Dataset[dataset_pymva] : Weight renormalisation mode: "EqualNumEvents": renormalises all event classes ...
                         : Dataset[dataset_pymva] :  such that the effective (weighted) number of events in each class is the same 
                         : Dataset[dataset_pymva] :  (and equals the number of events (entries) given for class=0 )
                         : Dataset[dataset_pymva] : ... i.e. such that Sum[i=1..N_j]{w_i} = N_classA, j=classA, classB, ...
                         : Dataset[dataset_pymva] : ... (note that N_j is the sum of TRAINING events
                         : Dataset[dataset_pymva] :  ..... Testing events are not renormalised nor included in the renormalisation factor!)
                         : Number of 
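`NEstimators=100`, `LearningRate=0.1` and `MaxDepth=3` configure scikit-learn's gradient tree boosting: the classifier is a sum of 100 small regression trees, each fitted to the negative gradient of the log-loss left by the trees before it and scaled by the learning rate. The sketch below shows that loop in plain Python for a single input variable. To stay short it uses depth-1 trees (stumps) instead of depth 3, and plain gradient steps for the leaf values instead of the Newton-step leaf values scikit-learn uses for this loss; all names are hypothetical:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:  # the largest value cannot split the data
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all inputs identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return max(xs), mean, mean
    return best[1], best[2], best[3]


def gradient_boost(xs, ys, n_estimators=100, learning_rate=0.1):
    """Fit boosted stumps; ys are 1 (signal) or 0 (background), both present."""
    signal_fraction = sum(ys) / len(ys)
    f0 = math.log(signal_fraction / (1.0 - signal_fraction))  # initial log-odds
    scores = [f0] * len(xs)
    stumps = []
    for _ in range(n_estimators):
        # Negative gradient of the log-loss with respect to each score
        residuals = [y - sigmoid(f) for y, f in zip(ys, scores)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        scores = [f + learning_rate * (left_value if x <= threshold else right_value)
                  for x, f in zip(xs, scores)]

    def predict_proba(x):
        f = f0 + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)
        return sigmoid(f)

    return predict_proba


# Toy data (made up): one variable, larger values are more signal-like
xs = [0.1, 0.4, 0.9, 1.3, 2.2, 2.8, 3.1, 3.9]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
toy_classifier = gradient_boost(xs, ys)
print([round(toy_classifier(x), 2) for x in (0.0, 2.0, 4.0)])
```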



## Run training, testing and evaluation

In [8]:
factory.TrainAllMethods()

Factory                  : Train all methods
Factory                  : [dataset_pymva] : Create Transformation "I" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'var1' <---> Output : variable 'var1'
                         : Input : variable 'var2' <---> Output : variable 'var2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'
Factory                  : [dataset_pymva] : Create Transformation "G" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'var1' <---> Output : variable 'var1'
                         : Input : variable 'var2' <---> Output : variable 'var2'
                         : Input : variable 'var3' <---> Output : v
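The factory option `Transformations=I,G` and the PyKeras option `VarTransform=G` refer to TMVA's identity (`I`) and Gaussianisation (`G`) transformations. Gaussianisation maps each input variable through its cumulative distribution to a flat shape and then through the inverse Gaussian cumulative distribution, so each transformed variable is approximately normally distributed; accordingly, the PyKeras evaluation output below shows means close to 0 and RMS values close to 1. A rank-based sketch of the same idea for one variable, assuming Python 3.8+ for `statistics.NormalDist` (TMVA estimates the cumulative distribution in its own way, so its numbers will differ):

```python
from statistics import NormalDist


def gaussianise(values):
    """Map values to an approximately standard-normal shape, keeping their order.

    Each value is replaced by the standard-normal quantile of its empirical
    CDF, evaluated at the middle of its rank so the argument stays in (0, 1).
    Tied values receive distinct ranks (in input order) in this sketch.
    """
    n = len(values)
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    order = sorted(range(n), key=lambda i: values[i])
    transformed = [0.0] * n
    for rank, i in enumerate(order):
        transformed[i] = standard_normal.inv_cdf((rank + 0.5) / n)
    return transformed


# A skewed toy variable (made up for illustration)
raw = [0.1, 0.2, 0.2, 0.5, 1.0, 3.0, 8.0]
print([round(v, 3) for v in gaussianise(raw)])
```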



In [9]:
factory.TestAllMethods()

Factory                  : Test all methods
Factory                  : Test method: PyKeras for Classification performance
                         : 
                         : Load model from file: dataset_pymva/weights/TrainedModel_PyKeras.h5
Factory                  : Test method: GTB for Classification performance
                         : 
                         : 
                         : --- Loading State File From: dataset_pymva/weights/PyGTBModel.PyData
                         : 
                         : Dataset[dataset_pymva] : Evaluation of GTB on testing sample (2400 events)
                         : Dataset[dataset_pymva] : Elapsed time for evaluation of 2400 events: 0.00952 sec       




In [10]:
factory.EvaluateAllMethods()

Factory                  : Evaluate all methods
Factory                  : Evaluate classifier: PyKeras
                         : 
TFHandler_PyKeras        : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :     var1:  -0.019674     1.0126   [    -2.8208     5.7307 ]
                         :     var2:  -0.025370    0.99752   [    -3.1672     5.7307 ]
                         :     var3:  -0.025914     1.0079   [    -3.0141     5.7307 ]
                         :     var4:  -0.023154     1.0059   [    -2.9557     5.7307 ]
                         : -----------------------------------------------------------
PyKeras                  : [dataset_pymva] : Loop over test events and fill histograms with classifier response...
                         : 
TFHandler_PyKeras        : Variable        Mean        RMS   [        Min        Max ]
                     



## Plot ROC curves

In [11]:
# Enable JavaScript graphics (JSROOT) so that the canvas is drawn in the notebook
%jsroot on

# Draw the ROC curves of all booked methods
canvas = factory.GetROCCurve(dataloader)
canvas.Draw()
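`GetROCCurve` draws, for each booked method, background rejection against signal efficiency on the test sample. How such a curve and its integral (the ROC integral TMVA reports in its evaluation step) follow from classifier responses can be sketched as below; the responses are made up, larger means more signal-like, and an event passes a cut if its response is at least the cut value:

```python
def roc_points(signal_scores, background_scores):
    """Return (signal efficiency, background rejection) pairs, one per cut."""
    cuts = sorted(set(signal_scores) | set(background_scores), reverse=True)
    points = [(0.0, 1.0)]  # cut above every response: nothing passes
    for cut in cuts:
        eff_s = sum(s >= cut for s in signal_scores) / len(signal_scores)
        eff_b = sum(b >= cut for b in background_scores) / len(background_scores)
        points.append((eff_s, 1.0 - eff_b))
    return points


def roc_integral(points):
    """Area under background rejection vs signal efficiency (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up classifier responses for illustration
sig_scores = [0.9, 0.8, 0.7, 0.4]
bkg_scores = [0.6, 0.3, 0.2, 0.1]
pts = roc_points(sig_scores, bkg_scores)
print(roc_integral(pts))  # 0.9375
```

Without tied responses, this area equals the fraction of (signal, background) pairs in which the signal event has the higher response: here 15 of 16 pairs, since only 0.4 < 0.6 is misordered.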