{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Enable ipywidgets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To be able to visualize decision trees and DNN weight map, you must enable ipywidgets. To do so, run the following cell, and refresh the page!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!jupyter nbextension enable --py widgetsnbextension" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "\n", "require(['notebook'],\n", " function() {\n", " IPython.CodeCell.config_defaults.highlight_modes['magic_text/x-c++src'] = {'reg':[/^%%cpp/]};\n", " console.log(\"JupyROOT - %%cpp magic configured\");\n", " }\n", ");\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Welcome to JupyROOT 6.09/01\n" ] } ], "source": [ "import ROOT\n", "from ROOT import TFile, TMVA, TCut" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Enable JS visualization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use new interactive features in notebook we have to enable a module called JsMVA. This can be done by using ipython magic: %jsmva." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%jsmva on" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Declaration of Factory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First let's start with the classical version of declaration. If you know how to use TMVA in C++ then you can use that version here in python: first we need to pass a string called job name, as second argument we need to pass an opened output TFile (this is optional, if it's present then it will be used to store output histograms) and as third (or second) argument we pass a string which contains all the settings related to Factory (separated with ':' character)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## C++ like declaration" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "outputFile = TFile( \"TMVA.root\", 'RECREATE' )\n", "TMVA.Tools.Instance();\n", "\n", "factory = TMVA.Factory( \"TMVAClassification\", outputFile #this is optional\n", " ,\"!V:Color:DrawProgressBar:Transformations=I;D;P;G,D:AnalysisType=Classification\" )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The options string can contain the following options:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
OptionDefaultPredefined valuesDescription
VFalse-Verbose flag
ColorTrue-Flag for colored output
Transformations\"\"-List of transformations to test. For example, with the string \"I;D;P;U;G\" the identity, decorrelation, PCA, uniform and Gaussian transformations will be applied
SilentFalse-Batch mode: boolean silent flag inhibiting\n", "any output from TMVA after\n", "the creation of the factory class object
DrawProgressBarTrue-Draw progress bar to display training,\n", "testing and evaluation schedule (default:\n", "True)
AnalysisTypeAutoClassification,\n", "Regression,\n", "Multiclass, AutoSet the analysis type
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pythonic version" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By enabling JsMVA we have new, more readable ways to do the declaration (this applies to all functions, not just to the constructor)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "factory = TMVA.Factory(\"TMVAClassification\", TargetFile=outputFile,\n", " V=False, Color=True, DrawProgressBar=True, Transformations=[\"I\", \"D\", \"P\", \"G\", \"D\"],\n", " AnalysisType=\"Classification\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arguments of constructor:\n", "The options string can contain the following options:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "
KeywordCan be used as positional argumentDefaultPredefined valuesDescription
JobNameyes, 1.not optional-Name of job
TargetFileyes, 2.if not passed, histograms won't be saved-File to write control and performance histograms
VnoFalse-Verbose flag
ColornoTrue-Flag for colored output
Transformationsno\"\"-List of transformations to test. For example, with the string \"I;D;P;U;G\" the identity, decorrelation, PCA, uniform and Gaussian transformations will be applied
SilentnoFalse-Batch mode: boolean silent flag inhibiting\n", "any output from TMVA after\n", "the creation of the factory class object
DrawProgressBarnoTrue-Draw progress bar to display training,\n", "testing and evaluation schedule (default:\n", "True)
AnalysisTypenoAutoClassification,\n", "Regression,\n", "Multiclass, AutoSet the analysis type
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Declaring the DataLoader, adding variables and setting up the dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we need to declare a DataLoader and add the variables (passing the variable names used in the test and train trees in input dataset). To add variable names to DataLoader we use the AddVariable function. Arguments of this function:\n", "\n", "1. String containing the variable name. Using \":=\" we can add definition too.\n", "\n", "2. String (label to variable, if not present the variable name will be used) or character (defining the type of data points)\n", "\n", "3. If we have label for variable, the data point type still can be passed as third argument " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "dataset = \"tmva_class_example\" #the dataset name\n", "loader = TMVA.DataLoader(dataset)\n", "\n", "loader.AddVariable( \"myvar1 := var1+var2\", 'F' )\n", "loader.AddVariable( \"myvar2 := var1-var2\", \"Expression 2\", 'F' )\n", "loader.AddVariable( \"var3\", \"Variable 3\", 'F' )\n", "loader.AddVariable( \"var4\", \"Variable 4\", 'F' )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is possible to define spectator variables, which are part of the input data set, but which are not\n", "used in the MVA training, test nor during the evaluation, but can be used for correlation tests or others. \n", "Parameters:\n", "\n", "1. String containing the definition of spectator variable.\n", "2. Label for spectator variable.\n", "3. Data type" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "loader.AddSpectator( \"spec1:=var1*2\", \"Spectator 1\", 'F' )\n", "loader.AddSpectator( \"spec2:=var1*3\", \"Spectator 2\", 'F' )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After adding the variables we have to add the datas to DataLoader. In order to do this we check if the dataset file doesn't exist in files directory we download from CERN's server. When we have the root file we open it and get the signal and background trees." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "if ROOT.gSystem.AccessPathName( \"tmva_class_example.root\" ) != 0: \n", " ROOT.gSystem.Exec( \"wget https://root.cern.ch/files/tmva_class_example.root\")\n", " \n", "input = TFile.Open( \"tmva_class_example.root\" )\n", "\n", "# Get the signal and background trees for training\n", "signal = input.Get( \"TreeS\" )\n", "background = input.Get( \"TreeB\" )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To pass the signal and background trees to DataLoader we use the AddSignalTree and AddBackgroundTree functions, and we set up the corresponding DataLoader variable's too.\n", "Arguments of functions:\n", "\n", "1. Signal/Background tree\n", "2. Global weight used in all events in the tree." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
DataSetInfo
Dataset: tmva_class_exampleAdded class \"Signal\"
Add Tree TreeS of type Signal with 6000 events
DataSetInfo
Dataset: tmva_class_exampleAdded class \"Background\"
Add Tree TreeB of type Background with 6000 events
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Global event weights (see below for setting event-wise weights)\n", "signalWeight = 1.0\n", "backgroundWeight = 1.0\n", "\n", "loader.AddSignalTree(signal, signalWeight)\n", "loader.AddBackgroundTree(background, backgroundWeight)\n", "\n", "loader.fSignalWeight = signalWeight\n", "loader.fBackgroundWeight = backgroundWeight\n", "loader.fTreeS = signal\n", "loader.fTreeB = background" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With using DataLoader.PrepareTrainingAndTestTree function we apply cuts on input events. In C++ this function also needs to add the options as a string (as we seen in Factory constructor) which with JsMVA can be passed (same as Factory constructor case) as keyword arguments.\n", "\n", "Arguments of PrepareTrainingAndTestTree:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "
KeywordCan be used as positional argumentDefaultPredefined valuesDescription
SigCutyes, 1.--TCut object for signal cut
BkgCutyes, 2.--TCut object for background cut
SplitModenoRandomRandom,\n", "Alternate,\n", "BlockMethod of picking training and testing\n", "events
MixModenoSameAsSplitModeSameAsSplitMode,\n", "Random,\n", "Alternate,\n", "BlockMethod of mixing events of different\n", "classes into one dataset
SplitSeedno100-Seed for random event shuffling
NormModenoEqualNumEventsNone, NumEvents,\n", "EqualNumEventsOverall renormalisation of event-by-event\n", "weights used in the training (NumEvents:\n", "average weight of 1 per\n", "event, independently for signal and\n", "background; EqualNumEvents: average\n", "weight of 1 per event for signal,\n", "and sum of weights for background\n", "equal to sum of weights for signal)
nTrain_Signalno0 (all)-Number of training events of class Signal
nTest_Signalno0 (all)-Number of test events of class Signal
nTrain_Backgroundno0 (all)-Number of training events of class\n", "Background
nTest_Background no0 (all)-Number of test events of class Background
VnoFalse-Verbosity
VerboseLevelnoInfoDebug, Verbose,\n", "InfoVerbosity level
" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mycuts = TCut(\"\")\n", "mycutb = TCut(\"\")\n", "\n", "loader.PrepareTrainingAndTestTree(SigCut=mycuts, BkgCut=mycutb,\n", " nTrain_Signal=0, nTrain_Background=0, SplitMode=\"Random\", NormMode=\"NumEvents\", V=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizing input variables" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
DataSetFactory
Dataset: tmva_class_exampleNumber of events in input trees
Number of training and testing events
Signaltraining events3000
testing events3000
training and testing events6000
Backgroundtraining events3000
testing events3000
training and testing events6000
DataSetInfo Correlation matrix (Signal)
DataSetInfo Correlation matrix (Background)
DataSetFactory
Dataset: tmva_class_example
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "loader.DrawInputVariable(\"myvar1\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## We can also visualize transformations on input variables" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
DataLoader
Dataset: tmva_class_exampleCreate Transformation \"D\" with events from all classes.
Transformation, Variable selection :
Input : variable 'myvar1' <---> Output : variable 'myvar1'
Input : variable 'myvar2' <---> Output : variable 'myvar2'
Input : variable 'var3' <---> Output : variable 'var3'
Input : variable 'var4' <---> Output : variable 'var4'
DataLoader
Dataset: tmva_class_exampleCreate Transformation \"N\" with events from all classes.
Transformation, Variable selection :
Input : variable 'myvar1' <---> Output : variable 'myvar1'
Input : variable 'myvar2' <---> Output : variable 'myvar2'
Input : variable 'var3' <---> Output : variable 'var3'
Input : variable 'var4' <---> Output : variable 'var4'
Preparing the Decorrelation transformation...
TFHandler_DataLoader
VariableMeanRMSMinMax
myvar1-0.112021.0000-3.88133.3150
myvar2-0.0174041.0000-3.72403.6440
var3-0.112411.0000-3.72483.8805
var40.322611.0000-3.36623.1355
TFHandler_DataLoader
VariableMeanRMSMinMax
myvar10.0475640.27792-1.00001.0000
myvar20.00612620.27145-1.00001.0000
var3-0.0500400.26298-1.00001.0000
var40.134720.30761-1.00001.0000
DataLoader
Dataset: tmva_class_exampleCreate Transformation \"D\" with events from all classes.
Transformation, Variable selection :
Input : variable 'myvar1' <---> Output : variable 'myvar1'
Input : variable 'myvar2' <---> Output : variable 'myvar2'
Input : variable 'var3' <---> Output : variable 'var3'
Input : variable 'var4' <---> Output : variable 'var4'
DataLoader
Dataset: tmva_class_exampleCreate Transformation \"N\" with events from all classes.
Transformation, Variable selection :
Input : variable 'myvar1' <---> Output : variable 'myvar1'
Input : variable 'myvar2' <---> Output : variable 'myvar2'
Input : variable 'var3' <---> Output : variable 'var3'
Input : variable 'var4' <---> Output : variable 'var4'
Preparing the Decorrelation transformation...
TFHandler_DataLoader
VariableMeanRMSMinMax
myvar1-0.112021.0000-3.88133.3150
myvar2-0.0174041.0000-3.72403.6440
var3-0.112411.0000-3.72483.8805
var40.322611.0000-3.36623.1355
TFHandler_DataLoader
VariableMeanRMSMinMax
myvar10.0475640.27792-1.00001.0000
myvar20.00612620.27145-1.00001.0000
var3-0.0500400.26298-1.00001.0000
var40.134720.30761-1.00001.0000
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "loader.DrawInputVariable(\"myvar1\", processTrfs=[\"D\", \"N\"]) #Transformations: I;N;D;P;U;G,D" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Correlation matrix of input variables" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "loader.DrawCorrelationMatrix(\"Signal\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Booking methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To add which we want to train on dataset we have to use the Factory.BookMethod function. This method will add a method and it's options to Factory.\n", "\n", "Arguments:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "
KeywordCan be used as positional argumentDefaultPredefined valuesDescription
DataLoaderyes, 1.--Pointer to DataLoader object
Methodyes, 2.- kVariable\n", " kCuts ,\n", " kLikelihood ,\n", " kPDERS ,\n", " kHMatrix ,\n", " kFisher ,\n", " kKNN ,\n", " kCFMlpANN ,\n", " kTMlpANN ,\n", " kBDT ,\n", " kDT ,\n", " kRuleFit ,\n", " kSVM ,\n", " kMLP ,\n", " kBayesClassifier,\n", " kFDA ,\n", " kBoost ,\n", " kPDEFoam ,\n", " kLD ,\n", " kPlugins ,\n", " kCategory ,\n", " kDNN ,\n", " kPyRandomForest ,\n", " kPyAdaBoost ,\n", " kPyGTB ,\n", " kC50 ,\n", " kRSNNS ,\n", " kRSVM ,\n", " kRXGB ,\n", " kMaxMethodSelected method number, method numbers defined in TMVA.Types
MethodTitleyes, 3.--Label for method
* no -- Other named arguments, which are the options for the selected method.
" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
Factory Booking method: SVM\u001b
SVM
Dataset: tmva_class_exampleCreate Transformation \"Norm\" with events from all classes.
Transformation, Variable selection :
Input : variable 'myvar1' <---> Output : variable 'myvar1'
Input : variable 'myvar2' <---> Output : variable 'myvar2'
Input : variable 'var3' <---> Output : variable 'var3'
Input : variable 'var4' <---> Output : variable 'var4'
Factory Booking method: MLP\u001b
MLP
Dataset: tmva_class_exampleCreate Transformation \"N\" with events from all classes.
Transformation, Variable selection :
Input : variable 'myvar1' <---> Output : variable 'myvar1'
Input : variable 'myvar2' <---> Output : variable 'myvar2'
Input : variable 'var3' <---> Output : variable 'var3'
Input : variable 'var4' <---> Output : variable 'var4'
MLP Building Network.
Initializing weights
Factory Booking method: LD\u001b
Factory Booking method: Likelihood\u001b
Factory Booking method: BDT\u001b
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.BookMethod( DataLoader=loader, Method=TMVA.Types.kSVM, MethodTitle=\"SVM\", \n", " Gamma=0.25, Tol=0.001, VarTransform=\"Norm\" )\n", "\n", "factory.BookMethod( loader,TMVA.Types.kMLP, \"MLP\", \n", " H=False, V=False, NeuronType=\"tanh\", VarTransform=\"N\", NCycles=600, HiddenLayers=\"N+5\",\n", " TestRate=5, UseRegulator=False )\n", "\n", "factory.BookMethod( loader,TMVA.Types.kLD, \"LD\", \n", " H=False, V=False, VarTransform=\"None\", CreateMVAPdfs=True, PDFInterpolMVAPdf=\"Spline2\",\n", " NbinsMVAPdf=50, NsmoothMVAPdf=10 )\n", "\n", "factory.BookMethod( loader,TMVA.Types.kLikelihood,\"Likelihood\",\"NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10\",\n", " NSmooth=1, NAvEvtPerBin=50, H=True, V=False,TransformOutput=True,PDFInterpol=\"Spline2\")\n", "\n", "factory.BookMethod( loader, TMVA.Types.kBDT, \"BDT\",\n", " H=False, V=False, NTrees=850, MinNodeSize=\"2.5%\", MaxDepth=3, BoostType=\"AdaBoost\", AdaBoostBeta=0.5,\n", " UseBaggedBoost=True, BaggedSampleFraction=0.5, SeparationType=\"GiniIndex\", nCuts=20 )" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Booking DNN: 2 ways (don't use both in the same time)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is two way to book DNN:\n", "\n", "1) The visual way: run the next cell, and design the network graphically and then click on \"Save Network\"" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.BookDNN(loader)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2) Classical way" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
Factory Booking method: DNN\u001b
DNN
Dataset: tmva_class_exampleCreate Transformation \"Normalize\" with events from all classes.
Transformation, Variable selection :
Input : variable 'myvar1' <---> Output : variable 'myvar1'
Input : variable 'myvar2' <---> Output : variable 'myvar2'
Input : variable 'var3' <---> Output : variable 'var3'
Input : variable 'var4' <---> Output : variable 'var4'
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "trainingStrategy = [{\n", " \"LearningRate\": 1e-1,\n", " \"Momentum\": 0.0,\n", " \"Repetitions\": 1,\n", " \"ConvergenceSteps\": 300,\n", " \"BatchSize\": 20,\n", " \"TestRepetitions\": 15,\n", " \"WeightDecay\": 0.001,\n", " \"Regularization\": \"NONE\",\n", " \"DropConfig\": \"0.0+0.5+0.5+0.5\",\n", " \"DropRepetitions\": 1,\n", " \"Multithreading\": True\n", " \n", " }, {\n", " \"LearningRate\": 1e-2,\n", " \"Momentum\": 0.5,\n", " \"Repetitions\": 1,\n", " \"ConvergenceSteps\": 300,\n", " \"BatchSize\": 30,\n", " \"TestRepetitions\": 7,\n", " \"WeightDecay\": 0.001,\n", " \"Regularization\": \"L2\",\n", " \"DropConfig\": \"0.0+0.1+0.1+0.1\",\n", " \"DropRepetitions\": 1,\n", " \"Multithreading\": True\n", " \n", " }, {\n", " \"LearningRate\": 1e-2,\n", " \"Momentum\": 0.3,\n", " \"Repetitions\": 1,\n", " \"ConvergenceSteps\": 300,\n", " \"BatchSize\": 40,\n", " \"TestRepetitions\": 7,\n", " \"WeightDecay\": 0.001,\n", " \"Regularization\": \"L2\",\n", " \"Multithreading\": True\n", " \n", " },{\n", " \"LearningRate\": 1e-3,\n", " \"Momentum\": 0.1,\n", " \"Repetitions\": 1,\n", " \"ConvergenceSteps\": 200,\n", " \"BatchSize\": 70,\n", " \"TestRepetitions\": 7,\n", " \"WeightDecay\": 0.001,\n", " \"Regularization\": \"NONE\",\n", " \"Multithreading\": True\n", " \n", "}, {\n", " \"LearningRate\": 1e-3,\n", " \"Momentum\": 0.1,\n", " \"Repetitions\": 1,\n", " \"ConvergenceSteps\": 200,\n", " \"BatchSize\": 70,\n", " \"TestRepetitions\": 7,\n", " \"WeightDecay\": 0.001,\n", " \"Regularization\": \"NONE\",\n", " \"Multithreading\": True\n", " \n", "}]\n", "\n", "factory.BookMethod(DataLoader=loader, Method=TMVA.Types.kDNN, MethodTitle=\"DNN\", \n", " H = False, V=False, VarTransform=\"Normalize\", ErrorStrategy=\"CROSSENTROPY\",\n", " Layout=[\"TANH|100\", \"TANH|50\", \"TANH|10\", \"LINEAR\"],\n", " TrainingStrategy=trainingStrategy,Architecture=\"STANDARD\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Train Methods\n", "\n", "When you use the jsmva magic, the original C++ version of Factory::TrainAllMethods is rewritten by a new training method, which will produce notebook compatible output during the training, so we can trace the process (progress bar, error plot). For some methods (MLP, DNN, BDT) there will be created a tracer plot (for MLP, DNN test and training error vs epoch, for BDT error fraction and boost weight vs tree number). There are also some method which doesn't support interactive tracing, so for these methods just a simple text will be printed, just to we know that TrainAllMethods function is training this method currently.\n", "\n", "For methods where is possible to trace the training interactively there is a stop button, which can stop the training process. This button just stops the training of the current method, and doesn't stop the TrainAllMethods completely. " ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "

Dataset: tmva_class_example

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

Train method: SVM

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", "
\n", "
\n", "
0%
\n", "
\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

Train method: MLP

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", "
\n", "
\n", "
0%
\n", "
\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

Train method: LD

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Training..." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "End" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

Train method: Likelihood

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Training..." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "End" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

Train method: BDT

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", "
\n", "
\n", "
0%
\n", "
\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

Train method: DNN

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", "
\n", "
\n", "
0%
\n", "
\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
TFHandler_SVM
VariableMeanRMSMinMax
myvar10.0839890.36407-1.00001.0000
myvar20.00947780.27696-1.00001.0000
var30.0802790.36720-1.00001.0000
var40.129860.39603-1.00001.0000
Building SVM Working Set...with 6000 event instances
Elapsed time for Working Set build : 1.24 sec
Sorry, no computing time forecast available for SVM, please wait ...
Elapsed time : 1.68 sec
Elapsed time for training with 6000 events : 2.94 sec
SVM
Dataset: tmva_class_exampleEvaluation of SVM on training sample (6000 events)
Elapsed time for evaluation of 6000 events : 1.03 sec
Creating xml weight file: tmva_class_example/weights/TMVAClassification_SVM.weights.xml\u001b
Creating standalone class: tmva_class_example/weights/TMVAClassification_SVM.class.C\u001b
TFHandler_MLP
VariableMeanRMSMinMax
myvar10.0839890.36407-1.00001.0000
myvar20.00947780.27696-1.00001.0000
var30.0802790.36720-1.00001.0000
var40.129860.39603-1.00001.0000
Training Network
Elapsed time for training with 6000 events : 1.43 sec
MLP
Dataset: tmva_class_exampleEvaluation of MLP on training sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00932 sec
Creating xml weight file: tmva_class_example/weights/TMVAClassification_MLP.weights.xml\u001b
Creating standalone class: tmva_class_example/weights/TMVAClassification_MLP.class.C\u001b
Write special histos to file: TMVA.root:/tmva_class_example/Method_MLP/MLP
LD Results for LD coefficients:
Variable: Coefficient:
myvar1: -0.359
myvar2: -0.109
var3: -0.211
var4: +0.722
(offset): -0.054
Elapsed time for training with 6000 events : 0.00231 sec
LD
Dataset: tmva_class_exampleEvaluation of LD on training sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.000759 sec
Dataset: tmva_class_example Separation from histogram (PDF): 0.452 (0.000)
Evaluation of LD on training sample
Creating xml weight file: tmva_class_example/weights/TMVAClassification_LD.weights.xml\u001b
Creating standalone class: tmva_class_example/weights/TMVAClassification_LD.class.C\u001b
================================================================\u001b
Dataset: Likelihood \u001b[0m
--- Short description:\u001b
The maximum-likelihood classifier models the data with probability
density functions (PDF) reproducing the signal and background
distributions of the input variables. Correlations among the
variables are ignored.
--- Performance optimisation:\u001b
Required for good performance are decorrelated input variables
(PCA transformation via the option \"VarTransform=Decorrelate\"
may be tried). Irreducible non-linear correlations may be reduced
by precombining strongly correlated input variables, or by simply
removing one of the variables.
--- Performance tuning via configuration options:\u001b
High fidelity PDF estimates are mandatory, i.e., sufficient training
statistics is required to populate the tails of the distributions
It would be a surprise if the default Spline or KDE kernel parameters
provide a satisfying fit to the data. The user is advised to properly
tune the events per bin and smooth options in the spline cases
individually per variable. If the KDE kernel is used, the adaptive
Gaussian kernel may lead to artefacts, so please always also try
the non-adaptive one.
All tuning parameters must be adjusted individually for each input
variable!
================================================================\u001b
Filling reference histograms
Building PDF out of reference histograms
Elapsed time for training with 6000 events : 0.0304 sec
Likelihood
Dataset: tmva_class_exampleEvaluation of Likelihood on training sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00743 sec
Creating xml weight file: tmva_class_example/weights/TMVAClassification_Likelihood.weights.xml\u001b
Creating standalone class: tmva_class_example/weights/TMVAClassification_Likelihood.class.C\u001b
Write monitoring histograms to file: TMVA.root:/tmva_class_example/Method_Likelihood/Likelihood
BDT #events: (reweighted) sig: 3000 bkg: 3000
#events: (unweighted) sig: 3000 bkg: 3000
Training 850 Decision Trees ... patience please
Elapsed time for training with 6000 events : 1.54 sec
BDT
Dataset: tmva_class_exampleEvaluation of BDT on training sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.443 sec
Creating xml weight file: tmva_class_example/weights/TMVAClassification_BDT.weights.xml\u001b
Creating standalone class: tmva_class_example/weights/TMVAClassification_BDT.class.C\u001b
TFHandler_DNN
VariableMeanRMSMinMax
myvar10.0839890.36407-1.00001.0000
myvar20.00947780.27696-1.00001.0000
var30.0802790.36720-1.00001.0000
var40.129860.39603-1.00001.0000
TFHandler_DNN
VariableMeanRMSMinMax
myvar10.0839890.36407-1.00001.0000
myvar20.00947780.27696-1.00001.0000
var30.0802790.36720-1.00001.0000
var40.129860.39603-1.00001.0000
TFHandler_DNN
VariableMeanRMSMinMax
myvar10.0751130.36776-1.10741.0251
myvar20.00755950.27349-0.906631.0008
var30.0702280.37106-1.06491.0602
var40.120900.39854-1.18711.0199
Using Standard Implementation.Training with learning rate = 0.1, momentum = 0, repetitions = 1
Training with learning rate = 0.01, momentum = 0.5, repetitions = 1
Training with learning rate = 0.01, momentum = 0.3, repetitions = 1
Training with learning rate = 0.001, momentum = 0.1, repetitions = 1
Training with learning rate = 0.001, momentum = 0.1, repetitions = 1
Elapsed time for training with 6000 events : 4.53 sec
DNN
Dataset: tmva_class_exampleEvaluation of DNN on training sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.212 sec
Creating xml weight file: tmva_class_example/weights/TMVAClassification_DNN.weights.xml\u001b
Creating standalone class: tmva_class_example/weights/TMVAClassification_DNN.class.C\u001b
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.TrainAllMethods()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Test end evaluate the methods\n", "\n", "To test test the methods and evaluate the performance we need to run Factory.TestAllMethods and Factory.EvaluateAllMethods functions." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
Factory Test all methods\u001b
Factory Test method: SVM for Classification performance
SVM
Dataset: tmva_class_exampleEvaluation of SVM on testing sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.983 sec
Factory Test method: MLP for Classification performance
MLP
Dataset: tmva_class_exampleEvaluation of MLP on testing sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00927 sec
Factory Test method: LD for Classification performance
LD
Dataset: tmva_class_exampleEvaluation of LD on testing sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00108 sec
Dataset: tmva_class_exampleEvaluation of LD on testing sample
Factory Test method: Likelihood for Classification performance
Likelihood
Dataset: tmva_class_exampleEvaluation of Likelihood on testing sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.00623 sec
Factory Test method: BDT for Classification performance
BDT
Dataset: tmva_class_exampleEvaluation of BDT on testing sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.367 sec
Factory Test method: DNN for Classification performance
DNN
Dataset: tmva_class_exampleEvaluation of DNN on testing sample (6000 events)
Elapsed time for evaluation of 6000 events : 0.193 sec
Factory Evaluate all methods\u001b
Factory Evaluate classifier: SVM
TFHandler_SVM
VariableMeanRMSMinMax
myvar10.0751130.36776-1.10741.0251
myvar20.00755950.27349-0.906631.0008
var30.0702280.37106-1.06491.0602
var40.120900.39854-1.18711.0199
SVM
Dataset: tmva_class_exampleLoop over test events and fill histograms with classifier response...
TFHandler_SVM
VariableMeanRMSMinMax
myvar10.0751130.36776-1.10741.0251
myvar20.00755950.27349-0.906631.0008
var30.0702280.37106-1.06491.0602
var40.120900.39854-1.18711.0199
Factory Evaluate classifier: MLP
TFHandler_MLP
VariableMeanRMSMinMax
myvar10.0751130.36776-1.10741.0251
myvar20.00755950.27349-0.906631.0008
var30.0702280.37106-1.06491.0602
var40.120900.39854-1.18711.0199
MLP
Dataset: tmva_class_exampleLoop over test events and fill histograms with classifier response...
TFHandler_MLP
VariableMeanRMSMinMax
myvar10.0751130.36776-1.10741.0251
myvar20.00755950.27349-0.906631.0008
var30.0702280.37106-1.06491.0602
var40.120900.39854-1.18711.0199
Factory Evaluate classifier: LD
LD
Dataset: tmva_class_exampleLoop over test events and fill histograms with classifier response...
Also filling probability and rarity histograms (on request)...
TFHandler_LD
VariableMeanRMSMinMax
myvar1-0.0108143.0633-9.86057.9024
myvar20.000905521.1092-3.70674.0291
var3-0.0151181.7459-5.35634.6430
var40.143312.1667-6.96755.0307
Factory Evaluate classifier: Likelihood
Likelihood
Dataset: tmva_class_exampleLoop over test events and fill histograms with classifier response...
TFHandler_Likelihood
VariableMeanRMSMinMax
myvar1-0.0108143.0633-9.86057.9024
myvar20.000905521.1092-3.70674.0291
var3-0.0151181.7459-5.35634.6430
var40.143312.1667-6.96755.0307
Factory Evaluate classifier: BDT
BDT
Dataset: tmva_class_exampleLoop over test events and fill histograms with classifier response...
TFHandler_BDT
VariableMeanRMSMinMax
myvar1-0.0108143.0633-9.86057.9024
myvar20.000905521.1092-3.70674.0291
var3-0.0151181.7459-5.35634.6430
var40.143312.1667-6.96755.0307
Factory Evaluate classifier: DNN
DNN
Dataset: tmva_class_exampleLoop over test events and fill histograms with classifier response...
TFHandler_DNN
VariableMeanRMSMinMax
myvar10.0751130.36776-1.10741.0251
myvar20.00755950.27349-0.906631.0008
var30.0702280.37106-1.06491.0602
var40.120900.39854-1.18711.0199
Evaluation results ranked by best signal efficiency and purity (area)
DataSet MVA
Name: Method: ROC-integ
tmva_class_example DNN : 0.940
tmva_class_example MLP : 0.939
tmva_class_example SVM : 0.937
tmva_class_example BDT : 0.931
tmva_class_example LD : 0.895
tmva_class_example Likelihood : 0.827
Testing efficiency compared to training efficiency (overtraining check)
DataSet MVA Signal efficiency: from test sample (from training sample)
Name: Method: @B=0.01 @B=0.10 @B=0.30
tmva_class_example DNN : 0.390 (0.345) 0.804 (0.798) 0.962 (0.963)
tmva_class_example MLP : 0.365 (0.345) 0.806 (0.797) 0.962 (0.964)
tmva_class_example SVM : 0.400 (0.322) 0.802 (0.791) 0.961 (0.961)
tmva_class_example BDT : 0.350 (0.380) 0.778 (0.805) 0.955 (0.959)
tmva_class_example LD : 0.261 (0.242) 0.679 (0.662) 0.901 (0.903)
tmva_class_example Likelihood : 0.106 (0.101) 0.400 (0.371) 0.812 (0.813)
Dataset:tmva_class_exa...: Created tree 'TestTree' with 6000 events
Dataset:tmva_class_exa...: Created tree 'TrainTree' with 6000 events
Factory Thank you for using TMVA!\u001b
For citation information, please visit: http://tmva.sf.net/citeTMVA.html\u001b
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.TestAllMethods()\n", "factory.EvaluateAllMethods()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Classifier Output Distributions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To draw the classifier output distribution we have to use Factory.DrawOutputDistribution function which is inserted by invoking jsmva magic. The parameters of the function are the following:\n", "The options string can contain the following options:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "
KeywordCan be used as positional argumentDefaultPredefined valuesDescription
datasetNameyes, 1.-- The name of dataset
methodNameyes, 2.-- The name of method
" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.DrawOutputDistribution(dataset, \"MLP\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Classifier Probability Distributions\n", "\n", "To draw the classifier probability distribution we have to use Factory.DrawProbabilityDistribution function which is inserted by invoking jsmva magic. The parameters of the function are the following:\n", "The options string can contain the following options:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "
KeywordCan be used as positional argumentDefaultPredefined valuesDescription
datasetNameyes, 1.-- The name of dataset
" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.DrawProbabilityDistribution(dataset, \"LD\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# ROC curve\n", "\n", "To draw the ROC (receiver operating characteristic) curve we have to use Factory.DrawROCCurve function which is inserted by invoking jsmva magic. The parameters of the function are the following:\n", "The options string can contain the following options:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "
KeywordCan be used as positional argumentDefaultPredefined valuesDescription
datasetNameyes, 1.-- The name of dataset
" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.DrawROCCurve(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Classifier Cut Efficiencies\n", "\n", "To draw the classifier cut efficiencies we have to use Factory.DrawCutEfficiencies function which is inserted by invoking jsmva magic. The parameters of the function are the following:\n", "The options string can contain the following options:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "
KeywordCan be used as positional argumentDefaultPredefined valuesDescription
datasetNameyes, 1.-- The name of dataset
methodNameyes, 2.-- The name of method
" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.DrawCutEfficiencies(dataset, \"MLP\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Draw Neural Network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we trained a neural network then the weights of the network will be saved to XML and C file. We can read back the XML file and we can visualize the network using Factory.DrawNeuralNetwork function.\n", "\n", "The arguments of this function:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "
KeywordCan be used as positional argumentDefaultPredefined valuesDescription
datasetNameyes, 1.-- The name of dataset
methodNameyes, 2.-- The name of method
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This visualization will be interactive, and we can do the following with it:\n", "* Mouseover (node, weight): focusing\n", "* Zooming and grab and move supported\n", "* Reset: double click\n", "\n", "The synapses are drawn with 2 colors, one for positive weight and one for negative weight. The absolute value of the synapses are scaled and transformed to thickness of line between to node." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.DrawNeuralNetwork(dataset, \"MLP\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Draw Deep Neural Network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The DrawNeuralNetwork function also can visualize deep neural networks, we just have to pass \"DNN\" as method name. If you have very big network with lots of thousands of neurons then drawing the network will be a little bit slow and will need a lot of ram, so be careful with this function.\n", "\n", "This visualization also will be interactive, and we can do the following with it:\n", "\n", "* Zooming and grab and move supported" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.DrawNeuralNetwork(dataset, \"DNN\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Draw Decision Tree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The trained decision trees will be save to XML save too, so we can read back the XML file and we can visualize the trees. This is the purpose of Factory.DrawDecisionTree function.\n", "\n", "The arguments of this function:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "
KeywordCan be used as positional argumentDefaultPredefined valuesDescription
datasetNameyes, 1.-- The name of dataset
methodNameyes, 2.-- The name of method
\n", "\n", "This function will produce a little box where you can enter the index of the tree (the number of trees will be also will appear before this input box) you want to see. After choosing this number you have to press the Draw button. The nodes of tree will be colored, the color is associated to signal efficiency." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The visualization of tree will be interactive and you can do the following with it:\n", "\n", "* Mouseover (node, weight): showing decision path\n", "* Zooming and grab and move supported\n", "* Reset zoomed tree: double click\n", "* Expand all closed subtrees, turn off zoom: button in the bottom of the picture\n", "* Click on node: \n", "\n", " * hiding subtree, if node children are hidden the node will have a green border\n", " * rescaling: bigger nodes, bigger texts\n", " * click again to show the subtree" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.DrawDecisionTree(dataset, \"BDT\") #11" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DNN weights heat map" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factory.DrawDNNWeights(dataset, \"DNN\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Close the factory's output file" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "outputFile.Close()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 1 }