{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import ROOT\n", "from ROOT import TFile, TMVA, TCut" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Enable JS visualization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use the new interactive features in the notebook, we have to enable a module called JsMVA. This is done with the IPython magic %jsmva." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%jsmva on" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Declaration of Factory" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "outputFile = TFile( \"TMVA.root\", 'RECREATE' )\n", "TMVA.Tools.Instance()\n", "\n", "factory = TMVA.Factory(\"TMVAClassification\", TargetFile=outputFile,\n", "                       V=False, Color=True, DrawProgressBar=True, Transformations=[\"I\", \"D\", \"P\", \"G\", \"D\"],\n", "                       AnalysisType=\"Classification\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arguments of the constructor:\n", "\n", "The options string can contain the following options:\n", "\n", "
| Keyword | Can be used as positional argument | Default | Predefined values | Description |
|---|---|---|---|---|
| JobName | yes, 1. | not optional | - | Name of job |
| TargetFile | yes, 2. | if not passed, histograms won't be saved | - | File to write control and performance histograms |
| V | no | False | - | Verbose flag |
| Color | no | True | - | Flag for colored output |
| Transformations | no | \"\" | - | List of transformations to test. For example, with the string \"I;D;P;U;G\" the identity, decorrelation, PCA, uniform and Gaussian transformations will be applied |
| Silent | no | False | - | Batch mode: boolean silent flag inhibiting any output from TMVA after the creation of the factory class object |
| DrawProgressBar | no | True | - | Draw progress bar to display training, testing and evaluation schedule (default: True) |
| AnalysisType | no | Auto | Classification, Regression, Multiclass, Auto | Set the analysis type |
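
As a rough illustration of how keyword arguments like those in the table relate to a TMVA option string, here is a minimal sketch in plain Python. The helper `to_option_string` is hypothetical (it is not part of JsMVA or ROOT) and assumes the usual TMVA conventions: options separated by `:`, boolean flags written bare or negated with `!`, and list values joined with `;`. Keys are sorted only to make the result reproducible.

```python
def to_option_string(options):
    '''Hypothetical helper: build a TMVA-style option string from a dict.'''
    parts = []
    for key in sorted(options):
        value = options[key]
        if isinstance(value, bool):
            # Boolean flags: bare name when True, '!' prefix when False
            parts.append(key if value else '!' + key)
        elif isinstance(value, (list, tuple)):
            # List values (e.g. Transformations) are joined with ';'
            parts.append(key + '=' + ';'.join(str(v) for v in value))
        else:
            parts.append(key + '=' + str(value))
    return ':'.join(parts)


factory_options = {
    'V': False,
    'Color': True,
    'Transformations': ['I', 'D', 'P', 'G', 'D'],
    'AnalysisType': 'Classification',
}
option_string = to_option_string(factory_options)
print(option_string)
```

The conversion that JsMVA actually performs may differ in detail; the sketch is only meant to show why the same options can be given either as one string or as keyword arguments.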
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Declaring the DataLoader, adding variables and setting up the dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we need to declare a DataLoader and add the variables (passing the variable names used in the test and train trees in input dataset). To add variable names to DataLoader we use the AddVariable function. Arguments of this function:\n", "\n", "1. String containing the variable name. Using \":=\" we can add definition too.\n", "\n", "2. String (label to variable, if not present the variable name will be used) or character (defining the type of data points)\n", "\n", "3. If we have label for variable, the data point type still can be passed as third argument " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "dataset = \"tmva_class_example\" #the dataset name\n", "loader = TMVA.DataLoader(dataset)\n", "\n", "loader.AddVariable( \"myvar1 := var1+var2\", 'F' )\n", "loader.AddVariable( \"myvar2 := var1-var2\", \"Expression 2\", 'F' )\n", "loader.AddVariable( \"var3\", \"Variable 3\", 'F' )\n", "loader.AddVariable( \"var4\", \"Variable 4\", 'F' )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is possible to define spectator variables, which are part of the input data set, but which are not\n", "used in the MVA training, test nor during the evaluation, but can be used for correlation tests or others. \n", "Parameters:\n", "\n", "1. String containing the definition of spectator variable.\n", "2. Label for spectator variable.\n", "3. 
Data type" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "loader.AddSpectator( \"spec1:=var1*2\", \"Spectator 1\", 'F' )\n", "loader.AddSpectator( \"spec2:=var1*3\", \"Spectator 2\", 'F' )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After adding the variables we have to add the data to the DataLoader. To do this, we first check whether the dataset file is available locally; if it is not, we download it from CERN's server. Once we have the ROOT file, we open it and get the signal and background trees." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# AccessPathName returns a non-zero value when the path is NOT accessible\n", "if ROOT.gSystem.AccessPathName( \"tmva_class_example.root\" ) != 0:\n", "    ROOT.gSystem.Exec( \"wget https://root.cern.ch/files/tmva_class_example.root\")\n", "\n", "inputFile = TFile.Open( \"tmva_class_example.root\" )\n", "\n", "# Get the signal and background trees for training\n", "signal = inputFile.Get( \"TreeS\" )\n", "background = inputFile.Get( \"TreeB\" )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To pass the signal and background trees to the DataLoader we use the AddSignalTree and AddBackgroundTree functions, and we also set the corresponding DataLoader attributes.\n", "Arguments of the functions:\n", "\n", "1. Signal/Background tree\n", "2. Global weight applied to all events in the tree." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Global event weights (one weight per tree)\n", "signalWeight = 1.0\n", "backgroundWeight = 1.0\n", "\n", "loader.AddSignalTree(signal, signalWeight)\n", "loader.AddBackgroundTree(background, backgroundWeight)\n", "\n", "loader.fSignalWeight = signalWeight\n", "loader.fBackgroundWeight = backgroundWeight\n", "loader.fTreeS = signal\n", "loader.fTreeB = background" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the DataLoader.PrepareTrainingAndTestTree function we apply cuts on the input events. In C++ this function also takes its options as a string (as we have seen for the Factory constructor); with JsMVA they can be passed as keyword arguments, as in the Factory constructor case.\n", "\n", "Arguments of PrepareTrainingAndTestTree:\n", "\n", "
| Keyword | Can be used as positional argument | Default | Predefined values | Description |
|---|---|---|---|---|
| SigCut | yes, 1. | - | - | TCut object for signal cut |
| BkgCut | yes, 2. | - | - | TCut object for background cut |
| SplitMode | no | Random | Random, Alternate, Block | Method of picking training and testing events |
| MixMode | no | SameAsSplitMode | SameAsSplitMode, Random, Alternate, Block | Method of mixing events of different classes into one dataset |
| SplitSeed | no | 100 | - | Seed for random event shuffling |
| NormMode | no | EqualNumEvents | None, NumEvents, EqualNumEvents | Overall renormalisation of event-by-event weights used in the training (NumEvents: average weight of 1 per event, independently for signal and background; EqualNumEvents: average weight of 1 per event for signal, and sum of weights for background equal to sum of weights for signal) |
| nTrain_Signal | no | 0 (all) | - | Number of training events of class Signal |
| nTest_Signal | no | 0 (all) | - | Number of test events of class Signal |
| nTrain_Background | no | 0 (all) | - | Number of training events of class Background |
| nTest_Background | no | 0 (all) | - | Number of test events of class Background |
| V | no | False | - | Verbosity |
| VerboseLevel | no | Info | Debug, Verbose, Info | Verbosity level |
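
The NormMode descriptions in the table can be made concrete with a small calculation. The sketch below is plain Python, not TMVA code: `norm_factors` is a hypothetical helper that returns the scale factors for signal and background weights, under the reading of the table given above (NumEvents: average weight 1 in each class; EqualNumEvents: average weight 1 for signal, and background rescaled so that its weight sum equals the signal weight sum).

```python
def norm_factors(sig_weights, bkg_weights, mode='EqualNumEvents'):
    '''Hypothetical helper: scale factors (signal, background) for a NormMode.'''
    n_sig = len(sig_weights)
    n_bkg = len(bkg_weights)
    sum_sig = float(sum(sig_weights))
    sum_bkg = float(sum(bkg_weights))
    if mode == 'None':
        # No renormalisation at all
        return 1.0, 1.0
    if mode == 'NumEvents':
        # Average weight of 1 per event, separately for each class
        return n_sig / sum_sig, n_bkg / sum_bkg
    if mode == 'EqualNumEvents':
        # Signal: average weight 1; background: same weight sum as signal
        return n_sig / sum_sig, n_sig / sum_bkg
    raise ValueError('unknown NormMode: ' + mode)


signal_weights = [2.0, 2.0]                  # 2 signal events
background_weights = [1.0, 1.0, 1.0, 1.0]    # 4 background events
for mode in ('None', 'NumEvents', 'EqualNumEvents'):
    print(mode, norm_factors(signal_weights, background_weights, mode))
```

With these toy weights, EqualNumEvents scales both classes to a weight sum of 2, whereas NumEvents leaves the background sum at 4.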
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mycuts = TCut(\"\")\n", "mycutb = TCut(\"\")\n", "\n", "loader.PrepareTrainingAndTestTree(SigCut=mycuts, BkgCut=mycutb,\n", " nTrain_Signal=0, nTrain_Background=0, SplitMode=\"Random\", NormMode=\"NumEvents\", V=False)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Booking DNN: 2 ways (don't use both in the same time)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is two way to book DNN:\n", "\n", "1) The visual way: run the next cell, and design the network graphically and then click on \"Save Network\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "factory.BookDNN(loader)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2) Classical way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "trainingStrategy = [{\n", " \"LearningRate\": 1e-1,\n", " \"Momentum\": 0.0,\n", " \"Repetitions\": 1,\n", " \"ConvergenceSteps\": 300,\n", " \"BatchSize\": 20,\n", " \"TestRepetitions\": 15,\n", " \"WeightDecay\": 0.001,\n", " \"Regularization\": \"NONE\",\n", " \"DropConfig\": \"0.0+0.5+0.5+0.5\",\n", " \"DropRepetitions\": 1,\n", " \"Multithreading\": True\n", " \n", " }, {\n", " \"LearningRate\": 1e-2,\n", " \"Momentum\": 0.5,\n", " \"Repetitions\": 1,\n", " \"ConvergenceSteps\": 300,\n", " \"BatchSize\": 30,\n", " \"TestRepetitions\": 7,\n", " \"WeightDecay\": 0.001,\n", " \"Regularization\": \"L2\",\n", " \"DropConfig\": \"0.0+0.1+0.1+0.1\",\n", " \"DropRepetitions\": 1,\n", " \"Multithreading\": True\n", " \n", " }, {\n", " \"LearningRate\": 1e-2,\n", " \"Momentum\": 0.3,\n", " \"Repetitions\": 1,\n", " \"ConvergenceSteps\": 300,\n", " \"BatchSize\": 40,\n", " \"TestRepetitions\": 7,\n", " \"WeightDecay\": 0.001,\n", " \"Regularization\": \"L2\",\n", " \"Multithreading\": True\n", " \n", " 
}, {\n", "    \"LearningRate\": 1e-3,\n", "    \"Momentum\": 0.1,\n", "    \"Repetitions\": 1,\n", "    \"ConvergenceSteps\": 200,\n", "    \"BatchSize\": 70,\n", "    \"TestRepetitions\": 7,\n", "    \"WeightDecay\": 0.001,\n", "    \"Regularization\": \"NONE\",\n", "    \"Multithreading\": True\n", "}, {\n", "    \"LearningRate\": 1e-3,\n", "    \"Momentum\": 0.1,\n", "    \"Repetitions\": 1,\n", "    \"ConvergenceSteps\": 200,\n", "    \"BatchSize\": 70,\n", "    \"TestRepetitions\": 7,\n", "    \"WeightDecay\": 0.001,\n", "    \"Regularization\": \"NONE\",\n", "    \"Multithreading\": True\n", "}]\n", "\n", "factory.BookMethod(DataLoader=loader, Method=TMVA.Types.kDNN, MethodTitle=\"DNN\",\n", "                   H=False, V=False, VarTransform=\"Normalize\", ErrorStrategy=\"CROSSENTROPY\",\n", "                   Layout=[\"TANH|100\", \"TANH|50\", \"TANH|10\", \"LINEAR\"],\n", "                   TrainingStrategy=trainingStrategy, Architecture=\"STANDARD\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Train Methods\n", "\n", "When you use the jsmva magic, the original C++ version of Factory::TrainAllMethods is replaced by a new training method that produces notebook-compatible output during training, so we can trace the process (progress bar, error plot). For some methods (MLP, DNN, BDT) a tracer plot is created (for MLP and DNN: test and training error vs. epoch; for BDT: error fraction and boost weight vs. tree number). Some methods do not support interactive tracing; for these, only a short text is printed to show that TrainAllMethods is currently training the method.\n", "\n", "For methods whose training can be traced interactively there is a stop button, which stops the training process. This button stops only the training of the current method; it does not stop TrainAllMethods as a whole. 
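
Before starting the training it can help to see what the booked `trainingStrategy` list amounts to. In the C++ TMVA examples the same information is written as a single string, with the settings of one phase separated by `,` and the phases separated by `|`. The sketch below reproduces that layout in plain Python; `strategy_to_string` is a hypothetical helper, not the JsMVA implementation, and the two-phase `example_strategy` is a shortened stand-in for the list booked above. Keys are sorted only to make the result reproducible.

```python
def strategy_to_string(strategy):
    '''Hypothetical helper: join per-phase settings into one string.'''
    phases = []
    for phase in strategy:
        # One phase: comma-separated key=value pairs
        settings = [key + '=' + str(phase[key]) for key in sorted(phase)]
        phases.append(','.join(settings))
    # Phases are separated by '|'
    return '|'.join(phases)


example_strategy = [
    {'LearningRate': 1e-1, 'Momentum': 0.0, 'BatchSize': 20},
    {'LearningRate': 1e-2, 'Momentum': 0.5, 'BatchSize': 30},
]
strategy_string = strategy_to_string(example_strategy)
print(strategy_string)
```

Each dictionary in the list therefore describes one training phase with its own learning rate, batch size and regularization settings.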
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "factory.TrainAllMethods()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Test end evaluate the methods\n", "\n", "To test test the methods and evaluate the performance we need to run Factory.TestAllMethods and Factory.EvaluateAllMethods functions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "factory.TestAllMethods()\n", "factory.EvaluateAllMethods()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Draw Deep Neural Network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we trained a neural network then the weights of the network will be saved to XML and C file. We can read back the XML file and we can visualize the network using Factory.DrawNeuralNetwork function.\n", "\n", "The arguments of this function:\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "
| Keyword | Can be used as positional argument | Default | Predefined values | Description |
|---|---|---|---|---|
| datasetName | yes, 1. | - | - | The name of dataset |
| methodName | yes, 2. | - | - | The name of method |
\n", "\n", "If you have a very large network with many thousands of neurons, drawing it will be slow and will need a lot of RAM, so be careful with this function.\n", "\n", "This visualization is also interactive and supports the following:\n", "\n", "* Zooming, and grab-and-move" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "factory.DrawNeuralNetwork(dataset, \"DNN\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DNN weights heat map" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "factory.DrawDNNWeights(dataset, \"DNN\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Close the factory's output file" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "outputFile.Close()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 1 }