{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine learning technique for signal-background separation of nuclear interaction vertices in the CMS detector \n",
"\n",
"Phil Baringer: baringer@ku.edu, Anna Kropivnitskaya (speaker): kropiv@cern.ch \n",
"
\n",
"University of Kansas\n",
"
\n",
"_for CMS Collaboration_\n",
"\n",
"_Code with Toy data is available at_ [Binder](https://mybinder.org/v2/gh/kropiv/MLforNIatPyHEP/master)\n",
"\n",
"## Abstract:\n",
"The CMS inner tracking system is a fully silicon-based high precision detector. Accurate knowledge of the positions of active and inactive elements is important for simulating the detector, planning detector upgrades, and reconstructing charged particle tracks. Nuclear interactions of hadrons with the detector material create secondary vertices whose positions map the material with a sub-millimeter precision in situ, while the detector is collecting data from LHC collisions. \n",
"\n",
"A neural network (NN) with two hidden layers was used to separate secondary vertices due to combinatorial background from those arising from nuclear interactions with material. The NN was trained and tested on data from proton-proton collisions at a center-of-mass energy of 13 TeV, recorded in 2018 at the LHC. \n",
"\n",
"NN training is performed using Keras and Matplotlib in a Jupyter notebook. Secondary vertices in the training data are classified as signal or background, based on their geometrical position. Even though the variables used in training show only small differences between background and signal, the NN has impressive separation power. Hadrographies of the CMS inner tracker detector before and after background cleaning are presented."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of contents\n",
"\n",
"* [Introduction](#Intro)\n",
"* [CMS detectors](#CMSdet)\n",
" * [Pixel detector](#PixelDet)\n",
"* [Nuclear interactions](#NIs) \n",
" * [Data selection and reconstruction](#Data)\n",
" * [Toy Data](#ToyData)\n",
" * [Connect/Generate data](#Import-data)\n",
"* [Neural network (NN) motivation and strategy](#NNmotivation)\n",
"* [Classification strategy for NN](#Classification)\n",
" * [Set Signal and Background regions for beam pipe](#SetBP)\n",
" * [Set Signal and Background regions for BPIX](#SetBPIX)\n",
" * [Set Signal and Background regions for pixel support tube](#SetTube)\n",
" * [Set Signal and Background regions for rails, by using x position of NI candidate](#SetRails)\n",
" * [Classify NI candidates as Signal, as Background, and as Non-classified regions](#ClassifyEvents)\n",
" * [Check classification result](#CheckClassification)\n",
" * [Estimate background-signal ratio (B/S) of each signal region](#BkgEstimation) \n",
" * [Shuffle Data](#DataShufle)\n",
" * [Sort track parameters by $p_T$ decreasing and normalize subleading $p_T$ tracks](#SortPt)\n",
" * [Plot variables, injected to NN](#VariablesToNN)\n",
" * [Divide data into Train and Test data sets](#DataSplit)\n",
" * [Data preparation and classification for the NN](#FinalClassification)\n",
"* [Principal component analysis (PCA)](#PCA)\n",
"* [Keras mode: NN with 2 hidden layers](#KerasModel)\n",
" * [Import libraries](#ImportKeras)\n",
" * [Create function for NN model with 2 hidden layers](#NNfunction)\n",
" * [Create NN model structure and compile it](#ModelCompile)\n",
" * [NN model training](#ModelTraining)\n",
" * [Save/Load NN model to/from file](#Save-Load-Model)\n",
" * [Monitor performance during training](#MonitorTraining)\n",
" * [Model results](#ModelResults)\n",
" * [Predict the probability distribution of NN classes for Train and Test sets](#PredictY)\n",
" * [The probability distribution for injected vertex to be a signal](#PlotY)\n",
" * [NN model optimization with Test set](#YpredOptimization)\n",
" * [Plot Train and Test prediction for Signal-Background separation as function of BPIX radius](#PlotPredictedResultsR)\n",
" * [Background to Signal (B/S) ratios in Signal regions (S0-S6)](#BSafterClass)\n",
" * [Tracker tomography with Test set for Signal-Background separation in x-y plane](#PlotPredictedTomography)\n",
"* [Summary](#Summary)\n",
"* [Documentation](#Doc)\n",
"* [Acknowledgment](#Acknowledgment)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction \n",
"[jump to top](#Top)\n",
"\n",
"In HEP, very often problem of signal-background (noise) separation appears:\n",
"* resonant peaks from decay particle,\n",
"* physics objects reconstruction/identification, for example, pions, photons, electrons, b(t)-jets… \n",
"* nuclear interactions with material (this analysis),\n",
"* photon conversions with material,\n",
"which are on top of combinatorial background.\n",
"\n",
"Very often, there is no clean sample of signal to train a neural network (NN), but regions with enhanced contribution of signal could be known (by mass, geometrical, or any phase space cuts).\n",
"\n",
"Machine learning technique of NN with floating classification could be helpful:\n",
"* input classification is done on mixed samples with different fractions of background to signal,\n",
"* output classification is optimized for real signal and combinatorial background separation.\n",
"\n",
"This analysis is performed with [Jupyter notebook](http://ebooks.iospress.nl/publication/42900) \n",
"at [SWAN](https://www.sciencedirect.com/science/article/abs/pii/S0167739X16307105?via%3Dihub) platform:\n",
"https://github.com/kropiv/MLforNIatPyHEP and [Binder](https://mybinder.org/v2/gh/kropiv/MLforNIatPyHEP/master)\n",
"and is based on the CMS public results, [public twiki](https://twiki.cern.ch/twiki/bin/view/CMSPublic/TrackerMaterialNIwithML2018).\n",
"\n",
"# CMS detectors \n",
"[jump to top](#Top)\n",
"\n",
"
\n",
"
\n",
" $\\bf{Fig. 1}$: CMS detectors.\n",
"
\n",
"
\n",
" $\\bf{Fig. 2}$: Pixel detector. \n",
"
\n",
"
\n",
" $\\bf{Fig. 3}$: Nuclear interaction schematic view. \n",
"
\n",
"
\n",
" $\\bf{Fig. 4}$: Hadrography of the tracking system in the x-y plane in the barrel region ($|z| < 25$ cm). The density of NI vertices reproduces structure of the BPIX detector. \n",
"
B/S ratio | BP | IS | L1 | L2 | L3 | L4 | OS | Tube | Rails | Background | |||||||||||
no classification | 0.16 | 2.58 | 1.2 | 0.92 | 0.71 | 0.17 | 0.41 | 0.06 | 0.13 | 1000.0 |
"\n",
"| Background to Signal ratio | BP | IS | L1 | L2 | L3 | L4 | OS | Tube | Rails |\n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| no classification | 0.16 | 2.58 | 1.2 | 0.92 | 0.71 | 0.17 | 0.41 | 0.06 | 0.13 |\n",
"| signal class with Train set | 0.07 | 1.1 | 0.51 | 0.3 | 0.25 | 0.1 | 0.27 | 0.04 | 0.09 |\n",
"| signal class with Test set | 0.07 | 1.13 | 0.51 | 0.28 | 0.27 | 0.1 | 0.3 | 0.05 | 0.09 |\n"