# How to use PyHEADTAIL on GPU

This notebook shows how to use the GPU functionality of PyHEADTAIL. Created on 19 Feb 2016 by Stefan Hegglin.

### Installation notes

In order to use the GPU module, you will need the following:
 - An NVIDIA GPU (tested on Tesla C2075 and Kepler K20)
 - CUDA version >6.5
 - PyCUDA version 2015.1.3. Earlier versions may work but are untested.
 - scikit-cuda 0.5.1
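
The installed versions can be checked before starting. The following is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`) and that the packages were installed under the distribution names `pycuda` and `scikit-cuda`. The helper `report_versions` is a name introduced here for illustration; it is not part of PyHEADTAIL.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as assumed for a pip-based installation.
REQUIRED = ("pycuda", "scikit-cuda")


def report_versions(names=REQUIRED):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in report_versions().items():
    print(name, ver if ver is not None else "not installed")
```

This only reports what is installed. It does not check the CUDA toolkit version or whether a GPU is actually usable.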
 

### Simulation Setup

The usual imports: NumPy and the physical constants from SciPy:

In [1]:
from __future__ import division, print_function

import numpy as np

from scipy.constants import c, e, m_p

In order to use the GPU, initialise it with the following statement. If it fails, the GPU or PyCUDA is not set up correctly. (The context module could perform this step automatically, but that is less safe because it is unclear what happens if the user creates another context.)

Note: it is *important* to initialise the GPU before PyHEADTAIL is imported for the first time.

In [2]:
import pycuda.autoinit
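
If a script should report a missing GPU instead of stopping with a traceback, the import can be wrapped. This is a hedged sketch: `init_gpu` is a hypothetical helper, not a PyHEADTAIL function, and it catches `Exception` broadly because the failure can be an `ImportError` or an error raised by the CUDA driver.

```python
def init_gpu():
    """Try to create a CUDA context; return (ok, message)."""
    try:
        # Importing pycuda.autoinit creates a CUDA context as a side effect.
        import pycuda.autoinit
    except Exception as exc:  # ImportError or a CUDA driver error
        return False, "GPU unavailable: {}".format(exc)
    return True, "using " + pycuda.autoinit.device.name()


gpu_ok, message = init_gpu()
print(message)
```

As with the plain import, this would have to run before PyHEADTAIL is imported for the first time.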

Add PyHEADTAIL to the path:

In [3]:
# sets the PyHEADTAIL directory etc.; settings.py is optional
try:
    from settings import *
except ImportError:
    pass

Import the GPU and CPU context managers and the PyHEADTAIL `Synchrotron`:

In [4]:
from PyHEADTAIL.machines.synchrotron import Synchrotron
from PyHEADTAIL.general.contextmanager import GPU
from PyHEADTAIL.general.contextmanager import CPU

PyHEADTAIL v1.12.4.7




Define machine parameters and create a machine object and a corresponding matched bunch:

In [5]:
# machine parameters
circumference = 26658.8832
n_segments = 10
charge = e
mass = m_p
beta_x = 92.7 
D_x = 0
beta_y = 93.2 
D_y = 0 

Q_x = 64.28
Q_y = 59.31

Qp_x = 10.
Qp_y = 15.

app_x = 0.0000e-9
app_y = 0.0000e-9
app_xy = 0

alpha = 3.225e-04

h1, h2 = 35640, 35640*2
V1, V2 = 6e6, 0.
dphi1, dphi2 = 0, np.pi

longitudinal_mode = 'non-linear'
p0 = 450e9 * e / c
p_increment = 0

machine = Synchrotron(
    optics_mode='smooth', circumference=circumference,
    n_segments=n_segments,
    beta_x=beta_x, D_x=D_x, beta_y=beta_y, D_y=D_y,
    accQ_x=Q_x, accQ_y=Q_y, Qp_x=Qp_x, Qp_y=Qp_y,
    app_x=app_x, app_y=app_y, app_xy=app_xy,
    alpha_mom_compaction=alpha, longitudinal_mode=longitudinal_mode,
    h_RF=[h1, h2], V_RF=[V1, V2], dphi_RF=[dphi1, dphi2],
    p0=p0, p_increment=p_increment, charge=charge, mass=mass,
    use_cython=False
)

# bunch parameters
macroparticlenumber = 100000
intensity = 1e11
epsn_x = 2.5e-6
epsn_y = 3.5e-6
sigma_z = 0.05
bunch = machine.generate_6D_Gaussian_bunch_matched(
    macroparticlenumber, intensity, epsn_x, epsn_y, sigma_z=sigma_z
)

# simulation parameters
n_turns = 10

*** Maximum RMS bunch length 0.117895151015m.




... distance to target bunch length: -5.0000e-02
... distance to target bunch length: 6.4638e-02
... distance to target bunch length: 4.8815e-02
... distance to target bunch length: 5.6104e-03
... distance to target bunch length: -1.3673e-03
... distance to target bunch length: -2.1248e-05
... distance to target bunch length: 8.4927e-09
... distance to target bunch length: -2.6939e-07
--> Bunch length: 0.0500000084927
--> Emittance: 0.163402703633
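
As a sanity check on the parameters above, the expected horizontal RMS beam size can be estimated by hand. The sketch below uses the standard relations for a matched beam without dispersion, i.e. geometric emittance = normalised emittance / (beta gamma) and sigma_x = sqrt(beta_x * geometric emittance). It is an independent estimate, not a description of how PyHEADTAIL generates the bunch, and the constants are hard-coded CODATA 2018 values so that it runs without SciPy.

```python
import math

# CODATA 2018 values, hard-coded so the sketch runs without SciPy.
c = 299792458.0          # speed of light [m/s]
e = 1.602176634e-19      # elementary charge [C]
m_p = 1.67262192369e-27  # proton mass [kg]

p0 = 450e9 * e / c       # reference momentum [kg m/s], i.e. 450 GeV/c
beta_x = 92.7            # smooth-approximation beta function [m]
epsn_x = 2.5e-6          # normalised horizontal emittance [m rad]

betagamma = p0 / (m_p * c)               # relativistic beta * gamma
gamma = math.sqrt(1.0 + betagamma ** 2)  # Lorentz factor
eps_geo_x = epsn_x / betagamma           # geometric emittance [m rad]
sigma_x = math.sqrt(beta_x * eps_geo_x)  # RMS beam size [m], valid for D_x = 0

print("gamma = {:.1f}, sigma_x = {:.2e} m".format(gamma, sigma_x))
```

The resulting beam size is a fraction of a millimetre, which is the order of magnitude to expect for the entries of `bunch.x`.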


### Main tracking loop
Up to this point everything has run on the CPU, and the setup script is the same for CPUs and GPUs except for the `use_cython=False` parameter and the `import pycuda.autoinit` statement. Next, we create a GPU context that encloses the main tracking loop.

In [6]:
with GPU(bunch) as context:
    for n in range(n_turns):
        machine.track(bunch)

OK, this seems to work. But how do we know it is actually running on the GPU? We can check the type of the bunch phase-space arrays inside the with-statement; on the GPU they are PyCUDA GPUArrays rather than NumPy arrays:

In [7]:
print('The type of bunch.x before entering the with-statement is', type(bunch.x))
with GPU(bunch) as context:
    machine.track(bunch)
    print('The type of bunch.x inside of the with-statement is', type(bunch.x))
print('The type of bunch.x after the with-region is', type(bunch.x))

The type of bunch.x before entering the with-statement is 
The type of bunch.x inside of the with-statement is 
The type of bunch.x after the with-region is 
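
The pattern behind this type change can be illustrated with a toy context manager. Everything in the sketch below (`ToyDeviceArray`, `ToyBunch`, `ToyGPU`) is hypothetical and uses plain Python lists; it only mimics the idea of moving data on entry and moving it back on exit, and is not PyHEADTAIL's actual implementation.

```python
class ToyDeviceArray(object):
    """Hypothetical stand-in for a GPU array: holds data, returns a host copy via get()."""

    def __init__(self, host_data):
        self._data = list(host_data)

    def get(self):
        return list(self._data)


class ToyBunch(object):
    """Hypothetical bunch with a single coordinate array."""

    def __init__(self, x):
        self.x = list(x)


class ToyGPU(object):
    """Moves bunch.x to the 'device' on entry and back to the host on exit."""

    def __init__(self, bunch):
        self.bunch = bunch

    def __enter__(self):
        self.bunch.x = ToyDeviceArray(self.bunch.x)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even if the body raised, so the bunch always ends up on the host.
        self.bunch.x = self.bunch.x.get()
        return False  # do not swallow exceptions


toy_bunch = ToyBunch([0.1, -0.2, 0.3])
print(type(toy_bunch.x).__name__)      # before the with-statement
with ToyGPU(toy_bunch):
    print(type(toy_bunch.x).__name__)  # inside the with-statement
print(type(toy_bunch.x).__name__)      # after the with-statement
```

Doing the copy back in `__exit__` means the data returns to the host even when tracking fails part-way through the block.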


You can also use the CPU context manager so that GPU and CPU scripts share the same structure:

In [8]:
print('The type of bunch.x before entering the with-statement is ', type(bunch.x))
with CPU(bunch) as context:
    machine.track(bunch)
    print('The type of bunch.x inside of the with-statement is ', type(bunch.x))
print('The type of bunch.x after the with-region is ', type(bunch.x))

The type of bunch.x before entering the with-statement is 
The type of bunch.x inside of the with-statement is 
The type of bunch.x after the with-region is 
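
Because both context managers are used in the same way, the tracking loop can be written once and the context class passed in. A minimal sketch, where `track_turns` is a helper name introduced here and not a PyHEADTAIL function:

```python
def track_turns(machine, bunch, n_turns, context_class):
    """Track `bunch` for `n_turns` inside the given context manager class."""
    with context_class(bunch):
        for _ in range(n_turns):
            machine.track(bunch)
```

With the objects defined in this notebook, it could be called as `track_turns(machine, bunch, n_turns, GPU)` or `track_turns(machine, bunch, n_turns, CPU)`.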


That's it! If you need access to the bunch phase-space arrays during the simulation, you can move a copy back to the CPU with `bunch.x.get()`. Printing GPUArrays works out of the box if you need it for debugging:

In [9]:
with GPU(bunch) as context:
    print('The type of bunch.x inside of the with-statement is ', type(bunch.x))
    print('\nThe first three entries of bunch.x are ', bunch.x[0:3], ' (note: the array sits in GPU memory!)\n')
    cpu_bunch_x = bunch.x.get()
    print('A CPU copy of bunch.x inside the with-statement has type ', type(cpu_bunch_x))

The type of bunch.x inside of the with-statement is 

The first three entries of bunch.x are [ 0.00019505 -0.00044431 -0.00050942] (note: the array sits in GPU memory!)

A CPU copy of bunch.x inside the with-statement has type 
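
For diagnostics that should work in both contexts, the `.get()` call can be hidden behind a small helper. The sketch below is an assumption-laden illustration: `to_host` and `mean_position` are hypothetical names, and the helper simply treats any object with a `get()` method as a device array.

```python
def to_host(array):
    """Return array.get() if the object has a get() method (as GPUArrays do),
    otherwise return the object unchanged."""
    return array.get() if hasattr(array, "get") else array


def mean_position(array):
    """Mean of a 1D coordinate array, computed on the host."""
    values = to_host(array)
    return sum(values) / len(values)
```

Inside a `with GPU(bunch)` block this could be used as `mean_position(bunch.x)`. Each call copies the whole array from GPU to host memory, so calling it every turn will slow the tracking loop down.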
