{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SIMD"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Single instruction, multiple data (SIMD) is a form of parallel execution in which the same operation is performed on multiple data elements independently in hardware vector processing units (VPUs), also called SIMD units. The addition of two vectors to form a third vector is a SIMD operation. Many processors have SIMD (vector) units, each of which can perform 2, 4, 8 or more executions of the same operation simultaneously."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Loops without a loop-carried backward dependence (or with the dependence preserved using the __ordered simd__ construct) are candidates for vectorization by the compiler for execution with SIMD units. In addition, with state-of-the-art vectorization technology and the __declare simd__ directive extensions for function vectorization in the OpenMP 4.5 specification, loops with function calls can be vectorized as well. The basic idea is that a scalar function call in a loop can be replaced by a call to a vector version of the function, so that the loop is vectorized by combining loop vectorization (a __simd__ directive on the loop) with function vectorization (a __declare simd__ directive on the function)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A __simd__ construct states that the iterations of the associated loop can be executed concurrently with SIMD operations. A number of clauses are available to provide data-sharing attributes (__private__, __linear__, __reduction__ and __lastprivate__). Other clauses provide vector length preferences and restrictions (__simdlen__ / __safelen__), collapsing of nested loops into a single iteration space (__collapse__), and data alignment (__aligned__)."
   ]
  },
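  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of these clauses, the sketch below applies __simd__ with __reduction__ and with __safelen__. The function names `dot` and `shifted_add` are hypothetical and chosen only for this example; when the code is compiled without OpenMP support the pragmas are ignored and the loops run as ordinary scalar code.\n",
    "\n",
    "```c\n",
    "/* Dot product (hypothetical helper): reduction(+:sum) gives each SIMD lane\n",
    "   its own partial sum, and the partial sums are combined after the loop. */\n",
    "double dot(const double *a, const double *b, int n)\n",
    "{\n",
    "    double sum = 0.0;\n",
    "    #pragma omp simd reduction(+:sum)\n",
    "    for (int i = 0; i < n; i++)\n",
    "        sum += a[i] * b[i];\n",
    "    return sum;\n",
    "}\n",
    "\n",
    "/* Shifted update (hypothetical helper): a[i] depends on a[i - 4], a\n",
    "   loop-carried dependence of distance 4. safelen(4) restricts concurrent\n",
    "   execution to iterations that are fewer than 4 apart, so each read of\n",
    "   a[i - 4] still sees the value written by the earlier iteration. */\n",
    "void shifted_add(double *a, const double *b, int n)\n",
    "{\n",
    "    #pragma omp simd safelen(4)\n",
    "    for (int i = 4; i < n; i++)\n",
    "        a[i] = a[i - 4] + b[i];\n",
    "}\n",
    "```\n",
    "\n",
    "In `shifted_add`, omitting __safelen__ would assert that any vector length is safe, which is not true for this loop; the clause is what keeps the dependence intact."
   ]
  },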
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The __declare simd__ directive designates that a vector version of the function should also be constructed for execution within loops that contain the function and have a __simd__ directive. Clauses provide argument specifications (__linear__, __uniform__, and __aligned__), a requested vector length (__simdlen__), and designate whether the function is always or never called conditionally in a loop (__inbranch__ / __notinbranch__, respectively). The latter two clauses are for optimizing performance."
   ]
  },
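  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of how the two directives might be combined is shown below. The names `scaled_square` and `apply_scaled_square` are hypothetical; the example assumes that `scale` has the same value for every iteration (hence __uniform__) and that the function is never called inside a conditional (hence __notinbranch__).\n",
    "\n",
    "```c\n",
    "/* Request a vector version of this function in addition to the scalar one.\n",
    "   uniform(scale): scale is identical across all SIMD lanes.\n",
    "   notinbranch: the function is never called from inside a conditional. */\n",
    "#pragma omp declare simd uniform(scale) notinbranch\n",
    "double scaled_square(double x, double scale)\n",
    "{\n",
    "    return scale * x * x;\n",
    "}\n",
    "\n",
    "/* The simd directive on the loop allows the compiler to replace the scalar\n",
    "   call with a call to the vector version of scaled_square. */\n",
    "void apply_scaled_square(const double *in, double *out, double scale, int n)\n",
    "{\n",
    "    #pragma omp simd\n",
    "    for (int i = 0; i < n; i++)\n",
    "        out[i] = scaled_square(in[i], scale);\n",
    "}\n",
    "```\n",
    "\n",
    "Whether the vector version is actually used depends on the compiler and target; the directives only make it available."
   ]
  },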
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Also, the __simd__ construct has been combined with the worksharing loop constructs (__for simd__ in C/C++ and __do simd__ in Fortran) so that the iterations of a loop are divided among threads and each thread executes its iterations in its own SIMD unit."
   ]
  }
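  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The combined construct can be sketched as follows, assuming a simple scaled vector update. The function name `saxpy` is a hypothetical helper for this example; without OpenMP support the pragmas are ignored and the loop runs serially.\n",
    "\n",
    "```c\n",
    "/* y = a * x + y (hypothetical helper). The parallel region creates a team\n",
    "   of threads; for simd divides the iterations among the threads, and each\n",
    "   thread may execute its share of iterations with SIMD instructions. */\n",
    "void saxpy(float a, const float *x, float *y, int n)\n",
    "{\n",
    "    #pragma omp parallel\n",
    "    {\n",
    "        #pragma omp for simd\n",
    "        for (int i = 0; i < n; i++)\n",
    "            y[i] = a * x[i] + y[i];\n",
    "    }\n",
    "}\n",
    "```\n",
    "\n",
    "The iterations are independent here, so no additional clauses are needed for correctness."
   ]
  }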
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Native",
   "language": "native",
   "name": "native"
  },
  "language_info": {
   "file_extension": ".c",
   "mimetype": "text/plain",
   "name": "c"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}