MMX
===
Shift
-----
XMM
~~~
_mm_sll_pi16
^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m64 _mm_sll_pi16(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
        	FI
        ENDFOR
        	
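The lane-wise rule in the pseudo-code above can be sketched in portable scalar C. This is an illustrative model only, not the intrinsic itself: ``model_sll_pi16`` is a hypothetical helper name, and the ``__m64`` operands are assumed to be represented as plain ``uint64_t`` values with lane 0 in the low 16 bits.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code: four 16-bit lanes,
       each shifted left independently; any count above 15 clears all lanes. */
    uint64_t model_sll_pi16(uint64_t a, uint64_t count)
    {
        uint64_t dst = 0;
        if (count > 15) {
            return 0;
        }
        for (int j = 0; j < 4; j++) {
            int i = j * 16;
            uint16_t lane = (uint16_t)(a >> i);
            /* Truncate back to 16 bits so nothing leaks into the next lane. */
            uint16_t shifted = (uint16_t)((uint32_t)lane << count);
            dst |= (uint64_t)shifted << i;
        }
        return dst;
    }

With ``a = 0x0001800200030004`` and ``count = 1`` the model returns ``0x0002000400060008``: the top bit of lane 2 is discarded rather than carried into lane 3.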

_mm_slli_pi16
^^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m64 _mm_slli_pi16(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
        	FI
        ENDFOR
        	

_mm_sll_pi32
^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m64 _mm_sll_pi32(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
        	FI
        ENDFOR
        	

_mm_slli_pi32
^^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m64 _mm_slli_pi32(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
        	FI
        ENDFOR
        	

_mm_sll_si64
^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m64 _mm_sll_si64(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift 64-bit integer "a" left by "count" while shifting in zeros, and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF count[63:0] > 63
        	dst[63:0] := 0
        ELSE
        	dst[63:0] := ZeroExtend64(a[63:0] << count[63:0])
        FI
        	
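In C, shifting a 64-bit value by 64 or more is undefined behavior, whereas the pseudo-code above defines the result as zero. A minimal scalar sketch of that rule follows; ``model_sll_si64`` is a hypothetical name, and the operands are assumed to be plain ``uint64_t`` values rather than ``__m64``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: the explicit range check replaces the
       undefined C shift with the zero result the pseudo-code specifies. */
    uint64_t model_sll_si64(uint64_t a, uint64_t count)
    {
        return (count > 63) ? 0 : (a << count);
    }

A count of 63 still moves bit 0 into bit 63; a count of 64 returns zero.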

_mm_slli_si64
^^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m64 _mm_slli_si64(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift 64-bit integer "a" left by "imm8" while shifting in zeros, and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF imm8[7:0] > 63
        	dst[63:0] := 0
        ELSE
        	dst[63:0] := ZeroExtend64(a[63:0] << imm8[7:0])
        FI
        	

_mm_sra_pi16
^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m64 _mm_sra_pi16(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        	ELSE
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
        	FI
        ENDFOR
        	
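The sign-filling behavior above can be modeled in portable C without relying on ``>>`` applied to negative signed values, which is implementation-defined. This is a sketch under stated assumptions: ``sra16`` and ``model_sra_pi16`` are hypothetical names, and the operands are assumed to be ``uint64_t`` values with lane 0 in the low 16 bits.

.. code-block:: C

    #include <stdint.h>

    /* Arithmetic right shift of one 16-bit lane using unsigned arithmetic. */
    static uint16_t sra16(uint16_t x, unsigned n)
    {
        if (n > 15) {
            n = 15;  /* a shift by 15 already leaves only copies of the sign bit */
        }
        /* For negative lanes, set the top n bits that the shift vacated. */
        uint16_t fill = (x & 0x8000u) ? (uint16_t)~(0xFFFFu >> n) : (uint16_t)0;
        return (uint16_t)((x >> n) | fill);
    }

    /* Hypothetical scalar model of the pseudo-code for four 16-bit lanes. */
    uint64_t model_sra_pi16(uint64_t a, uint64_t count)
    {
        unsigned n = (count > 15) ? 15u : (unsigned)count;
        uint64_t dst = 0;
        for (int j = 0; j < 4; j++) {
            int i = j * 16;
            dst |= (uint64_t)sra16((uint16_t)(a >> i), n) << i;
        }
        return dst;
    }

Clamping the count to 15 reproduces the out-of-range branch of the pseudo-code: negative lanes become ``0xFFFF`` and non-negative lanes become ``0x0``.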

_mm_srai_pi16
^^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m64 _mm_srai_pi16(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        	ELSE
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
        	FI
        ENDFOR
        	

_mm_sra_pi32
^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m64 _mm_sra_pi32(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        	ELSE
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
        	FI
        ENDFOR
        	

_mm_srai_pi32
^^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m64 _mm_srai_pi32(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        	ELSE
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
        	FI
        ENDFOR
        	
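One common use of an arithmetic shift by 31 is to build a per-lane sign mask. The sketch below follows the pseudo-code literally, including the use of only the low 8 bits of ``imm8``, and then uses the mask to compute lane-wise absolute values. All names (``sra32``, ``model_srai_pi32``, ``abs_lanes32``) are hypothetical, and the operands are assumed to be ``uint64_t`` values with lane 0 in the low 32 bits; this is a scalar model, not the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Arithmetic right shift of one 32-bit lane using unsigned arithmetic. */
    static uint32_t sra32(uint32_t x, unsigned n)
    {
        if (n > 31) {
            n = 31;  /* out-of-range counts leave only copies of the sign bit */
        }
        uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
        return (x >> n) | fill;
    }

    /* Hypothetical scalar model of the pseudo-code for two 32-bit lanes. */
    uint64_t model_srai_pi32(uint64_t a, int imm8)
    {
        unsigned n = (unsigned)imm8 & 0xFFu;  /* imm8[7:0] */
        uint32_t lo = sra32((uint32_t)a, n);
        uint32_t hi = sra32((uint32_t)(a >> 32), n);
        return ((uint64_t)hi << 32) | lo;
    }

    /* Lane-wise absolute value: (x XOR mask) - mask, where mask is all ones
       for negative lanes and zero otherwise. */
    uint64_t abs_lanes32(uint64_t a)
    {
        uint64_t mask = model_srai_pi32(a, 31);
        uint32_t mlo = (uint32_t)mask;
        uint32_t mhi = (uint32_t)(mask >> 32);
        uint32_t lo = ((uint32_t)a ^ mlo) - mlo;
        uint32_t hi = ((uint32_t)(a >> 32) ^ mhi) - mhi;
        return ((uint64_t)hi << 32) | lo;
    }

With lanes holding -5 and 7 (``0xFFFFFFFB00000007``), ``abs_lanes32`` returns ``0x0000000500000007``.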

_mm_srl_pi16
^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m64 _mm_srl_pi16(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
        	FI
        ENDFOR
        	
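The logical right shift differs from the arithmetic variant only in what fills the vacated high bits. A portable scalar sketch of the pseudo-code above, assuming the operands are ``uint64_t`` values with lane 0 in the low 16 bits (``model_srl_pi16`` is a hypothetical name):

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: each 16-bit lane is shifted right with
       zeros shifted in; any count above 15 clears all lanes. */
    uint64_t model_srl_pi16(uint64_t a, uint64_t count)
    {
        uint64_t dst = 0;
        if (count > 15) {
            return 0;
        }
        for (int j = 0; j < 4; j++) {
            int i = j * 16;
            uint16_t lane = (uint16_t)(a >> i);
            dst |= (uint64_t)(uint16_t)(lane >> count) << i;
        }
        return dst;
    }

For a lane holding ``0x8000`` and a count of 4, this model produces ``0x0800``, where an arithmetic shift would produce ``0xF800``.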

_mm_srli_pi16
^^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m64 _mm_srli_pi16(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
        	FI
        ENDFOR
        	

_mm_srl_pi32
^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m64 _mm_srl_pi32(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
        	FI
        ENDFOR
        	

_mm_srli_pi32
^^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m64 _mm_srli_pi32(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
        	FI
        ENDFOR
        	

_mm_srl_si64
^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m64 _mm_srl_si64(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift 64-bit integer "a" right by "count" while shifting in zeros, and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF count[63:0] > 63
        	dst[63:0] := 0
        ELSE
        	dst[63:0] := ZeroExtend64(a[63:0] >> count[63:0])
        FI
        	
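Because a count above 63 yields zero rather than undefined behavior, a left and a right 64-bit shift can be combined into a rotate that also works for a rotation of 0. The sketch below models both shifts in scalar C; ``model_sll_si64``, ``model_srl_si64`` and ``rotl64_sketch`` are hypothetical names, and the operands are assumed to be plain ``uint64_t`` values.

.. code-block:: C

    #include <stdint.h>

    /* Scalar models of the 64-bit logical shifts: counts above 63 give 0. */
    static uint64_t model_sll_si64(uint64_t a, uint64_t count)
    {
        return (count > 63) ? 0 : (a << count);
    }

    static uint64_t model_srl_si64(uint64_t a, uint64_t count)
    {
        return (count > 63) ? 0 : (a >> count);
    }

    /* Rotate left by n, for n in the range 0..64. When n is 0 the right
       shift count is 64, which the model maps to zero, so the result is a. */
    uint64_t rotl64_sketch(uint64_t a, unsigned n)
    {
        return model_sll_si64(a, n) | model_srl_si64(a, 64u - n);
    }

The same expression written with raw C shifts would be undefined for ``n == 0``.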

_mm_srli_si64
^^^^^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m64 _mm_srli_si64(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift 64-bit integer "a" right by "imm8" while shifting in zeros, and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF imm8[7:0] > 63
        	dst[63:0] := 0
        ELSE
        	dst[63:0] := ZeroExtend64(a[63:0] >> imm8[7:0])
        FI
        	

MMX
~~~
_m_psllw
^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m64 _m_psllw(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
        	FI
        ENDFOR
        	

_m_psllwi
^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m64 _m_psllwi(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
        	FI
        ENDFOR
        	

_m_pslld
^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m64 _m_pslld(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
        	FI
        ENDFOR
        	

_m_pslldi
^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m64 _m_pslldi(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
        	FI
        ENDFOR
        	

_m_psllq
^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m64 _m_psllq(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift 64-bit integer "a" left by "count" while shifting in zeros, and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF count[63:0] > 63
        	dst[63:0] := 0
        ELSE
        	dst[63:0] := ZeroExtend64(a[63:0] << count[63:0])
        FI
        	

_m_psllqi
^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m64 _m_psllqi(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift 64-bit integer "a" left by "imm8" while shifting in zeros, and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF imm8[7:0] > 63
        	dst[63:0] := 0
        ELSE
        	dst[63:0] := ZeroExtend64(a[63:0] << imm8[7:0])
        FI
        	

_m_psraw
^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m64 _m_psraw(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        	ELSE
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
        	FI
        ENDFOR
        	

_m_psrawi
^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    SI64 a, 
    IMM imm8

.. code-block:: C

    __m64 _m_psrawi(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        	ELSE
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
        	FI
        ENDFOR
        	

_m_psrad
^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    SI64 a, 
    UI64 count

.. code-block:: C

    __m64 _m_psrad(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        	ELSE
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
        	FI
        ENDFOR
        	

_m_psradi
^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    SI64 a, 
    IMM imm8

.. code-block:: C

    __m64 _m_psradi(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        	ELSE
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
        	FI
        ENDFOR
        	

_m_psrlw
^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m64 _m_psrlw(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
        	FI
        ENDFOR
        	

_m_psrlwi
^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m64 _m_psrlwi(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
        	FI
        ENDFOR
        	

_m_psrld
^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m64 _m_psrld(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
        	FI
        ENDFOR
        	

_m_psrldi
^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m64 _m_psrldi(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
        	FI
        ENDFOR
        	

_m_psrlq
^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m64 _m_psrlq(__m64 a, __m64 count);

.. admonition:: Intel Description

    Shift 64-bit integer "a" right by "count" while shifting in zeros, and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF count[63:0] > 63
        	dst[63:0] := 0
        ELSE
        	dst[63:0] := ZeroExtend64(a[63:0] >> count[63:0])
        FI
        	

_m_psrlqi
^^^^^^^^^
:Tech: MMX
:Category: Shift
:Header: mmintrin.h
:Searchable: MMX-Shift-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m64 _m_psrlqi(__m64 a, int imm8);

.. admonition:: Intel Description

    Shift 64-bit integer "a" right by "imm8" while shifting in zeros, and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF imm8[7:0] > 63
        	dst[63:0] := 0
        ELSE
        	dst[63:0] := ZeroExtend64(a[63:0] >> imm8[7:0])
        FI
        	

General Support
---------------
XMM
~~~
_mm_empty
^^^^^^^^^
:Tech: MMX
:Category: General Support
:Header: mmintrin.h
:Searchable: MMX-General Support-XMM
:Register: XMM 128 bit
:Return Type: void

.. code-block:: C

    void _mm_empty(void);

.. admonition:: Intel Description

    Empty the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures.

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.
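
A hedged sketch of where the call goes in practice. ``double_lanes16`` is a hypothetical helper. The code assumes a GCC- or Clang-style compiler that predefines ``__MMX__`` and ``__x86_64__`` and provides ``_mm_cvtsi64_m64`` and ``_mm_cvtm64_si64`` in ``mmintrin.h``; on any other target it falls back to scalar code. ``_mm_empty`` is called after the last MMX operation and before control returns to code that may use x87 instructions.

.. code-block:: C

    #include <stdint.h>

    #if defined(__MMX__) && defined(__x86_64__)
    #include <mmintrin.h>
    #define SKETCH_USE_MMX 1   /* assumption: GCC/Clang-style feature macros */
    #else
    #define SKETCH_USE_MMX 0
    #endif

    /* Hypothetical helper: doubles each of the four packed 16-bit lanes. */
    uint64_t double_lanes16(uint64_t a)
    {
    #if SKETCH_USE_MMX
        __m64 v = _mm_cvtsi64_m64((long long)a);
        v = _mm_slli_pi16(v, 1);
        uint64_t r = (uint64_t)_mm_cvtm64_si64(v);
        _mm_empty();   /* release the MMX state before returning */
        return r;
    #else
        /* Scalar fallback with the same lane-wise result. */
        uint64_t dst = 0;
        for (int j = 0; j < 4; j++) {
            int i = j * 16;
            uint16_t lane = (uint16_t)(a >> i);
            dst |= (uint64_t)(uint16_t)((uint32_t)lane << 1) << i;
        }
        return dst;
    #endif
    }

Keeping the MMX work and the ``_mm_empty`` call inside one function means callers never observe a live MMX state.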

MMX
~~~
_m_empty
^^^^^^^^
:Tech: MMX
:Category: General Support
:Header: mmintrin.h
:Searchable: MMX-General Support-MMX
:Register: MMX 64 bit
:Return Type: void

.. code-block:: C

    void _m_empty(void);

.. admonition:: Intel Description

    Empty the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures.

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

Logical
-------
XMM
~~~
_mm_and_si64
^^^^^^^^^^^^
:Tech: MMX
:Category: Logical
:Header: mmintrin.h
:Searchable: MMX-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _mm_and_si64(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] AND b[63:0])
        	

_mm_andnot_si64
^^^^^^^^^^^^^^^
:Tech: MMX
:Category: Logical
:Header: mmintrin.h
:Searchable: MMX-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _mm_andnot_si64(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 64 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := ((NOT a[63:0]) AND b[63:0])
        	
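The operand order is easy to get wrong, so here is a minimal portable sketch of the pseudo-code above. It does not call the intrinsic: the ``__m64`` operands are modeled as plain ``uint64_t`` values, and ``model_andnot_si64`` and ``demo_andnot_si64`` are hypothetical names chosen for illustration.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model (assumption: __m64 is represented as uint64_t). */
    uint64_t model_andnot_si64(uint64_t a, uint64_t b)
    {
        return ~a & b;  /* dst := (NOT a) AND b */
    }

    void demo_andnot_si64(void)
    {
        /* The mask goes in "a": its set bits are cleared in "b". */
        assert(model_andnot_si64(0x00000000000000FFULL, 0x1122334455667788ULL)
               == 0x1122334455667700ULL);
    }

Passing the mask as the first argument clears the masked bits of the second argument, not the other way around.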

_mm_or_si64
^^^^^^^^^^^
:Tech: MMX
:Category: Logical
:Header: mmintrin.h
:Searchable: MMX-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _mm_or_si64(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] OR b[63:0])
        	

_mm_xor_si64
^^^^^^^^^^^^
:Tech: MMX
:Category: Logical
:Header: mmintrin.h
:Searchable: MMX-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _mm_xor_si64(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] XOR b[63:0])
        	

MMX
~~~
_m_pand
^^^^^^^
:Tech: MMX
:Category: Logical
:Header: mmintrin.h
:Searchable: MMX-Logical-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_pand(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] AND b[63:0])
        	

_m_pandn
^^^^^^^^
:Tech: MMX
:Category: Logical
:Header: mmintrin.h
:Searchable: MMX-Logical-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_pandn(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 64 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := ((NOT a[63:0]) AND b[63:0])
        	

_m_por
^^^^^^
:Tech: MMX
:Category: Logical
:Header: mmintrin.h
:Searchable: MMX-Logical-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_por(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] OR b[63:0])
        	

_m_pxor
^^^^^^^
:Tech: MMX
:Category: Logical
:Header: mmintrin.h
:Searchable: MMX-Logical-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_pxor(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] XOR b[63:0])
        	

Swizzle
-------
XMM
~~~
_mm_unpackhi_pi8
^^^^^^^^^^^^^^^^
:Tech: MMX
:Category: Swizzle
:Header: mmintrin.h
:Searchable: MMX-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _mm_unpackhi_pi8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_BYTES(src1[63:0], src2[63:0]) {
        	dst[7:0] := src1[39:32]
        	dst[15:8] := src2[39:32] 
        	dst[23:16] := src1[47:40]
        	dst[31:24] := src2[47:40]
        	dst[39:32] := src1[55:48]
        	dst[47:40] := src2[55:48]
        	dst[55:48] := src1[63:56]
        	dst[63:56] := src2[63:56]
        	RETURN dst[63:0]	
        }
        dst[63:0] := INTERLEAVE_HIGH_BYTES(a[63:0], b[63:0])
        	
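The interleave pattern can be checked with a portable scalar sketch of the pseudo-code above. This is an illustration under stated assumptions, not the intrinsic: ``__m64`` values are modeled as ``uint64_t`` with byte 0 in bits 7:0, and ``model_unpackhi_pi8`` and ``demo_unpackhi_pi8`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model: interleave bytes 4..7 of a and b, a's byte first. */
    uint64_t model_unpackhi_pi8(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int k = 0; k < 4; k++) {
            uint64_t byte_a = (a >> (32 + 8 * k)) & 0xFF;
            uint64_t byte_b = (b >> (32 + 8 * k)) & 0xFF;
            dst |= byte_a << (16 * k);      /* even destination byte */
            dst |= byte_b << (16 * k + 8);  /* odd destination byte  */
        }
        return dst;
    }

    void demo_unpackhi_pi8(void)
    {
        /* Bytes are labeled A0..A7 and B0..B7 from least significant up. */
        assert(model_unpackhi_pi8(0xA7A6A5A4A3A2A1A0ULL, 0xB7B6B5B4B3B2B1B0ULL)
               == 0xB7A7B6A6B5A5B4A4ULL);
    }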

_mm_unpackhi_pi16
^^^^^^^^^^^^^^^^^
:Tech: MMX
:Category: Swizzle
:Header: mmintrin.h
:Searchable: MMX-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m64 _mm_unpackhi_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_WORDS(src1[63:0], src2[63:0]) {
        	dst[15:0] := src1[47:32]
        	dst[31:16] := src2[47:32]
        	dst[47:32] := src1[63:48]
        	dst[63:48] := src2[63:48]
        	RETURN dst[63:0]
        }
        dst[63:0] := INTERLEAVE_HIGH_WORDS(a[63:0], b[63:0])
        	
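One consequence of the pseudo-code above is that interleaving with an all-zero "b" zero-extends the two high 16-bit elements of "a" to 32 bits. The sketch below demonstrates this with a portable scalar model rather than the intrinsic; ``model_unpackhi_pi16`` and ``demo_unpackhi_pi16`` are hypothetical names, and ``__m64`` is modeled as ``uint64_t``.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model of INTERLEAVE_HIGH_WORDS from the pseudo-code. */
    uint64_t model_unpackhi_pi16(uint64_t a, uint64_t b)
    {
        uint64_t a2 = (a >> 32) & 0xFFFF, a3 = (a >> 48) & 0xFFFF;
        uint64_t b2 = (b >> 32) & 0xFFFF, b3 = (b >> 48) & 0xFFFF;
        return a2 | (b2 << 16) | (a3 << 32) | (b3 << 48);
    }

    void demo_unpackhi_pi16(void)
    {
        /* With b == 0, the words 0x3333 and 0x4444 become 32-bit values. */
        assert(model_unpackhi_pi16(0x4444333322221111ULL, 0ULL)
               == 0x0000444400003333ULL);
    }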

_mm_unpackhi_pi32
^^^^^^^^^^^^^^^^^
:Tech: MMX
:Category: Swizzle
:Header: mmintrin.h
:Searchable: MMX-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m64 _mm_unpackhi_pi32(__m64 a, __m64 b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[63:32]
        dst[63:32] := b[63:32]
        	

_mm_unpacklo_pi8
^^^^^^^^^^^^^^^^
:Tech: MMX
:Category: Swizzle
:Header: mmintrin.h
:Searchable: MMX-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _mm_unpacklo_pi8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_BYTES(src1[63:0], src2[63:0]) {
        	dst[7:0] := src1[7:0] 
        	dst[15:8] := src2[7:0] 
        	dst[23:16] := src1[15:8] 
        	dst[31:24] := src2[15:8] 
        	dst[39:32] := src1[23:16] 
        	dst[47:40] := src2[23:16] 
        	dst[55:48] := src1[31:24] 
        	dst[63:56] := src2[31:24] 
        	RETURN dst[63:0]	
        }
        dst[63:0] := INTERLEAVE_BYTES(a[63:0], b[63:0])
        	

_mm_unpacklo_pi16
^^^^^^^^^^^^^^^^^
:Tech: MMX
:Category: Swizzle
:Header: mmintrin.h
:Searchable: MMX-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m64 _mm_unpacklo_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_WORDS(src1[63:0], src2[63:0]) {
        	dst[15:0] := src1[15:0] 
        	dst[31:16] := src2[15:0] 
        	dst[47:32] := src1[31:16] 
        	dst[63:48] := src2[31:16] 
        	RETURN dst[63:0]	
        }
        dst[63:0] := INTERLEAVE_WORDS(a[63:0], b[63:0])
        	

_mm_unpacklo_pi32
^^^^^^^^^^^^^^^^^
:Tech: MMX
:Category: Swizzle
:Header: mmintrin.h
:Searchable: MMX-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m64 _mm_unpacklo_pi32(__m64 a, __m64 b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        dst[63:32] := b[31:0]
        	
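For the 32-bit case the pseudo-code above reduces to concatenating the two low halves. A minimal portable sketch follows; it models ``__m64`` as ``uint64_t`` and does not call the intrinsic, and ``model_unpacklo_pi32`` and ``demo_unpacklo_pi32`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model: dst[31:0] := a[31:0], dst[63:32] := b[31:0]. */
    uint64_t model_unpacklo_pi32(uint64_t a, uint64_t b)
    {
        return (a & 0xFFFFFFFFULL) | ((b & 0xFFFFFFFFULL) << 32);
    }

    void demo_unpacklo_pi32(void)
    {
        /* The high halves (0xAAAAAAAA, 0xBBBBBBBB) are discarded. */
        assert(model_unpacklo_pi32(0xAAAAAAAA11111111ULL, 0xBBBBBBBB22222222ULL)
               == 0x2222222211111111ULL);
    }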

MMX
~~~
_m_punpckhbw
^^^^^^^^^^^^
:Tech: MMX
:Category: Swizzle
:Header: mmintrin.h
:Searchable: MMX-Swizzle-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_punpckhbw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_BYTES(src1[63:0], src2[63:0]) {
        	dst[7:0] := src1[39:32]
        	dst[15:8] := src2[39:32] 
        	dst[23:16] := src1[47:40]
        	dst[31:24] := src2[47:40]
        	dst[39:32] := src1[55:48]
        	dst[47:40] := src2[55:48]
        	dst[55:48] := src1[63:56]
        	dst[63:56] := src2[63:56]
        	RETURN dst[63:0]
        }
        dst[63:0] := INTERLEAVE_HIGH_BYTES(a[63:0], b[63:0])
        	

_m_punpckhwd
^^^^^^^^^^^^
:Tech: MMX
:Category: Swizzle
:Header: mmintrin.h
:Searchable: MMX-Swizzle-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_punpckhwd(__m64 a, __m64 b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_WORDS(src1[63:0], src2[63:0]) {
        	dst[15:0] := src1[47:32]
        	dst[31:16] := src2[47:32]
        	dst[47:32] := src1[63:48]
        	dst[63:48] := src2[63:48]
        	RETURN dst[63:0]
        }
        dst[63:0] := INTERLEAVE_HIGH_WORDS(a[63:0], b[63:0])
        	

_m_punpckhdq
^^^^^^^^^^^^
:Tech: MMX
:Category: Swizzle
:Header: mmintrin.h
:Searchable: MMX-Swizzle-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_punpckhdq(__m64 a, __m64 b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[63:32]
        dst[63:32] := b[63:32]
        	

_m_punpcklbw
^^^^^^^^^^^^
:Tech: MMX
:Category: Swizzle
:Header: mmintrin.h
:Searchable: MMX-Swizzle-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_punpcklbw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_BYTES(src1[63:0], src2[63:0]) {
        	dst[7:0] := src1[7:0] 
        	dst[15:8] := src2[7:0] 
        	dst[23:16] := src1[15:8] 
        	dst[31:24] := src2[15:8] 
        	dst[39:32] := src1[23:16] 
        	dst[47:40] := src2[23:16] 
        	dst[55:48] := src1[31:24] 
        	dst[63:56] := src2[31:24] 
        	RETURN dst[63:0]	
        }
        dst[63:0] := INTERLEAVE_BYTES(a[63:0], b[63:0])
        	

_m_punpcklwd
^^^^^^^^^^^^
:Tech: MMX
:Category: Swizzle
:Header: mmintrin.h
:Searchable: MMX-Swizzle-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_punpcklwd(__m64 a, __m64 b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_WORDS(src1[63:0], src2[63:0]) {
        	dst[15:0] := src1[15:0] 
        	dst[31:16] := src2[15:0] 
        	dst[47:32] := src1[31:16] 
        	dst[63:48] := src2[31:16] 
        	RETURN dst[63:0]	
        }
        dst[63:0] := INTERLEAVE_WORDS(a[63:0], b[63:0])
        	

_m_punpckldq
^^^^^^^^^^^^
:Tech: MMX
:Category: Swizzle
:Header: mmintrin.h
:Searchable: MMX-Swizzle-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_punpckldq(__m64 a, __m64 b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        dst[63:32] := b[31:0]
        	

Arithmetic
----------
XMM
~~~
_mm_add_pi8
^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _mm_add_pi8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed 8-bit integers in "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := a[i+7:i] + b[i+7:i]
        ENDFOR
        	
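The loop above adds each 8-bit lane independently, so a carry out of one lane is dropped rather than propagated into its neighbor. The following portable sketch illustrates that wraparound; it models ``__m64`` as ``uint64_t`` instead of calling the intrinsic, and ``model_add_pi8`` and ``demo_add_pi8`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model of the FOR loop: eight independent 8-bit additions. */
    uint64_t model_add_pi8(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 8; j++) {
            int i = j * 8;
            uint8_t x = (uint8_t)(a >> i);
            uint8_t y = (uint8_t)(b >> i);
            uint8_t lane = (uint8_t)(x + y);  /* keep only the low 8 bits */
            dst |= (uint64_t)lane << i;
        }
        return dst;
    }

    void demo_add_pi8(void)
    {
        /* Lane 0: 0xFF + 0x01 wraps to 0x00; lane 1 stays 0x01. */
        assert(model_add_pi8(0x01FFULL, 0x0001ULL) == 0x0100ULL);
        /* A plain 64-bit addition would carry into lane 1 instead. */
        assert(0x01FFULL + 0x0001ULL == 0x0200ULL);
    }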

_mm_add_pi16
^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m64 _mm_add_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed 16-bit integers in "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := a[i+15:i] + b[i+15:i]
        ENDFOR
        	

_mm_add_pi32
^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m64 _mm_add_pi32(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed 32-bit integers in "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	dst[i+31:i] := a[i+31:i] + b[i+31:i]
        ENDFOR
        	

_mm_adds_pi8
^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m64 _mm_adds_pi8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
        ENDFOR
        	
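``Saturate8`` in the pseudo-code above clamps each signed sum to the range -128 to 127 instead of letting it wrap. Below is a minimal portable sketch of that behavior; it is a scalar model, not the intrinsic, with ``__m64`` modeled as ``uint64_t``. The names ``sign_extend8``, ``saturate8``, ``model_adds_pi8`` and ``demo_adds_pi8`` are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Interpret the low 8 bits of v as a two's-complement value. */
    static int sign_extend8(uint64_t v)
    {
        int x = (int)(v & 0xFF);
        return (x >= 128) ? x - 256 : x;
    }

    /* Clamp to the signed 8-bit range. */
    static int saturate8(int x)
    {
        if (x > 127) return 127;
        if (x < -128) return -128;
        return x;
    }

    uint64_t model_adds_pi8(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 8; j++) {
            int i = j * 8;
            int sum = sign_extend8(a >> i) + sign_extend8(b >> i);
            dst |= (uint64_t)(uint8_t)saturate8(sum) << i;
        }
        return dst;
    }

    void demo_adds_pi8(void)
    {
        /* Lane 0: 127 + 1 clamps to 127 (0x7F).
           Lane 1: -128 + -1 clamps to -128 (0x80).
           Lane 2: 1 + 2 = 3, no clamping needed. */
        assert(model_adds_pi8(0x01807FULL, 0x02FF01ULL) == 0x03807FULL);
    }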

_mm_adds_pi16
^^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_adds_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
        ENDFOR
        	

_mm_adds_pu8
^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _mm_adds_pu8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
        ENDFOR
        	

_mm_adds_pu16
^^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m64 _mm_adds_pu16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
        ENDFOR
        	
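For unsigned lanes, ``SaturateU16`` in the pseudo-code above clamps any sum larger than 0xFFFF to 0xFFFF. A portable scalar sketch of this follows, assuming ``__m64`` is modeled as ``uint64_t``; it does not call the intrinsic, and ``model_adds_pu16`` and ``demo_adds_pu16`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model: four unsigned 16-bit additions clamped at 0xFFFF. */
    uint64_t model_adds_pu16(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 4; j++) {
            int i = j * 16;
            uint32_t sum = (uint32_t)((a >> i) & 0xFFFF)
                         + (uint32_t)((b >> i) & 0xFFFF);
            if (sum > 0xFFFF)
                sum = 0xFFFF;  /* SaturateU16 */
            dst |= (uint64_t)sum << i;
        }
        return dst;
    }

    void demo_adds_pu16(void)
    {
        /* Lane 0: 0xFFFF + 1 clamps to 0xFFFF; lane 1: 0x1234 + 0x1111. */
        assert(model_adds_pu16(0x1234FFFFULL, 0x11110001ULL) == 0x2345FFFFULL);
    }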

_mm_sub_pi8
^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _mm_sub_pi8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := a[i+7:i] - b[i+7:i]
        ENDFOR
        	

_mm_sub_pi16
^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m64 _mm_sub_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := a[i+15:i] - b[i+15:i]
        ENDFOR
        	

_mm_sub_pi32
^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m64 _mm_sub_pi32(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	dst[i+31:i] := a[i+31:i] - b[i+31:i]
        ENDFOR
        	
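As with the packed additions, each 32-bit lane in the loop above is subtracted on its own, so a borrow in the low lane does not reach the high lane. The sketch below shows this with a portable scalar model (``__m64`` modeled as ``uint64_t``, no intrinsic call); ``model_sub_pi32`` and ``demo_sub_pi32`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model: two independent 32-bit subtractions with wraparound. */
    uint64_t model_sub_pi32(uint64_t a, uint64_t b)
    {
        uint32_t lo = (uint32_t)((uint32_t)a - (uint32_t)b);
        uint32_t hi = (uint32_t)((uint32_t)(a >> 32) - (uint32_t)(b >> 32));
        return ((uint64_t)hi << 32) | lo;
    }

    void demo_sub_pi32(void)
    {
        /* Low lane: 0 - 1 wraps to 0xFFFFFFFF; high lane: 5 - 2 = 3. */
        assert(model_sub_pi32(0x0000000500000000ULL, 0x0000000200000001ULL)
               == 0x00000003FFFFFFFFULL);
        /* A plain 64-bit subtraction would borrow from the high lane. */
        assert(0x0000000500000000ULL - 0x0000000200000001ULL
               == 0x00000002FFFFFFFFULL);
    }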

_mm_subs_pi8
^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m64 _mm_subs_pi8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])	
        ENDFOR
        	

_mm_subs_pi16
^^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_subs_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
        ENDFOR
        	

_mm_subs_pu8
^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _mm_subs_pu8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])	
        ENDFOR
        	
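With ``SaturateU8``, a lane where "b" exceeds "a" produces 0 instead of wrapping. The portable sketch below models the loop above and also shows one common use of that property: OR-ing the two one-sided differences gives the per-lane absolute difference. It is a scalar illustration with ``__m64`` modeled as ``uint64_t``, and ``model_subs_pu8`` and ``demo_subs_pu8`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model: eight unsigned 8-bit subtractions clamped at 0. */
    uint64_t model_subs_pu8(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 8; j++) {
            int i = j * 8;
            unsigned x = (unsigned)((a >> i) & 0xFF);
            unsigned y = (unsigned)((b >> i) & 0xFF);
            uint64_t lane = (x > y) ? (x - y) : 0;  /* SaturateU8 */
            dst |= lane << i;
        }
        return dst;
    }

    void demo_subs_pu8(void)
    {
        uint64_t a = 0x5010ULL, b = 0x2020ULL;
        /* Lane 0: 0x10 - 0x20 clamps to 0; lane 1: 0x50 - 0x20 = 0x30. */
        assert(model_subs_pu8(a, b) == 0x3000ULL);
        /* One of the two differences is always 0 in each lane, so the
           OR yields |a - b| per lane: 0x10 and 0x30. */
        assert((model_subs_pu8(a, b) | model_subs_pu8(b, a)) == 0x3010ULL);
    }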

_mm_subs_pu16
^^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m64 _mm_subs_pu16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])	
        ENDFOR
        	

_mm_madd_pi16
^^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_madd_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
        ENDFOR
        	
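The multiply-add above amounts to two 2-element dot products, one per 32-bit result lane. The following is a minimal portable sketch of the pseudo-code, not the intrinsic; ``__m64`` is modeled as ``uint64_t``, and ``lane16``, ``model_madd_pi16`` and ``demo_madd_pi16`` are hypothetical names. It assumes the sum wraps to 32 bits, as the 32-bit destination slice in the pseudo-code implies.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Read 16-bit lane j of v as a signed (two's-complement) value. */
    static int32_t lane16(uint64_t v, int j)
    {
        int32_t x = (int32_t)((v >> (16 * j)) & 0xFFFF);
        return (x >= 0x8000) ? x - 0x10000 : x;
    }

    uint64_t model_madd_pi16(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 2; j++) {
            /* Sum the two products in 64 bits, then keep the low 32. */
            int64_t sum = (int64_t)lane16(a, 2 * j + 1) * lane16(b, 2 * j + 1)
                        + (int64_t)lane16(a, 2 * j) * lane16(b, 2 * j);
            dst |= (uint64_t)(uint32_t)sum << (32 * j);
        }
        return dst;
    }

    void demo_madd_pi16(void)
    {
        /* a lanes (low to high): 1, 2, 3, -4; b lanes: 10, 20, 30, 40.
           Low result:  1*10 + 2*20  = 50  (0x00000032).
           High result: 3*30 + -4*40 = -70 (0xFFFFFFBA). */
        assert(model_madd_pi16(0xFFFC000300020001ULL, 0x0028001E0014000AULL)
               == 0xFFFFFFBA00000032ULL);
    }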

_mm_mulhi_pi16
^^^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_mulhi_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
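
A scalar model makes the "high 16 bits" selection concrete. The sketch below is illustrative only: ``mulhi_pi16_model`` is a hypothetical name, and the ``__m64`` operands are assumed to be raw ``uint64_t`` values with lane 0 in the low bits.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mulhi_pi16; not part of mmintrin.h. */
    static uint64_t mulhi_pi16_model(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 4; j++) {
            int i = j * 16;
            /* Sign-extend both 16-bit lanes to 32 bits
             * (two's-complement conversion assumed). */
            int32_t x = (int16_t)(uint16_t)(a >> i);
            int32_t y = (int16_t)(uint16_t)(b >> i);
            /* The 32-bit product of two 16-bit values cannot overflow. */
            uint32_t tmp = (uint32_t)(x * y);
            /* Keep bits 31:16 of the product. */
            dst |= (uint64_t)(tmp >> 16) << i;
        }
        return dst;
    }

For instance, ``0x4000 * 0x4000`` is ``0x10000000``, so that lane becomes ``0x1000``, while ``-1 * 1`` is ``0xFFFFFFFF`` and yields ``0xFFFF``.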
        	

_mm_mullo_pi16
^^^^^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m64 _mm_mullo_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	tmp[31:0] := a[i+15:i] * b[i+15:i]
        	dst[i+15:i] := tmp[15:0]
        ENDFOR
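
The low-half multiply can be modelled the same way. This is a sketch under stated assumptions: ``mullo_pi16_model`` is a hypothetical helper, and operands are raw ``uint64_t`` values with lane 0 in the low bits.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mullo_pi16; not part of mmintrin.h. */
    static uint64_t mullo_pi16_model(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 4; j++) {
            int i = j * 16;
            uint32_t x = (uint16_t)(a >> i);
            uint32_t y = (uint16_t)(b >> i);
            /* 0xFFFF * 0xFFFF still fits in 32 bits. */
            uint32_t tmp = x * y;
            /* Keep bits 15:0 of the product. */
            dst |= (uint64_t)(tmp & 0xFFFFu) << i;
        }
        return dst;
    }

The low 16 bits of a product are the same whether the lanes are read as signed or unsigned, which is why the model can use unsigned arithmetic throughout.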
        	

MMX
~~~
_m_paddb
^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_paddb(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed 8-bit integers in "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := a[i+7:i] + b[i+7:i]
        ENDFOR
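
The key property of the packed add is that carries never cross lane boundaries. A minimal scalar sketch, assuming raw ``uint64_t`` operands with lane 0 in the low bits (``paddb_model`` is a hypothetical name, not a real intrinsic):

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _m_paddb; not part of mmintrin.h. */
    static uint64_t paddb_model(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 8; j++) {
            int i = j * 8;
            /* Truncating to 8 bits discards the carry out of this lane. */
            uint8_t sum = (uint8_t)((a >> i) + (b >> i));
            dst |= (uint64_t)sum << i;
        }
        return dst;
    }

Adding ``0xFF`` and ``0x01`` in lane 0 therefore produces ``0x00`` in that lane and leaves lane 1 unchanged, unlike a plain 64-bit integer addition.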
        	

_m_paddw
^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_paddw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed 16-bit integers in "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := a[i+15:i] + b[i+15:i]
        ENDFOR
        	

_m_paddd
^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_paddd(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed 32-bit integers in "a" and "b", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	dst[i+31:i] := a[i+31:i] + b[i+31:i]
        ENDFOR
        	

_m_paddsb
^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m64 _m_paddsb(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
        ENDFOR
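
``Saturate8`` clamps each signed sum to the range -128 to 127 instead of letting it wrap. The following scalar sketch illustrates that behaviour; ``paddsb_model`` is a hypothetical helper, and operands are assumed to be raw ``uint64_t`` values with lane 0 in the low bits.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _m_paddsb; not part of mmintrin.h. */
    static uint64_t paddsb_model(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 8; j++) {
            int i = j * 8;
            /* Read both lanes as signed bytes
             * (two's-complement conversion assumed). */
            int sum = (int8_t)(uint8_t)(a >> i) + (int8_t)(uint8_t)(b >> i);
            /* Clamp to the signed 8-bit range. */
            if (sum > INT8_MAX) sum = INT8_MAX;
            if (sum < INT8_MIN) sum = INT8_MIN;
            dst |= (uint64_t)(uint8_t)sum << i;
        }
        return dst;
    }

So ``127 + 1`` stays at ``127`` (``0x7F``) and ``-128 + -1`` stays at ``-128`` (``0x80``), while in-range sums such as ``-1 + 1`` are unaffected.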
        	

_m_paddsw
^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m64 _m_paddsw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
        ENDFOR
        	

_m_paddusb
^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_paddusb(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
        ENDFOR
        	

_m_paddusw
^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_paddusw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
        ENDFOR
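
For the unsigned variant, ``SaturateU16`` clamps at ``0xFFFF``. A scalar sketch under the assumption that operands are raw ``uint64_t`` values with lane 0 in the low bits (``paddusw_model`` is a hypothetical name):

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _m_paddusw; not part of mmintrin.h. */
    static uint64_t paddusw_model(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 4; j++) {
            int i = j * 16;
            /* Widen to 32 bits so the true sum is visible before clamping. */
            uint32_t sum = (uint32_t)(uint16_t)(a >> i)
                         + (uint32_t)(uint16_t)(b >> i);
            if (sum > UINT16_MAX) sum = UINT16_MAX;
            dst |= (uint64_t)sum << i;
        }
        return dst;
    }

Both ``0xFFFF + 1`` and ``0x8000 + 0x8000`` end up as ``0xFFFF``, whereas ``0x1234 + 0x1111`` gives the exact result ``0x2345``.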
        	

_m_psubb
^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_psubb(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := a[i+7:i] - b[i+7:i]
        ENDFOR
        	

_m_psubw
^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_psubw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := a[i+15:i] - b[i+15:i]
        ENDFOR
        	

_m_psubd
^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_psubd(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	dst[i+31:i] := a[i+31:i] - b[i+31:i]
        ENDFOR
        	

_m_psubsb
^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m64 _m_psubsb(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])	
        ENDFOR
        	

_m_psubsw
^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m64 _m_psubsw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
        ENDFOR
        	

_m_psubusb
^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_psubusb(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])	
        ENDFOR
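
Unsigned saturating subtraction clamps at zero, which enables a well-known idiom: OR-ing the two one-sided differences gives the per-byte absolute difference. The sketch below models both steps in scalar C. The names ``psubusb_model`` and ``absdiff_u8_model`` are hypothetical, and operands are assumed to be raw ``uint64_t`` values with lane 0 in the low bits.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _m_psubusb; not part of mmintrin.h. */
    static uint64_t psubusb_model(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 8; j++) {
            int i = j * 8;
            uint8_t x = (uint8_t)(a >> i);
            uint8_t y = (uint8_t)(b >> i);
            /* SaturateU8: a negative difference becomes 0. */
            uint8_t d = (x > y) ? (uint8_t)(x - y) : 0;
            dst |= (uint64_t)d << i;
        }
        return dst;
    }

    /* Illustrative use: |a - b| per byte. In every lane at least one of
     * the two saturated differences is zero, so OR combines them safely. */
    static uint64_t absdiff_u8_model(uint64_t a, uint64_t b)
    {
        return psubusb_model(a, b) | psubusb_model(b, a);
    }

With a lane holding ``0x05`` in ``a`` and ``0x0A`` in ``b``, the first difference is ``0`` and the second is ``0x05``, so the combined result is ``0x05``.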
        	

_m_psubusw
^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_psubusw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])	
        ENDFOR
        	

_m_pmaddwd
^^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m64 _m_pmaddwd(__m64 a, __m64 b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
        ENDFOR
        	

_m_pmulhw
^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m64 _m_pmulhw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        	

_m_pmullw
^^^^^^^^^
:Tech: MMX
:Category: Arithmetic
:Header: mmintrin.h
:Searchable: MMX-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_pmullw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	tmp[31:0] := a[i+15:i] * b[i+15:i]
        	dst[i+15:i] := tmp[15:0]
        ENDFOR
        	

Compare
-------
XMM
~~~
_mm_cmpeq_pi8
^^^^^^^^^^^^^
:Tech: MMX
:Category: Compare
:Header: mmintrin.h
:Searchable: MMX-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _mm_cmpeq_pi8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := ( a[i+7:i] == b[i+7:i] ) ? 0xFF : 0
        ENDFOR
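
The comparison returns a per-lane mask rather than a boolean: matching lanes become ``0xFF`` and all others become ``0``. A minimal scalar sketch, assuming raw ``uint64_t`` operands with lane 0 in the low bits (``cmpeq_pi8_model`` is a hypothetical name):

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_cmpeq_pi8; not part of mmintrin.h. */
    static uint64_t cmpeq_pi8_model(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 8; j++) {
            int i = j * 8;
            uint8_t x = (uint8_t)(a >> i);
            uint8_t y = (uint8_t)(b >> i);
            /* All-ones mask for equal lanes, all-zeros otherwise. */
            uint64_t mask = (x == y) ? 0xFFu : 0u;
            dst |= mask << i;
        }
        return dst;
    }

Because the result is a mask, it is typically combined with bitwise AND, AND-NOT, and OR to select between two vectors without branching.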
        	

_mm_cmpeq_pi16
^^^^^^^^^^^^^^
:Tech: MMX
:Category: Compare
:Header: mmintrin.h
:Searchable: MMX-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m64 _mm_cmpeq_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := ( a[i+15:i] == b[i+15:i] ) ? 0xFFFF : 0
        ENDFOR
        	

_mm_cmpeq_pi32
^^^^^^^^^^^^^^
:Tech: MMX
:Category: Compare
:Header: mmintrin.h
:Searchable: MMX-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m64 _mm_cmpeq_pi32(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpgt_pi8
^^^^^^^^^^^^^
:Tech: MMX
:Category: Compare
:Header: mmintrin.h
:Searchable: MMX-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m64 _mm_cmpgt_pi8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := ( a[i+7:i] > b[i+7:i] ) ? 0xFF : 0
        ENDFOR
        	

_mm_cmpgt_pi16
^^^^^^^^^^^^^^
:Tech: MMX
:Category: Compare
:Header: mmintrin.h
:Searchable: MMX-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_cmpgt_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := ( a[i+15:i] > b[i+15:i] ) ? 0xFFFF : 0
        ENDFOR
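
The greater-than comparison is signed, so a lane holding ``0xFFFF`` (-1) is not greater than ``0x0001``. The sketch below models the comparison and then uses the resulting mask for a branch-free per-lane maximum. Both function names are hypothetical, and operands are assumed to be raw ``uint64_t`` values with lane 0 in the low bits.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_cmpgt_pi16; not part of mmintrin.h. */
    static uint64_t cmpgt_pi16_model(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 4; j++) {
            int i = j * 16;
            /* Signed interpretation (two's-complement conversion assumed). */
            int16_t x = (int16_t)(uint16_t)(a >> i);
            int16_t y = (int16_t)(uint16_t)(b >> i);
            uint64_t mask = (x > y) ? 0xFFFFu : 0u;
            dst |= mask << i;
        }
        return dst;
    }

    /* Illustrative use: take a's lane where a > b, otherwise b's lane. */
    static uint64_t max_pi16_model(uint64_t a, uint64_t b)
    {
        uint64_t m = cmpgt_pi16_model(a, b);
        return (a & m) | (b & ~m);
    }

The mask-and-select pattern in ``max_pi16_model`` is one common way to build a maximum from a compare; it is shown here as an illustration, not as something this reference prescribes.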
        	

_mm_cmpgt_pi32
^^^^^^^^^^^^^^
:Tech: MMX
:Category: Compare
:Header: mmintrin.h
:Searchable: MMX-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m64 _mm_cmpgt_pi32(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
        	

MMX
~~~
_m_pcmpeqb
^^^^^^^^^^
:Tech: MMX
:Category: Compare
:Header: mmintrin.h
:Searchable: MMX-Compare-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_pcmpeqb(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := ( a[i+7:i] == b[i+7:i] ) ? 0xFF : 0
        ENDFOR
        	

_m_pcmpeqw
^^^^^^^^^^
:Tech: MMX
:Category: Compare
:Header: mmintrin.h
:Searchable: MMX-Compare-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_pcmpeqw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := ( a[i+15:i] == b[i+15:i] ) ? 0xFFFF : 0
        ENDFOR
        	

_m_pcmpeqd
^^^^^^^^^^
:Tech: MMX
:Category: Compare
:Header: mmintrin.h
:Searchable: MMX-Compare-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _m_pcmpeqd(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
        	

_m_pcmpgtb
^^^^^^^^^^
:Tech: MMX
:Category: Compare
:Header: mmintrin.h
:Searchable: MMX-Compare-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m64 _m_pcmpgtb(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := ( a[i+7:i] > b[i+7:i] ) ? 0xFF : 0
        ENDFOR
        	

_m_pcmpgtw
^^^^^^^^^^
:Tech: MMX
:Category: Compare
:Header: mmintrin.h
:Searchable: MMX-Compare-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m64 _m_pcmpgtw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := ( a[i+15:i] > b[i+15:i] ) ? 0xFFFF : 0
        ENDFOR
        	

_m_pcmpgtd
^^^^^^^^^^
:Tech: MMX
:Category: Compare
:Header: mmintrin.h
:Searchable: MMX-Compare-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m64 _m_pcmpgtd(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
        	

Set
---
XMM
~~~
_mm_setzero_si64
^^^^^^^^^^^^^^^^
:Tech: MMX
:Category: Set
:Header: mmintrin.h
:Searchable: MMX-Set-XMM
:Register: XMM 128 bit
:Return Type: __m64

.. code-block:: C

    __m64 _mm_setzero_si64(void);

.. admonition:: Intel Description

    Return vector of type __m64 with all elements set to zero.

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	

_mm_set_pi32
^^^^^^^^^^^^
:Tech: MMX
:Category: Set
:Header: mmintrin.h
:Searchable: MMX-Set-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    int e1, 
    int e0
:Param ETypes:
    UI32 e1, 
    UI32 e0

.. code-block:: C

    __m64 _mm_set_pi32(int e1, int e0);

.. admonition:: Intel Description

    Set packed 32-bit integers in "dst" with the supplied values.

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := e0
        dst[63:32] := e1
        	

_mm_set_pi16
^^^^^^^^^^^^
:Tech: MMX
:Category: Set
:Header: mmintrin.h
:Searchable: MMX-Set-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    short e3, 
    short e2, 
    short e1, 
    short e0
:Param ETypes:
    UI16 e3, 
    UI16 e2, 
    UI16 e1, 
    UI16 e0

.. code-block:: C

    __m64 _mm_set_pi16(short e3, short e2, short e1, short e0);

.. admonition:: Intel Description

    Set packed 16-bit integers in "dst" with the supplied values.

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := e0
        dst[31:16] := e1
        dst[47:32] := e2
        dst[63:48] := e3
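
The argument order is easy to get backwards: the first argument, ``e3``, lands in the highest lane and the last, ``e0``, in the lowest. A scalar sketch of the packing, assuming the ``__m64`` result is viewed as a raw ``uint64_t`` (``set_pi16_model`` is a hypothetical name):

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_set_pi16; not part of mmintrin.h.
     * Returns the 64-bit pattern with e0 in bits 15:0 and e3 in bits 63:48. */
    static uint64_t set_pi16_model(int16_t e3, int16_t e2, int16_t e1, int16_t e0)
    {
        /* Cast through uint16_t first so negative values do not
         * sign-extend into neighbouring lanes. */
        return ((uint64_t)(uint16_t)e3 << 48)
             | ((uint64_t)(uint16_t)e2 << 32)
             | ((uint64_t)(uint16_t)e1 << 16)
             |  (uint64_t)(uint16_t)e0;
    }

Calling the model with ``(4, 3, 2, 1)`` gives ``0x0004000300020001``, so reading the hexadecimal value left to right matches the argument order.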
        	

_mm_set_pi8
^^^^^^^^^^^
:Tech: MMX
:Category: Set
:Header: mmintrin.h
:Searchable: MMX-Set-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    char e7, 
    char e6, 
    char e5, 
    char e4, 
    char e3, 
    char e2, 
    char e1, 
    char e0
:Param ETypes:
    UI8 e7, 
    UI8 e6, 
    UI8 e5, 
    UI8 e4, 
    UI8 e3, 
    UI8 e2, 
    UI8 e1, 
    UI8 e0

.. code-block:: C

    __m64 _mm_set_pi8(char e7, char e6, char e5, char e4,
                      char e3, char e2, char e1, char e0);

.. admonition:: Intel Description

    Set packed 8-bit integers in "dst" with the supplied values.

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := e0
        dst[15:8] := e1
        dst[23:16] := e2
        dst[31:24] := e3
        dst[39:32] := e4
        dst[47:40] := e5
        dst[55:48] := e6
        dst[63:56] := e7
        	

_mm_set1_pi32
^^^^^^^^^^^^^
:Tech: MMX
:Category: Set
:Header: mmintrin.h
:Searchable: MMX-Set-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m64 _mm_set1_pi32(int a);

.. admonition:: Intel Description

    Broadcast 32-bit integer "a" to all elements of "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        	

_mm_set1_pi16
^^^^^^^^^^^^^
:Tech: MMX
:Category: Set
:Header: mmintrin.h
:Searchable: MMX-Set-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    short a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m64 _mm_set1_pi16(short a);

.. admonition:: Intel Description

    Broadcast 16-bit integer "a" to all elements of "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := a[15:0]
        ENDFOR
        	

_mm_set1_pi8
^^^^^^^^^^^^
:Tech: MMX
:Category: Set
:Header: mmintrin.h
:Searchable: MMX-Set-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    char a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m64 _mm_set1_pi8(char a);

.. admonition:: Intel Description

    Broadcast 8-bit integer "a" to all elements of "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := a[7:0]
        ENDFOR
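
In scalar code the broadcast loop collapses to a single multiplication by ``0x0101010101010101``, which copies the byte into every lane. This is only a sketch of the resulting bit pattern, viewed as a raw ``uint64_t``; ``set1_pi8_model`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_set1_pi8; not part of mmintrin.h. */
    static uint64_t set1_pi8_model(int8_t a)
    {
        /* Convert to uint8_t first so a negative byte is not sign-extended,
         * then replicate it into all eight byte positions. */
        return (uint64_t)(uint8_t)a * 0x0101010101010101ULL;
    }

The multiplication cannot carry between lanes because each partial product is a single byte shifted to its own position.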
        	

_mm_setr_pi32
^^^^^^^^^^^^^
:Tech: MMX
:Category: Set
:Header: mmintrin.h
:Searchable: MMX-Set-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    int e1, 
    int e0
:Param ETypes:
    UI32 e1, 
    UI32 e0

.. code-block:: C

    __m64 _mm_setr_pi32(int e1, int e0);

.. admonition:: Intel Description

    Set packed 32-bit integers in "dst" with the supplied values in reverse order.

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := e1
        dst[63:32] := e0
        	

_mm_setr_pi16
^^^^^^^^^^^^^
:Tech: MMX
:Category: Set
:Header: mmintrin.h
:Searchable: MMX-Set-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    short e3, 
    short e2, 
    short e1, 
    short e0
:Param ETypes:
    UI16 e3, 
    UI16 e2, 
    UI16 e1, 
    UI16 e0

.. code-block:: C

    __m64 _mm_setr_pi16(short e3, short e2, short e1, short e0);

.. admonition:: Intel Description

    Set packed 16-bit integers in "dst" with the supplied values in reverse order.

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := e3
        dst[31:16] := e2
        dst[47:32] := e1
        dst[63:48] := e0
        	
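"Reverse order" here means that the first argument lands in the lowest lane. The following scalar sketch mirrors the pseudo-code under that reading; the names "setr_pi16_model" and "check_setr_pi16_model" are hypothetical, and a "uint64_t" stands in for "__m64".

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_setr_pi16, using the parameter
       names from the prototype above. */
    static uint64_t setr_pi16_model(short e3, short e2, short e1, short e0)
    {
        return  (uint64_t)(uint16_t)e3            /* dst[15:0]  := e3 */
             | ((uint64_t)(uint16_t)e2 << 16)     /* dst[31:16] := e2 */
             | ((uint64_t)(uint16_t)e1 << 32)     /* dst[47:32] := e1 */
             | ((uint64_t)(uint16_t)e0 << 48);    /* dst[63:48] := e0 */
    }

    /* Usage: the first argument occupies the least significant 16 bits. */
    void check_setr_pi16_model(void)
    {
        assert(setr_pi16_model(1, 2, 3, 4) == 0x0004000300020001ULL);
    }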

_mm_setr_pi8
^^^^^^^^^^^^
:Tech: MMX
:Category: Set
:Header: mmintrin.h
:Searchable: MMX-Set-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    char e7, 
    char e6, 
    char e5, 
    char e4, 
    char e3, 
    char e2, 
    char e1, 
    char e0
:Param ETypes:
    UI8 e7, 
    UI8 e6, 
    UI8 e5, 
    UI8 e4, 
    UI8 e3, 
    UI8 e2, 
    UI8 e1, 
    UI8 e0

.. code-block:: C

    __m64 _mm_setr_pi8(char e7, char e6, char e5, char e4,
                       char e3, char e2, char e1, char e0);

.. admonition:: Intel Description

    Set packed 8-bit integers in "dst" with the supplied values in reverse order.

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := e7
        dst[15:8] := e6
        dst[23:16] := e5
        dst[31:24] := e4
        dst[39:32] := e3
        dst[47:40] := e2
        dst[55:48] := e1
        dst[63:56] := e0
        	

Convert
-------
XMM
~~~
_mm_cvtsi32_si64
^^^^^^^^^^^^^^^^
:Tech: MMX
:Category: Convert
:Header: mmintrin.h
:Searchable: MMX-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m64 _mm_cvtsi32_si64(int a);

.. admonition:: Intel Description

    Copy 32-bit integer "a" to the lower element of "dst", and zero the upper element of "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        dst[63:32] := 0
        	
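The pseudo-code zero-extends rather than sign-extends, which matters for negative inputs. A minimal scalar sketch, assuming a 32-bit "int"; the names "cvtsi32_si64_model" and "check_cvtsi32_si64_model" are hypothetical, and a "uint64_t" stands in for "__m64".

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_cvtsi32_si64 (assumes 32-bit int). */
    static uint64_t cvtsi32_si64_model(int a)
    {
        /* dst[31:0] := a[31:0]; dst[63:32] := 0 */
        return (uint64_t)(uint32_t)a;
    }

    /* Usage: a negative input does not propagate its sign into the upper half. */
    void check_cvtsi32_si64_model(void)
    {
        assert(cvtsi32_si64_model(7) == 7);
        assert(cvtsi32_si64_model(-1) == 0x00000000FFFFFFFFULL);
    }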

_mm_cvtsi64_si32
^^^^^^^^^^^^^^^^
:Tech: MMX
:Category: Convert
:Header: mmintrin.h
:Searchable: MMX-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m64 a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _mm_cvtsi64_si32(__m64 a);

.. admonition:: Intel Description

    Copy the lower 32-bit integer in "a" to "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        	

_mm_cvtm64_si64
^^^^^^^^^^^^^^^
:Tech: MMX
:Category: Convert
:Header: mmintrin.h
:Searchable: MMX-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __int64 _mm_cvtm64_si64(__m64 a);

.. admonition:: Intel Description

    Copy 64-bit integer "a" to "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        	

_mm_cvtsi64_m64
^^^^^^^^^^^^^^^
:Tech: MMX
:Category: Convert
:Header: mmintrin.h
:Searchable: MMX-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m64 _mm_cvtsi64_m64(__int64 a);

.. admonition:: Intel Description

    Copy 64-bit integer "a" to "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        	

MMX
~~~
_m_from_int64
^^^^^^^^^^^^^
:Tech: MMX
:Category: Convert
:Header: mmintrin.h
:Searchable: MMX-Convert-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m64 _m_from_int64(__int64 a);

.. admonition:: Intel Description

    Copy 64-bit integer "a" to "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        	

_m_to_int64
^^^^^^^^^^^
:Tech: MMX
:Category: Convert
:Header: mmintrin.h
:Searchable: MMX-Convert-MMX
:Register: MMX 64 bit
:Return Type: __int64
:Param Types:
    __m64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __int64 _m_to_int64(__m64 a);

.. admonition:: Intel Description

    Copy 64-bit integer "a" to "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        	

_m_from_int
^^^^^^^^^^^
:Tech: MMX
:Category: Convert
:Header: mmintrin.h
:Searchable: MMX-Convert-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m64 _m_from_int(int a);

.. admonition:: Intel Description

    Copy 32-bit integer "a" to the lower element of "dst", and zero the upper element of "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        dst[63:32] := 0
        	

_m_to_int
^^^^^^^^^
:Tech: MMX
:Category: Convert
:Header: mmintrin.h
:Searchable: MMX-Convert-MMX
:Register: MMX 64 bit
:Return Type: int
:Param Types:
    __m64 a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _m_to_int(__m64 a);

.. admonition:: Intel Description

    Copy the lower 32-bit integer in "a" to "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        	

Miscellaneous
-------------
XMM
~~~
_mm_packs_pi16
^^^^^^^^^^^^^^
:Tech: MMX
:Category: Miscellaneous
:Header: mmintrin.h
:Searchable: MMX-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_packs_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := Saturate8(a[15:0])
        dst[15:8] := Saturate8(a[31:16])
        dst[23:16] := Saturate8(a[47:32])
        dst[31:24] := Saturate8(a[63:48])
        dst[39:32] := Saturate8(b[15:0])
        dst[47:40] := Saturate8(b[31:16])
        dst[55:48] := Saturate8(b[47:32])
        dst[63:56] := Saturate8(b[63:48])
        	
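Signed saturation clamps each 16-bit lane to the range -128..127 before narrowing. The scalar sketch below mirrors the pseudo-code under that reading of "Saturate8"; the names "packs_lane_s16", "packs_saturate8", "packs_pi16_model" and "check_packs_pi16_model" are hypothetical, and a "uint64_t" stands in for "__m64".

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Read 16-bit lane j of v as a signed value, without relying on
       implementation-defined narrowing conversions. */
    static int packs_lane_s16(uint64_t v, int j)
    {
        int x = (int)((v >> (16 * j)) & 0xFFFF);
        return (x >= 0x8000) ? x - 0x10000 : x;
    }

    /* Clamp to the signed 8-bit range and return the resulting byte pattern. */
    static unsigned packs_saturate8(int x)
    {
        if (x > 127)  x = 127;
        if (x < -128) x = -128;
        return (unsigned)x & 0xFF;
    }

    /* Hypothetical scalar model of _mm_packs_pi16. */
    static uint64_t packs_pi16_model(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 4; j++) {
            dst |= (uint64_t)packs_saturate8(packs_lane_s16(a, j)) << (8 * j);
            dst |= (uint64_t)packs_saturate8(packs_lane_s16(b, j)) << (8 * j + 32);
        }
        return dst;
    }

    /* Usage: lanes of a are {300, -300, 5, -5}, lowest lane first. */
    void check_packs_pi16_model(void)
    {
        uint64_t a = 0xFFFB0005FED4012CULL;
        /* 300 -> 0x7F, -300 -> 0x80, 5 -> 0x05, -5 -> 0xFB */
        assert(packs_pi16_model(a, 0) == 0x00000000FB05807FULL);
        assert(packs_pi16_model(a, a) == 0xFB05807FFB05807FULL);
    }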

_mm_packs_pi32
^^^^^^^^^^^^^^
:Tech: MMX
:Category: Miscellaneous
:Header: mmintrin.h
:Searchable: MMX-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m64 _mm_packs_pi32(__m64 a, __m64 b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Saturate16(a[31:0])
        dst[31:16] := Saturate16(a[63:32])
        dst[47:32] := Saturate16(b[31:0])
        dst[63:48] := Saturate16(b[63:32])
        	

_mm_packs_pu16
^^^^^^^^^^^^^^
:Tech: MMX
:Category: Miscellaneous
:Header: mmintrin.h
:Searchable: MMX-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_packs_pu16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := SaturateU8(a[15:0])
        dst[15:8] := SaturateU8(a[31:16])
        dst[23:16] := SaturateU8(a[47:32])
        dst[31:24] := SaturateU8(a[63:48])
        dst[39:32] := SaturateU8(b[15:0])
        dst[47:40] := SaturateU8(b[31:16])
        dst[55:48] := SaturateU8(b[47:32])
        dst[63:56] := SaturateU8(b[63:48])
        	
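Unsigned saturation still reads each source lane as signed, but clamps it to 0..255, so negative inputs become zero. The scalar sketch below mirrors the pseudo-code under that reading of "SaturateU8"; the names "packus_lane_s16", "packus_saturate_u8", "packs_pu16_model" and "check_packs_pu16_model" are hypothetical, and a "uint64_t" stands in for "__m64".

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Read 16-bit lane j of v as a signed value. */
    static int packus_lane_s16(uint64_t v, int j)
    {
        int x = (int)((v >> (16 * j)) & 0xFFFF);
        return (x >= 0x8000) ? x - 0x10000 : x;
    }

    /* Clamp a signed value to the unsigned 8-bit range. */
    static unsigned packus_saturate_u8(int x)
    {
        if (x > 255) x = 255;
        if (x < 0)   x = 0;
        return (unsigned)x;
    }

    /* Hypothetical scalar model of _mm_packs_pu16. */
    static uint64_t packs_pu16_model(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 4; j++) {
            dst |= (uint64_t)packus_saturate_u8(packus_lane_s16(a, j)) << (8 * j);
            dst |= (uint64_t)packus_saturate_u8(packus_lane_s16(b, j)) << (8 * j + 32);
        }
        return dst;
    }

    /* Usage: lanes of a are {300, -300, 5, 255}, lowest lane first. */
    void check_packs_pu16_model(void)
    {
        uint64_t a = 0x00FF0005FED4012CULL;
        /* 300 -> 0xFF, -300 -> 0x00, 5 -> 0x05, 255 -> 0xFF */
        assert(packs_pu16_model(a, 0) == 0x00000000FF0500FFULL);
        assert(packs_pu16_model(a, a) == 0xFF0500FFFF0500FFULL);
    }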

MMX
~~~
_m_packsswb
^^^^^^^^^^^
:Tech: MMX
:Category: Miscellaneous
:Header: mmintrin.h
:Searchable: MMX-Miscellaneous-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _m_packsswb(__m64 a, __m64 b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := Saturate8(a[15:0])
        dst[15:8] := Saturate8(a[31:16])
        dst[23:16] := Saturate8(a[47:32])
        dst[31:24] := Saturate8(a[63:48])
        dst[39:32] := Saturate8(b[15:0])
        dst[47:40] := Saturate8(b[31:16])
        dst[55:48] := Saturate8(b[47:32])
        dst[63:56] := Saturate8(b[63:48])
        	

_m_packssdw
^^^^^^^^^^^
:Tech: MMX
:Category: Miscellaneous
:Header: mmintrin.h
:Searchable: MMX-Miscellaneous-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m64 _m_packssdw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Saturate16(a[31:0])
        dst[31:16] := Saturate16(a[63:32])
        dst[47:32] := Saturate16(b[31:0])
        dst[63:48] := Saturate16(b[63:32])
        	

_m_packuswb
^^^^^^^^^^^
:Tech: MMX
:Category: Miscellaneous
:Header: mmintrin.h
:Searchable: MMX-Miscellaneous-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _m_packuswb(__m64 a, __m64 b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".

.. deprecated:: X87

    MMX technology intrinsics can cause issues on modern processors and should generally be avoided. Use SSE2, AVX, or later instruction sets instead, especially when targeting modern processors.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := SaturateU8(a[15:0])
        dst[15:8] := SaturateU8(a[31:16])
        dst[23:16] := SaturateU8(a[47:32])
        dst[31:24] := SaturateU8(a[63:48])
        dst[39:32] := SaturateU8(b[15:0])
        dst[47:40] := SaturateU8(b[31:16])
        dst[55:48] := SaturateU8(b[47:32])
        dst[63:56] := SaturateU8(b[63:48])
        	

SSE_ALL
=======
Shift
-----
XMM
~~~
_mm_slli_si128
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    M128 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_slli_si128(__m128i a, int imm8);

.. admonition:: Intel Description

    Shift "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := imm8[7:0]
        IF tmp > 15
        	tmp := 16
        FI
        dst[127:0] := a[127:0] << (tmp*8)
        	
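The shift count is in bytes, not bits, and any count above 15 clears the whole register. A minimal scalar sketch, assuming the 128-bit value is held as a little-endian array of 16 bytes (index 0 is bits 7:0); the names "slli_si128_model" and "check_slli_si128_model" are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm_slli_si128.
       dst and a must not overlap. */
    static void slli_si128_model(uint8_t dst[16], const uint8_t a[16], int imm8)
    {
        unsigned tmp = (unsigned)imm8 & 0xFF;   /* tmp := imm8[7:0] */
        if (tmp > 15)
            tmp = 16;
        memset(dst, 0, 16);                     /* zeros are shifted in */
        memcpy(dst + tmp, a, 16 - tmp);         /* byte k of a moves to byte k+tmp */
    }

    /* Usage: a holds the bytes 1..16, lowest byte first. */
    void check_slli_si128_model(void)
    {
        uint8_t a[16], dst[16];
        for (int k = 0; k < 16; k++)
            a[k] = (uint8_t)(k + 1);

        slli_si128_model(dst, a, 3);
        assert(dst[0] == 0 && dst[2] == 0 && dst[3] == 1 && dst[15] == 13);

        slli_si128_model(dst, a, 200);          /* count > 15 clears everything */
        for (int k = 0; k < 16; k++)
            assert(dst[k] == 0);
    }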

_mm_bslli_si128
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    M128 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_bslli_si128(__m128i a, int imm8);

.. admonition:: Intel Description

    Shift "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := imm8[7:0]
        IF tmp > 15
        	tmp := 16
        FI
        dst[127:0] := a[127:0] << (tmp*8)
        	

_mm_bsrli_si128
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    M128 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_bsrli_si128(__m128i a, int imm8);

.. admonition:: Intel Description

    Shift "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := imm8[7:0]
        IF tmp > 15
        	tmp := 16
        FI
        dst[127:0] := a[127:0] >> (tmp*8)
        	
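The right byte shift moves data toward the low end of the register and fills the top with zeros. A minimal scalar sketch, assuming the 128-bit value is held as a little-endian array of 16 bytes (index 0 is bits 7:0); the names "bsrli_si128_model" and "check_bsrli_si128_model" are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm_bsrli_si128.
       dst and a must not overlap. */
    static void bsrli_si128_model(uint8_t dst[16], const uint8_t a[16], int imm8)
    {
        unsigned tmp = (unsigned)imm8 & 0xFF;   /* tmp := imm8[7:0] */
        if (tmp > 15)
            tmp = 16;
        memset(dst, 0, 16);                     /* zeros are shifted in */
        memcpy(dst, a + tmp, 16 - tmp);         /* byte k+tmp of a moves to byte k */
    }

    /* Usage: a holds the bytes 1..16, lowest byte first. */
    void check_bsrli_si128_model(void)
    {
        uint8_t a[16], dst[16];
        for (int k = 0; k < 16; k++)
            a[k] = (uint8_t)(k + 1);

        bsrli_si128_model(dst, a, 3);
        assert(dst[0] == 4 && dst[12] == 16 && dst[13] == 0 && dst[15] == 0);

        bsrli_si128_model(dst, a, 16);          /* count > 15 clears everything */
        for (int k = 0; k < 16; k++)
            assert(dst[k] == 0);
    }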

_mm_slli_epi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_slli_epi16(__m128i a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
        	FI
        ENDFOR
        	
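Unlike the C "<<" operator, whose behaviour is undefined when the count reaches the operand width, the pseudo-code defines an oversized count as producing zero. A minimal scalar sketch of that rule, with the eight lanes held in a plain array; the names "slli_epi16_model" and "check_slli_epi16_model" are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_slli_epi16; lane 0 is bits 15:0. */
    static void slli_epi16_model(uint16_t dst[8], const uint16_t a[8], int imm8)
    {
        unsigned count = (unsigned)imm8 & 0xFF;           /* imm8[7:0] */
        for (int j = 0; j < 8; j++) {
            if (count > 15)
                dst[j] = 0;
            else
                dst[j] = (uint16_t)((unsigned)a[j] << count);  /* bits above 15 are dropped */
        }
    }

    /* Usage: bits shifted past the top of a lane never reach its neighbour. */
    void check_slli_epi16_model(void)
    {
        const uint16_t a[8] = {0x8001, 1, 0xFFFF, 0, 0, 0, 0, 0};
        uint16_t dst[8];

        slli_epi16_model(dst, a, 1);
        assert(dst[0] == 0x0002 && dst[1] == 2 && dst[2] == 0xFFFE && dst[3] == 0);

        slli_epi16_model(dst, a, 16);
        for (int j = 0; j < 8; j++)
            assert(dst[j] == 0);
    }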

_mm_sll_epi16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_sll_epi16(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
        	FI
        ENDFOR
        	

_mm_slli_epi32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_slli_epi32(__m128i a, int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
        	FI
        ENDFOR
        	

_mm_sll_epi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_sll_epi32(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
        	FI
        ENDFOR
        	
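In the pseudo-code, one count taken from the low 64 bits of "count" applies to every lane, and the full 64-bit value is compared against 31. A minimal scalar sketch of that behaviour; the names "sll_epi32_model" and "check_sll_epi32_model" are hypothetical, and the parameter "count_lo" stands for count[63:0].

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_sll_epi32; lane 0 is bits 31:0. */
    static void sll_epi32_model(uint32_t dst[4], const uint32_t a[4], uint64_t count_lo)
    {
        for (int j = 0; j < 4; j++) {
            if (count_lo > 31)
                dst[j] = 0;
            else
                dst[j] = a[j] << (unsigned)count_lo;   /* same count for all lanes */
        }
    }

    /* Usage: a count whose low 32 bits are zero but whose bit 32 is set
       still clears the result, because all 64 bits are compared. */
    void check_sll_epi32_model(void)
    {
        const uint32_t a[4] = {1, 2, 3, 0x80000000u};
        uint32_t dst[4];

        sll_epi32_model(dst, a, 4);
        assert(dst[0] == 16 && dst[1] == 32 && dst[2] == 48 && dst[3] == 0);

        sll_epi32_model(dst, a, 0x100000000ULL);
        assert(dst[0] == 0 && dst[1] == 0 && dst[2] == 0 && dst[3] == 0);
    }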

_mm_slli_epi64
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_slli_epi64(__m128i a, int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF imm8[7:0] > 63
        		dst[i+63:i] := 0
        	ELSE
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
        	FI
        ENDFOR
        	

_mm_sll_epi64
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_sll_epi64(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF count[63:0] > 63
        		dst[i+63:i] := 0
        	ELSE
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
        	FI
        ENDFOR
        	

_mm_srai_epi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    SI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_srai_epi16(__m128i a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        	ELSE
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
        	FI
        ENDFOR
        	
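For an arithmetic shift, an oversized count fills each lane with copies of its sign bit, which is the same result as shifting by 15. A minimal scalar sketch that avoids relying on how a particular compiler right-shifts negative values; the names "srai_epi16_model" and "check_srai_epi16_model" are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_srai_epi16; lane 0 is bits 15:0. */
    static void srai_epi16_model(int16_t dst[8], const int16_t a[8], int imm8)
    {
        unsigned count = (unsigned)imm8 & 0xFF;   /* imm8[7:0] */
        if (count > 15)
            count = 15;                           /* leaves only copies of the sign bit */
        for (int j = 0; j < 8; j++) {
            int x = a[j];
            /* For negative x, shift the complement so that only non-negative
               values are ever right-shifted; this rounds toward minus infinity. */
            dst[j] = (int16_t)(x < 0 ? ~(~x >> count) : x >> count);
        }
    }

    /* Usage: negative lanes stay negative, positive lanes shrink toward zero. */
    void check_srai_epi16_model(void)
    {
        const int16_t a[8] = {-32768, -1, 0x7FFF, 16, 0, 0, 0, 0};
        int16_t dst[8];

        srai_epi16_model(dst, a, 3);
        assert(dst[0] == -4096 && dst[1] == -1 && dst[2] == 4095 && dst[3] == 2);

        srai_epi16_model(dst, a, 200);            /* count > 15: sign fill */
        assert(dst[0] == -1 && dst[1] == -1 && dst[2] == 0 && dst[3] == 0);
    }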

_mm_sra_epi16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    SI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_sra_epi16(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        	ELSE
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
        	FI
        ENDFOR
        	

_mm_srai_epi32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    SI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_srai_epi32(__m128i a, int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        	ELSE
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
        	FI
        ENDFOR
        	

_mm_sra_epi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    SI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_sra_epi32(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        	ELSE
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
        	FI
        ENDFOR
        	

_mm_srli_si128
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    M128 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_srli_si128(__m128i a, int imm8);

.. admonition:: Intel Description

    Shift "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := imm8[7:0]
        IF tmp > 15
        	tmp := 16
        FI
        dst[127:0] := a[127:0] >> (tmp*8)
        	

_mm_srli_epi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_srli_epi16(__m128i a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
        	FI
        ENDFOR
        	

_mm_srl_epi16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_srl_epi16(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
        	FI
        ENDFOR
        	

_mm_srli_epi32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_srli_epi32(__m128i a, int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
        	FI
        ENDFOR
        	

_mm_srl_epi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_srl_epi32(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
        	FI
        ENDFOR
        	

_mm_srli_epi64
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_srli_epi64(__m128i a, int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF imm8[7:0] > 63
        		dst[i+63:i] := 0
        	ELSE
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
        	FI
        ENDFOR
        	
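The logical right shift fills from the top with zeros regardless of the sign bit, and a count above 63 clears both lanes. A minimal scalar sketch with the two lanes held in a plain array; the names "srli_epi64_model" and "check_srli_epi64_model" are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_srli_epi64; lane 0 is bits 63:0. */
    static void srli_epi64_model(uint64_t dst[2], const uint64_t a[2], int imm8)
    {
        unsigned count = (unsigned)imm8 & 0xFF;   /* imm8[7:0] */
        for (int j = 0; j < 2; j++)
            dst[j] = (count > 63) ? 0 : a[j] >> count;
    }

    /* Usage: the top bit moves down to bit 0 after 63 steps and is gone after 64. */
    void check_srli_epi64_model(void)
    {
        const uint64_t a[2] = {0x8000000000000000ULL, 0xFF};
        uint64_t dst[2];

        srli_epi64_model(dst, a, 63);
        assert(dst[0] == 1 && dst[1] == 0);

        srli_epi64_model(dst, a, 64);
        assert(dst[0] == 0 && dst[1] == 0);
    }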

_mm_srl_epi64
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Shift
:Header: emmintrin.h
:Searchable: SSE_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_srl_epi64(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF count[63:0] > 63
        		dst[i+63:i] := 0
        	ELSE
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
        	FI
        ENDFOR
        	

Cryptography
------------
XMM
~~~
_mm_crc32_u8
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Cryptography
:Header: nmmintrin.h
:Searchable: SSE_ALL-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    unsigned int crc, 
    unsigned char v
:Param ETypes:
    UI32 crc, 
    UI8 v

.. code-block:: C

    unsigned int _mm_crc32_u8(unsigned int crc, unsigned char v);

.. admonition:: Intel Description

    Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 8-bit integer "v", and stores the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        tmp1[7:0] := v[0:7] // bit reflection
        tmp2[31:0] := crc[0:31] // bit reflection
        tmp3[39:0] := tmp1[7:0] << 32 
        tmp4[39:0] := tmp2[31:0] << 8
        tmp5[39:0] := tmp3[39:0] XOR tmp4[39:0]
        tmp6[31:0] := MOD2(tmp5[39:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
        dst[31:0] := tmp6[0:31] // bit reflection
        	
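The polynomial 0x11EDC6F41 is the CRC-32C (Castagnoli) polynomial, and the bit reflections in the pseudo-code correspond to the usual "reflected" software formulation, which uses the reversed constant 0x82F63B78. The following is a minimal bitwise sketch under that assumption, not the intrinsic itself; the names "crc32c_u8_model" and "check_crc32c_u8_model" are hypothetical. The initial value 0xFFFFFFFF and the final inversion in the usage function are the common CRC-32C checksum convention and are not performed by the instruction.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_crc32_u8 (reflected CRC-32C update). */
    static uint32_t crc32c_u8_model(uint32_t crc, uint8_t v)
    {
        crc ^= v;
        for (int k = 0; k < 8; k++) {
            uint32_t mask = 0u - (crc & 1u);          /* all ones if the low bit is set */
            crc = (crc >> 1) ^ (0x82F63B78u & mask);  /* reversed form of 0x1EDC6F41 */
        }
        return crc;
    }

    /* Usage: accumulate over the standard check string "123456789". */
    void check_crc32c_u8_model(void)
    {
        const char *msg = "123456789";
        uint32_t crc = 0xFFFFFFFFu;
        for (int k = 0; msg[k] != '\0'; k++)
            crc = crc32c_u8_model(crc, (uint8_t)msg[k]);
        assert((crc ^ 0xFFFFFFFFu) == 0xE3069283u);   /* CRC-32C check value */
    }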

_mm_crc32_u16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Cryptography
:Header: nmmintrin.h
:Searchable: SSE_ALL-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    unsigned int crc, 
    unsigned short v
:Param ETypes:
    UI32 crc, 
    UI16 v

.. code-block:: C

    unsigned int _mm_crc32_u16(unsigned int crc, unsigned short v);

.. admonition:: Intel Description

    Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 16-bit integer "v", and stores the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        tmp1[15:0] := v[0:15] // bit reflection
        tmp2[31:0] := crc[0:31] // bit reflection
        tmp3[47:0] := tmp1[15:0] << 32
        tmp4[47:0] := tmp2[31:0] << 16
        tmp5[47:0] := tmp3[47:0] XOR tmp4[47:0]
        tmp6[31:0] := MOD2(tmp5[47:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
        dst[31:0] := tmp6[0:31] // bit reflection
        	

_mm_crc32_u32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Cryptography
:Header: nmmintrin.h
:Searchable: SSE_ALL-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    unsigned int crc, 
    unsigned int v
:Param ETypes:
    UI32 crc, 
    UI32 v

.. code-block:: C

    unsigned int _mm_crc32_u32(unsigned int crc, unsigned int v);

.. admonition:: Intel Description

    Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 32-bit integer "v", and stores the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        tmp1[31:0] := v[0:31] // bit reflection
        tmp2[31:0] := crc[0:31] // bit reflection
        tmp3[63:0] := tmp1[31:0] << 32
        tmp4[63:0] := tmp2[31:0] << 32
        tmp5[63:0] := tmp3[63:0] XOR tmp4[63:0]
        tmp6[31:0] := MOD2(tmp5[63:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
        dst[31:0] := tmp6[0:31] // bit reflection
        	

_mm_crc32_u64
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Cryptography
:Header: nmmintrin.h
:Searchable: SSE_ALL-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 crc, 
    unsigned __int64 v
:Param ETypes:
    UI64 crc, 
    UI64 v

.. code-block:: C

    unsigned __int64 _mm_crc32_u64(unsigned __int64 crc, unsigned __int64 v);

.. admonition:: Intel Description

    Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 64-bit integer "v", and stores the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        tmp1[63:0] := v[0:63] // bit reflection
        tmp2[31:0] := crc[0:31] // bit reflection
        tmp3[95:0] := tmp1[63:0] << 32
        tmp4[95:0] := tmp2[31:0] << 64
        tmp5[95:0] := tmp3[95:0] XOR tmp4[95:0]
        tmp6[31:0] := MOD2(tmp5[95:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
        dst[31:0] := tmp6[0:31] // bit reflection
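
The 16-, 32- and 64-bit variants give the same result as feeding the bytes of ``v`` one at a time, least significant byte first. The sketch below is a portable software model of that behaviour, not the intrinsic itself; ``crc32c_step`` and its ``nbytes`` parameter are hypothetical names chosen for illustration. The 64-bit intrinsic returns the same 32-bit value zero-extended to 64 bits.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of _mm_crc32_u16/_u32/_u64: consume the low
       `nbytes` bytes of v, least significant byte first. */
    static uint32_t crc32c_step(uint32_t crc, uint64_t v, int nbytes)
    {
        for (int i = 0; i < nbytes; ++i) {
            crc ^= (uint32_t)((v >> (8 * i)) & 0xFFu);
            for (int bit = 0; bit < 8; ++bit) {
                /* 0x82F63B78 is the bit-reflected form of 0x11EDC6F41. */
                crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
            }
        }
        return crc;
    }

Because the bytes are taken in little-endian order, a buffer can be processed in 8-byte words loaded from memory on x86 and finished with narrower steps for the tail.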
        	

Move
----
XMM
~~~
_mm_move_ss
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Move
:Header: xmmintrin.h
:Searchable: SSE_ALL-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_move_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Move the lower single-precision (32-bit) floating-point element from "b" to the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := b[31:0]
        dst[127:32] := a[127:32]
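
A common use of this intrinsic is replacing only the lowest lane of a vector. The following is a minimal sketch under that assumption; ``set_lane0`` is a hypothetical helper, and arrays are used at the interface only to keep the example easy to check.

.. code-block:: C

    #include <xmmintrin.h>

    /* Hypothetical helper: out = { x, in[1], in[2], in[3] }. */
    static void set_lane0(const float in[4], float x, float out[4])
    {
        __m128 v = _mm_loadu_ps(in);
        /* _mm_set_ss(x) yields { x, 0, 0, 0 }; only its low lane is used. */
        _mm_storeu_ps(out, _mm_move_ss(v, _mm_set_ss(x)));
    }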
        	

_mm_movehl_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Move
:Header: xmmintrin.h
:Searchable: SSE_ALL-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_movehl_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Move the upper 2 single-precision (32-bit) floating-point elements from "b" to the lower 2 elements of "dst", and copy the upper 2 elements from "a" to the upper 2 elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := b[95:64]
        dst[63:32] := b[127:96]
        dst[95:64] := a[95:64]
        dst[127:96] := a[127:96]
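
One place this intrinsic often shows up is a horizontal sum, where the upper half of a vector is folded onto the lower half. The sketch below illustrates that idiom; ``hsum4`` is a hypothetical helper and this is only one of several ways to write such a reduction.

.. code-block:: C

    #include <xmmintrin.h>

    /* Hypothetical helper: return in[0] + in[1] + in[2] + in[3]. */
    static float hsum4(const float in[4])
    {
        __m128 v    = _mm_loadu_ps(in);     /* v0    v1    v2 v3 */
        __m128 hi   = _mm_movehl_ps(v, v);  /* v2    v3    v2 v3 */
        __m128 sum2 = _mm_add_ps(v, hi);    /* v0+v2 v1+v3 ..    */
        /* Broadcast lane 1 so it can be added to lane 0. */
        __m128 odd  = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
        return _mm_cvtss_f32(_mm_add_ss(sum2, odd));
    }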
        	

_mm_movelh_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Move
:Header: xmmintrin.h
:Searchable: SSE_ALL-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_movelh_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Move the lower 2 single-precision (32-bit) floating-point elements from "b" to the upper 2 elements of "dst", and copy the lower 2 elements from "a" to the lower 2 elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        dst[63:32] := a[63:32]
        dst[95:64] := b[31:0]
        dst[127:96] := b[63:32]
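
As an illustration, the intrinsic can join two pairs of floats, each held in the low half of a register, into one full vector. ``join_pairs`` below is a hypothetical helper written for this sketch.

.. code-block:: C

    #include <xmmintrin.h>

    /* Hypothetical helper: out = { lo[0], lo[1], hi[0], hi[1] }. */
    static void join_pairs(const float lo[2], const float hi[2], float out[4])
    {
        /* Place each pair in the low half of a register; the upper
           halves are ignored by _mm_movelh_ps. */
        __m128 a = _mm_setr_ps(lo[0], lo[1], 0.0f, 0.0f);
        __m128 b = _mm_setr_ps(hi[0], hi[1], 0.0f, 0.0f);
        _mm_storeu_ps(out, _mm_movelh_ps(a, b));
    }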
        	

_mm_movpi64_epi64
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Move
:Header: emmintrin.h
:Searchable: SSE_ALL-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_movpi64_epi64(__m64 a);

.. admonition:: Intel Description

    Copy the 64-bit integer "a" to the lower element of "dst", and zero the upper element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        dst[127:64] := 0
        	

_mm_move_epi64
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Move
:Header: emmintrin.h
:Searchable: SSE_ALL-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_move_epi64(__m128i a);

.. admonition:: Intel Description

    Copy the lower 64-bit integer in "a" to the lower element of "dst", and zero the upper element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        dst[127:64] := 0
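
A small sketch of the effect, using a hypothetical helper ``keep_low64`` that works on plain arrays so the result is easy to inspect:

.. code-block:: C

    #include <emmintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: out = { in[0], 0 }. */
    static void keep_low64(const uint64_t in[2], uint64_t out[2])
    {
        __m128i v = _mm_loadu_si128((const __m128i *)in);
        /* The upper 64 bits are cleared; the lower 64 bits pass through. */
        _mm_storeu_si128((__m128i *)out, _mm_move_epi64(v));
    }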
        	

_mm_move_sd
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Move
:Header: emmintrin.h
:Searchable: SSE_ALL-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_move_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Move the lower double-precision (64-bit) floating-point element from "b" to the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := b[63:0]
        dst[127:64] := a[127:64]
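
For illustration, the following sketch replaces only the low double of a vector and leaves the high one untouched. ``set_low_double`` is a hypothetical helper.

.. code-block:: C

    #include <emmintrin.h>

    /* Hypothetical helper: out = { x, in[1] }. */
    static void set_low_double(const double in[2], double x, double out[2])
    {
        __m128d v = _mm_loadu_pd(in);
        /* _mm_set_sd(x) yields { x, 0 }; only its low lane is used. */
        _mm_storeu_pd(out, _mm_move_sd(v, _mm_set_sd(x)));
    }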
        	

_mm_movedup_pd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Move
:Header: pmmintrin.h
:Searchable: SSE_ALL-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_movedup_pd(__m128d a);

.. admonition:: Intel Description

    Duplicate the low double-precision (64-bit) floating-point element from "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        dst[127:64] := a[63:0]
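
A minimal sketch of using the duplicated low element as a broadcast scale factor follows. ``scale_by_low`` and the ``SSE3_FN`` macro are hypothetical; the macro assumes a GCC- or Clang-style ``target`` attribute so the example builds without a global ``-msse3`` flag, and expands to nothing elsewhere.

.. code-block:: C

    #include <pmmintrin.h>

    /* Assumption: GCC/Clang per-function target attribute. */
    #if defined(__GNUC__)
    #define SSE3_FN __attribute__((target("sse3")))
    #else
    #define SSE3_FN
    #endif

    /* Hypothetical helper: out = { in[0]*in[0], in[1]*in[0] }. */
    static SSE3_FN void scale_by_low(const double in[2], double out[2])
    {
        __m128d v = _mm_loadu_pd(in);
        __m128d s = _mm_movedup_pd(v);   /* in[0], in[0] */
        _mm_storeu_pd(out, _mm_mul_pd(v, s));
    }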
        	

_mm_movehdup_ps
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Move
:Header: pmmintrin.h
:Searchable: SSE_ALL-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_movehdup_ps(__m128 a);

.. admonition:: Intel Description

    Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[63:32] 
        dst[63:32] := a[63:32]
        dst[95:64] := a[127:96] 
        dst[127:96] := a[127:96]
        	

_mm_moveldup_ps
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Move
:Header: pmmintrin.h
:Searchable: SSE_ALL-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_moveldup_ps(__m128 a);

.. admonition:: Intel Description

    Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] 
        dst[63:32] := a[31:0]
        dst[95:64] := a[95:64] 
        dst[127:96] := a[95:64]
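
``_mm_moveldup_ps`` and ``_mm_movehdup_ps`` are frequently paired to multiply interleaved complex numbers, since they broadcast the real and imaginary parts respectively. The sketch below shows that idiom for two complex products at once. ``cmul2`` and ``SSE3_FN`` are hypothetical names; the macro assumes a GCC- or Clang-style ``target`` attribute and expands to nothing on other compilers.

.. code-block:: C

    #include <pmmintrin.h>

    /* Assumption: GCC/Clang per-function target attribute. */
    #if defined(__GNUC__)
    #define SSE3_FN __attribute__((target("sse3")))
    #else
    #define SSE3_FN
    #endif

    /* Hypothetical helper: a, b and out each hold two complex numbers
       stored as { re0, im0, re1, im1 }; out = a * b element-wise. */
    static SSE3_FN void cmul2(const float a[4], const float b[4], float out[4])
    {
        __m128 va  = _mm_loadu_ps(a);
        __m128 vb  = _mm_loadu_ps(b);
        __m128 re  = _mm_moveldup_ps(va);   /* a.re a.re (per complex) */
        __m128 im  = _mm_movehdup_ps(va);   /* a.im a.im (per complex) */
        /* Swap re/im within each complex number of b. */
        __m128 bsw = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(2, 3, 0, 1));
        __m128 t1  = _mm_mul_ps(re, vb);    /* a.re*b.re  a.re*b.im */
        __m128 t2  = _mm_mul_ps(im, bsw);   /* a.im*b.im  a.im*b.re */
        /* Even lanes subtract, odd lanes add. */
        _mm_storeu_ps(out, _mm_addsub_ps(t1, t2));
    }

For example, ``(1+2i)*(5+6i)`` and ``(3+4i)*(7+8i)`` come out as ``-7+16i`` and ``-11+52i``.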
        	

Cast
----
XMM
~~~
_mm_castpd_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Cast
:Header: emmintrin.h
:Searchable: SSE_ALL-Cast-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128 _mm_castpd_ps(__m128d a);

.. admonition:: Intel Description

    Cast vector of type __m128d to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm_castpd_si128
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Cast
:Header: emmintrin.h
:Searchable: SSE_ALL-Cast-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm_castpd_si128(__m128d a);

.. admonition:: Intel Description

    Cast vector of type __m128d to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm_castps_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Cast
:Header: emmintrin.h
:Searchable: SSE_ALL-Cast-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128d _mm_castps_pd(__m128 a);

.. admonition:: Intel Description

    Cast vector of type __m128 to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm_castps_si128
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Cast
:Header: emmintrin.h
:Searchable: SSE_ALL-Cast-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128i _mm_castps_si128(__m128 a);

.. admonition:: Intel Description

    Cast vector of type __m128 to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm_castsi128_pd
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Cast
:Header: emmintrin.h
:Searchable: SSE_ALL-Cast-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128d _mm_castsi128_pd(__m128i a);

.. admonition:: Intel Description

    Cast vector of type __m128i to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm_castsi128_ps
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Cast
:Header: emmintrin.h
:Searchable: SSE_ALL-Cast-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128 _mm_castsi128_ps(__m128i a);

.. admonition:: Intel Description

    Cast vector of type __m128i to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
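
The cast intrinsics reinterpret the same 128 bits under a different vector type, which is useful when integer bit operations are applied to floating-point data. As a hedged sketch, the hypothetical helper ``abs4`` below clears the sign bit of four floats by round-tripping through ``__m128i``.

.. code-block:: C

    #include <emmintrin.h>

    /* Hypothetical helper: out[i] = fabsf(in[i]) via sign-bit masking. */
    static void abs4(const float in[4], float out[4])
    {
        __m128  v    = _mm_loadu_ps(in);
        __m128i bits = _mm_castps_si128(v);   /* same bits, integer view */
        bits = _mm_and_si128(bits, _mm_set1_epi32(0x7FFFFFFF));
        _mm_storeu_ps(out, _mm_castsi128_ps(bits));   /* back to float view */
    }

No value conversion takes place in either direction; only the type seen by the compiler changes.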

String Compare
--------------
XMM
~~~
_mm_cmpistrm
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    M128 a, 
    M128 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_cmpistrm(__m128i a, __m128i b, const int imm8);

.. admonition:: Intel Description

    Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and store the generated mask in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        BoolRes := 0
        // compare all characters
        aInvalid := 0
        bInvalid := 0
        FOR i := 0 to UpperBound
        	m := i*size
        	FOR j := 0 to UpperBound
        		n := j*size
        		BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
        		
        		// invalidate characters after EOS
        		IF a[m+size-1:m] == 0
        			aInvalid := 1
        		FI
        		IF b[n+size-1:n] == 0
        			bInvalid := 1
        		FI
        		
        		// override comparisons for invalid characters
        		CASE (imm8[3:2]) OF
        		0:  // equal any
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		1:  // ranges
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		2:  // equal each
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		3:  // equal ordered
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		ESAC
        	ENDFOR
        ENDFOR
        // aggregate results
        CASE (imm8[3:2]) OF
        0:  // equal any
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
        		ENDFOR
        	ENDFOR
        1:  // ranges
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
        			j += 2
        		ENDFOR
        	ENDFOR
        2:  // equal each
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		IntRes1[i] := BoolRes.word[i].bit[i]
        	ENDFOR
        3:  // equal ordered
        	IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
        	FOR i := 0 to UpperBound
        		k := i
        		FOR j := 0 to UpperBound-i
        			IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
        			k := k+1
        		ENDFOR
        	ENDFOR
        ESAC
        // optionally negate results
        bInvalid := 0
        FOR i := 0 to UpperBound
        	IF imm8[4]
        		IF imm8[5] // only negate valid
        			IF b[n+size-1:n] == 0
        				bInvalid := 1
        			FI
        			IF bInvalid // invalid, don't negate
        				IntRes2[i] := IntRes1[i]
        			ELSE // valid, negate
        				IntRes2[i] := -1 XOR IntRes1[i]
        			FI
        		ELSE // negate all
        			IntRes2[i] := -1 XOR IntRes1[i]
        		FI
        	ELSE // don't negate
        		IntRes2[i] := IntRes1[i]
        	FI
        ENDFOR
        // output
        IF imm8[6] // byte / word mask
        	FOR i := 0 to UpperBound
        		j := i*size
        		IF IntRes2[i]
        			dst[j+size-1:j] := (imm8[0] ? 0xFFFF : 0xFF)
        		ELSE
        			dst[j+size-1:j] := 0
        		FI
        	ENDFOR
        ELSE // bit mask
        	dst[UpperBound:0] := IntRes2[UpperBound:0]
        	dst[127:UpperBound+1] := 0
        FI
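
As a usage illustration, the range mode combined with a byte mask can upper-case ASCII letters in one 16-byte chunk. In this sketch ``a`` holds the range pair and ``b`` holds the text; ``upper16`` and ``SSE42_FN`` are hypothetical names, and the macro assumes a GCC- or Clang-style ``target`` attribute (it expands to nothing elsewhere). The input is assumed to be NUL-padded to 16 bytes.

.. code-block:: C

    #include <nmmintrin.h>

    /* Assumption: GCC/Clang per-function target attribute. */
    #if defined(__GNUC__)
    #define SSE42_FN __attribute__((target("sse4.2")))
    #else
    #define SSE42_FN
    #endif

    /* Hypothetical helper: upper-case the ASCII letters of one
       NUL-padded 16-byte chunk. */
    static SSE42_FN void upper16(const char in[16], char out[16])
    {
        /* One range pair: 'a' <= c <= 'z'; the NUL after it ends the list. */
        const __m128i range = _mm_setr_epi8('a', 'z', 0, 0, 0, 0, 0, 0,
                                            0, 0, 0, 0, 0, 0, 0, 0);
        __m128i text = _mm_loadu_si128((const __m128i *)in);
        /* Byte mask: 0xFF where a valid character of text is in range. */
        __m128i mask = _mm_cmpistrm(range, text,
            _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES | _SIDD_UNIT_MASK);
        /* Lower-case and upper-case ASCII letters differ by 0x20. */
        __m128i delta = _mm_and_si128(mask, _mm_set1_epi8(0x20));
        _mm_storeu_si128((__m128i *)out, _mm_sub_epi8(text, delta));
    }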
        	

_mm_cmpistri
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    M128 a, 
    M128 b, 
    IMM imm8

.. code-block:: C

    int _mm_cmpistri(__m128i a, __m128i b, const int imm8);

.. admonition:: Intel Description

    Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and store the generated index in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        BoolRes := 0
        // compare all characters
        aInvalid := 0
        bInvalid := 0
        FOR i := 0 to UpperBound
        	m := i*size
        	FOR j := 0 to UpperBound
        		n := j*size
        		BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
        		
        		// invalidate characters after EOS
        		IF a[m+size-1:m] == 0
        			aInvalid := 1
        		FI
        		IF b[n+size-1:n] == 0
        			bInvalid := 1
        		FI
        		
        		// override comparisons for invalid characters
        		CASE (imm8[3:2]) OF
        		0:  // equal any
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		1:  // ranges
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		2:  // equal each
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		3:  // equal ordered
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		ESAC
        	ENDFOR
        ENDFOR
        // aggregate results
        CASE (imm8[3:2]) OF
        0:  // equal any
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
        		ENDFOR
        	ENDFOR
        1:  // ranges
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
        			j += 2
        		ENDFOR
        	ENDFOR
        2:  // equal each
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		IntRes1[i] := BoolRes.word[i].bit[i]
        	ENDFOR
        3:  // equal ordered
        	IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
        	FOR i := 0 to UpperBound
        		k := i
        		FOR j := 0 to UpperBound-i
        			IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
        			k := k+1
        		ENDFOR
        	ENDFOR
        ESAC
        // optionally negate results
        bInvalid := 0
        FOR i := 0 to UpperBound
        	IF imm8[4]
        		IF imm8[5] // only negate valid
        			IF b[n+size-1:n] == 0
        				bInvalid := 1
        			FI
        			IF bInvalid // invalid, don't negate
        				IntRes2[i] := IntRes1[i]
        			ELSE // valid, negate
        				IntRes2[i] := -1 XOR IntRes1[i]
        			FI
        		ELSE // negate all
        			IntRes2[i] := -1 XOR IntRes1[i]
        		FI
        	ELSE // don't negate
        		IntRes2[i] := IntRes1[i]
        	FI
        ENDFOR
        // output
        IF imm8[6] // most significant bit
        	tmp := UpperBound
        	dst := tmp
        	DO WHILE ((tmp >= 0) AND IntRes2[tmp] == 0)
        		tmp := tmp - 1
        		dst := tmp
        	OD
        ELSE // least significant bit
        	tmp := 0
        	dst := tmp
        	DO WHILE ((tmp <= UpperBound) AND IntRes2[tmp] == 0)
        		tmp := tmp + 1
        		dst := tmp
        	OD
        FI
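
A typical use is locating the first character of a text chunk that belongs to a small set, similar in spirit to ``strpbrk`` on 16 bytes. The sketch below assumes both inputs are NUL-padded to 16 bytes; ``first_of16`` and ``SSE42_FN`` are hypothetical names, and the macro assumes a GCC- or Clang-style ``target`` attribute.

.. code-block:: C

    #include <nmmintrin.h>

    /* Assumption: GCC/Clang per-function target attribute. */
    #if defined(__GNUC__)
    #define SSE42_FN __attribute__((target("sse4.2")))
    #else
    #define SSE42_FN
    #endif

    /* Hypothetical helper: index of the first byte of text that occurs
       in set, or 16 when there is no such byte in this chunk. */
    static SSE42_FN int first_of16(const char set[16], const char text[16])
    {
        __m128i vs = _mm_loadu_si128((const __m128i *)set);
        __m128i vt = _mm_loadu_si128((const __m128i *)text);
        return _mm_cmpistri(vs, vt,
            _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT);
    }

With byte elements, a return value of 16 means that no element matched.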
        	

_mm_cmpistrz
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    M128 a, 
    M128 b, 
    IMM imm8

.. code-block:: C

    int _mm_cmpistrz(__m128i a, __m128i b, const int imm8);

.. admonition:: Intel Description

    Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and return 1 if any character in "b" was null, and 0 otherwise.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        bInvalid := 0
        FOR j := 0 to UpperBound
        	n := j*size
        	IF b[n+size-1:n] == 0
        		bInvalid := 1
        	FI
        ENDFOR
        dst := bInvalid
        	

_mm_cmpistrc
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    M128 a, 
    M128 b, 
    IMM imm8

.. code-block:: C

    int _mm_cmpistrc(__m128i a, __m128i b, const int imm8);

.. admonition:: Intel Description

    Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and return 1 if the resulting mask was non-zero, and 0 otherwise.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        BoolRes := 0
        // compare all characters
        aInvalid := 0
        bInvalid := 0
        FOR i := 0 to UpperBound
        	m := i*size
        	FOR j := 0 to UpperBound
        		n := j*size
        		BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
        		
        		// invalidate characters after EOS
        		IF a[m+size-1:m] == 0
        			aInvalid := 1
        		FI
        		IF b[n+size-1:n] == 0
        			bInvalid := 1
        		FI
        		
        		// override comparisons for invalid characters
        		CASE (imm8[3:2]) OF
        		0:  // equal any
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		1:  // ranges
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		2:  // equal each
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		3:  // equal ordered
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		ESAC
        	ENDFOR
        ENDFOR
        // aggregate results
        CASE (imm8[3:2]) OF
        0:  // equal any
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
        		ENDFOR
        	ENDFOR
        1:  // ranges
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
        			j += 2
        		ENDFOR
        	ENDFOR
        2:  // equal each
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		IntRes1[i] := BoolRes.word[i].bit[i]
        	ENDFOR
        3:  // equal ordered
        	IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
        	FOR i := 0 to UpperBound
        		k := i
        		FOR j := 0 to UpperBound-i
        			IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
        			k := k+1
        		ENDFOR
        	ENDFOR
        ESAC
        // optionally negate results
        bInvalid := 0
        FOR i := 0 to UpperBound
        	IF imm8[4]
        		IF imm8[5] // only negate valid
        			IF b[n+size-1:n] == 0
        				bInvalid := 1
        			FI
        			IF bInvalid // invalid, don't negate
        				IntRes2[i] := IntRes1[i]
        			ELSE // valid, negate
        				IntRes2[i] := -1 XOR IntRes1[i]
        			FI
        		ELSE // negate all
        			IntRes2[i] := -1 XOR IntRes1[i]
        		FI
        	ELSE // don't negate
        		IntRes2[i] := IntRes1[i]
        	FI
        ENDFOR
        // output
        dst := (IntRes2 != 0)
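
As a hedged example, "equal each" with negative polarity turns the mask into a mismatch mask, so a non-zero result signals that two NUL-terminated chunks differ. ``chunks_differ`` and ``SSE42_FN`` are hypothetical names; the macro assumes a GCC- or Clang-style ``target`` attribute, and both inputs are assumed to be NUL-padded to 16 bytes.

.. code-block:: C

    #include <nmmintrin.h>

    /* Assumption: GCC/Clang per-function target attribute. */
    #if defined(__GNUC__)
    #define SSE42_FN __attribute__((target("sse4.2")))
    #else
    #define SSE42_FN
    #endif

    /* Hypothetical helper: 1 if the two strings differ within this
       16-byte chunk, 0 if they match up to the terminator. */
    static SSE42_FN int chunks_differ(const char a[16], const char b[16])
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        /* Negative polarity flips "equal" bits into "mismatch" bits. */
        return _mm_cmpistrc(va, vb,
            _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH | _SIDD_NEGATIVE_POLARITY);
    }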
        	

_mm_cmpistrs
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    M128 a, 
    M128 b, 
    IMM imm8

.. code-block:: C

    int _mm_cmpistrs(__m128i a, __m128i b, const int imm8);

.. admonition:: Intel Description

    Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and return 1 if any character in "a" was null, and 0 otherwise.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        aInvalid := 0
        FOR i := 0 to UpperBound
        	m := i*size
        	IF a[m+size-1:m] == 0
        		aInvalid := 1
        	FI
        ENDFOR
        dst := aInvalid
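
``_mm_cmpistrs`` and ``_mm_cmpistrz`` report whether a terminator was seen in ``a`` and in ``b`` respectively, which is what a chunked string loop needs in order to stop. A minimal sketch follows; ``a_has_nul``, ``b_has_nul`` and ``SSE42_FN`` are hypothetical names, and the macro assumes a GCC- or Clang-style ``target`` attribute.

.. code-block:: C

    #include <nmmintrin.h>

    /* Assumption: GCC/Clang per-function target attribute. */
    #if defined(__GNUC__)
    #define SSE42_FN __attribute__((target("sse4.2")))
    #else
    #define SSE42_FN
    #endif

    /* Hypothetical helper: 1 if the 16 bytes of a contain a NUL. */
    static SSE42_FN int a_has_nul(const char a[16], const char b[16])
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        return _mm_cmpistrs(va, vb, _SIDD_UBYTE_OPS);
    }

    /* Hypothetical helper: 1 if the 16 bytes of b contain a NUL. */
    static SSE42_FN int b_has_nul(const char a[16], const char b[16])
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        return _mm_cmpistrz(va, vb, _SIDD_UBYTE_OPS);
    }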
        	

_mm_cmpistro
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    M128 a, 
    M128 b, 
    IMM imm8

.. code-block:: C

    int _mm_cmpistro(__m128i a, __m128i b, const int imm8);

.. admonition:: Intel Description

    Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and return bit 0 of the resulting bit mask.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        BoolRes := 0
        // compare all characters
        aInvalid := 0
        bInvalid := 0
        FOR i := 0 to UpperBound
        	m := i*size
        	FOR j := 0 to UpperBound
        		n := j*size
        		BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
        		
        		// invalidate characters after EOS
        		IF a[m+size-1:m] == 0
        			aInvalid := 1
        		FI
        		IF b[n+size-1:n] == 0
        			bInvalid := 1
        		FI
        		
        		// override comparisons for invalid characters
        		CASE (imm8[3:2]) OF
        		0:  // equal any
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		1:  // ranges
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		2:  // equal each
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		3:  // equal ordered
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		ESAC
        	ENDFOR
        ENDFOR
        // aggregate results
        CASE (imm8[3:2]) OF
        0:  // equal any
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
        		ENDFOR
        	ENDFOR
        1:  // ranges
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
        			j += 2
        		ENDFOR
        	ENDFOR
        2:  // equal each
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		IntRes1[i] := BoolRes.word[i].bit[i]
        	ENDFOR
        3:  // equal ordered
        	IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
        	FOR i := 0 to UpperBound
        		k := i
        		FOR j := 0 to UpperBound-i
        			IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
        			k := k+1
        		ENDFOR
        	ENDFOR
        ESAC
        // optionally negate results
        bInvalid := 0
        FOR i := 0 to UpperBound
        	IF imm8[4]
        		IF imm8[5] // only negate valid
        			IF b[n+size-1:n] == 0
        				bInvalid := 1
        			FI
        			IF bInvalid // invalid, don't negate
        				IntRes2[i] := IntRes1[i]
        			ELSE // valid, negate
        				IntRes2[i] := -1 XOR IntRes1[i]
        			FI
        		ELSE // negate all
        			IntRes2[i] := -1 XOR IntRes1[i]
        		FI
        	ELSE // don't negate
        		IntRes2[i] := IntRes1[i]
        	FI
        ENDFOR
        // output
        dst := IntRes2[0]
        	

_mm_cmpistra
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    M128 a, 
    M128 b, 
    IMM imm8

.. code-block:: C

    int _mm_cmpistra(__m128i a, __m128i b, const int imm8);

.. admonition:: Intel Description

    Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and return 1 if "b" did not contain a null character and the resulting mask was zero, and 0 otherwise.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        BoolRes := 0
        // compare all characters
        aInvalid := 0
        bInvalid := 0
        FOR i := 0 to UpperBound
        	m := i*size
        	FOR j := 0 to UpperBound
        		n := j*size
        		BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
        		
        		// invalidate characters after EOS
        		IF a[m+size-1:m] == 0
        			aInvalid := 1
        		FI
        		IF b[n+size-1:n] == 0
        			bInvalid := 1
        		FI
        		
        		// override comparisons for invalid characters
        		CASE (imm8[3:2]) OF
        		0:  // equal any
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		1:  // ranges
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		2:  // equal each
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		3:  // equal ordered
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		ESAC
        	ENDFOR
        ENDFOR
        // aggregate results
        CASE (imm8[3:2]) OF
        0:  // equal any
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
        		ENDFOR
        	ENDFOR
        1:  // ranges
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
        			j += 2
        		ENDFOR
        	ENDFOR
        2:  // equal each
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		IntRes1[i] := BoolRes.word[i].bit[i]
        	ENDFOR
        3:  // equal ordered
        	IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
        	FOR i := 0 to UpperBound
        		k := i
        		FOR j := 0 to UpperBound-i
        			IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
        			k := k+1
        		ENDFOR
        	ENDFOR
        ESAC
        // optionally negate results
        bInvalid := 0
        FOR i := 0 to UpperBound
        	n := i*size
        	IF b[n+size-1:n] == 0 // track EOS in "b"
        		bInvalid := 1
        	FI
        	IF imm8[4]
        		IF imm8[5] // only negate valid
        			IF bInvalid // invalid, don't negate
        				IntRes2[i] := IntRes1[i]
        			ELSE // valid, negate
        				IntRes2[i] := -1 XOR IntRes1[i]
        			FI
        		ELSE // negate all
        			IntRes2[i] := -1 XOR IntRes1[i]
        		FI
        	ELSE // don't negate
        		IntRes2[i] := IntRes1[i]
        	FI
        ENDFOR
        // output
        dst := (IntRes2 == 0) AND (bInvalid == 0)
        	

_mm_cmpestrm
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int la, 
    __m128i b, 
    int lb, 
    const int imm8
:Param ETypes:
    M128 a, 
    UI32 la, 
    M128 b, 
    UI32 lb, 
    IMM imm8

.. code-block:: C

    __m128i _mm_cmpestrm(__m128i a, int la, __m128i b, int lb,
                         const int imm8);

.. admonition:: Intel Description

    Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and store the generated mask in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        BoolRes := 0
        // compare all characters
        aInvalid := 0
        bInvalid := 0
        FOR i := 0 to UpperBound
        	m := i*size
        	FOR j := 0 to UpperBound
        		n := j*size
        		BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
        		
        		// invalidate characters after EOS
        		IF i == la
        			aInvalid := 1
        		FI
        		IF j == lb
        			bInvalid := 1
        		FI
        		
        		// override comparisons for invalid characters
        		CASE (imm8[3:2]) OF
        		0:  // equal any
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		1:  // ranges
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		2:  // equal each
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		3:  // equal ordered
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		ESAC
        	ENDFOR
        ENDFOR
        // aggregate results
        CASE (imm8[3:2]) OF
        0:  // equal any
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
        		ENDFOR
        	ENDFOR
        1:  // ranges
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
        			j += 2
        		ENDFOR
        	ENDFOR
        2:  // equal each
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		IntRes1[i] := BoolRes.word[i].bit[i]
        	ENDFOR
        3:  // equal ordered
        	IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
        	FOR i := 0 to UpperBound
        		k := i
        		FOR j := 0 to UpperBound-i
        			IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
        			k := k+1
        		ENDFOR
        	ENDFOR
        ESAC
        // optionally negate results
        FOR i := 0 to UpperBound
        	IF imm8[4]
        		IF imm8[5] // only negate valid
        			IF i >= lb // invalid, don't negate
        				IntRes2[i] := IntRes1[i]
        			ELSE // valid, negate
        				IntRes2[i] := -1 XOR IntRes1[i]
        			FI
        		ELSE // negate all
        			IntRes2[i] := -1 XOR IntRes1[i]
        		FI
        	ELSE // don't negate
        		IntRes2[i] := IntRes1[i]
        	FI
        ENDFOR
        // output
        IF imm8[6] // byte / word mask
        	FOR i := 0 to UpperBound
        		j := i*size
        		IF IntRes2[i]
        			dst[j+size-1:j] := (imm8[0] ? 0xFFFF : 0xFF)
        		ELSE
        			dst[j+size-1:j] := 0
        		FI
        	ENDFOR
        ELSE // bit mask
        	dst[UpperBound:0] := IntRes2[UpperBound:0]
        	dst[127:UpperBound+1] := 0
        FI
        	
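The bit-mask output path above can be sketched in C as follows. This is a minimal illustration rather than a reference implementation: the helper name ``match_any_mask`` and the ``SSE42_FN`` macro are hypothetical, the per-function target attribute assumes a GCC- or Clang-style compiler, and the ``_SIDD_*`` constants are assumed to come from ``nmmintrin.h``. In "equal any" mode, "a" holds the character set and bit j of the result reports whether character j of "b" is in that set.

.. code-block:: C

    #include <nmmintrin.h>
    #include <string.h>

    /* Assumption: GCC/Clang need SSE4.2 enabled per function (or -msse4.2). */
    #if defined(__GNUC__) || defined(__clang__)
    #define SSE42_FN __attribute__((target("sse4.2")))
    #else
    #define SSE42_FN
    #endif

    /* Hypothetical helper: bit j of the result is set when text[j] equals any
       of the first set_len bytes of set. Both lengths must be in 0..16. */
    SSE42_FN static int match_any_mask(const char *set, int set_len,
                                       const char *text, int text_len)
    {
        char a_buf[16] = {0};
        char b_buf[16] = {0};
        memcpy(a_buf, set, (size_t)set_len);
        memcpy(b_buf, text, (size_t)text_len);

        __m128i a = _mm_loadu_si128((const __m128i *)a_buf);
        __m128i b = _mm_loadu_si128((const __m128i *)b_buf);

        /* imm8[6] = 0 (_SIDD_BIT_MASK) selects the compact bit-mask output. */
        __m128i mask = _mm_cmpestrm(a, set_len, b, text_len,
                                    _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                                    _SIDD_BIT_MASK);
        return _mm_cvtsi128_si32(mask);
    }

For example, ``match_any_mask("aeiou", 5, "hello world", 11)`` sets bits 1, 4 and 7 (``0x92``), the positions of the vowels in the text.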

_mm_cmpestri
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    int la, 
    __m128i b, 
    int lb, 
    const int imm8
:Param ETypes:
    M128 a, 
    UI32 la, 
    M128 b, 
    UI32 lb, 
    IMM imm8

.. code-block:: C

    int _mm_cmpestri(__m128i a, int la, __m128i b, int lb,
                     const int imm8);

.. admonition:: Intel Description

    Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and store the generated index in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        BoolRes := 0
        // compare all characters
        aInvalid := 0
        bInvalid := 0
        FOR i := 0 to UpperBound
        	m := i*size
        	FOR j := 0 to UpperBound
        		n := j*size
        		BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
        		
        		// invalidate characters after EOS
        		IF i == la
        			aInvalid := 1
        		FI
        		IF j == lb
        			bInvalid := 1
        		FI
        		
        		// override comparisons for invalid characters
        		CASE (imm8[3:2]) OF
        		0:  // equal any
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		1:  // ranges
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		2:  // equal each
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		3:  // equal ordered
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		ESAC
        	ENDFOR
        ENDFOR
        // aggregate results
        CASE (imm8[3:2]) OF
        0:  // equal any
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
        		ENDFOR
        	ENDFOR
        1:  // ranges
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
        			j += 2
        		ENDFOR
        	ENDFOR
        2:  // equal each
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		IntRes1[i] := BoolRes.word[i].bit[i]
        	ENDFOR
        3:  // equal ordered
        	IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
        	FOR i := 0 to UpperBound
        		k := i
        		FOR j := 0 to UpperBound-i
        			IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
        			k := k+1
        		ENDFOR
        	ENDFOR
        ESAC
        // optionally negate results
        FOR i := 0 to UpperBound
        	IF imm8[4]
        		IF imm8[5] // only negate valid
        			IF i >= lb // invalid, don't negate
        				IntRes2[i] := IntRes1[i]
        			ELSE // valid, negate
        				IntRes2[i] := -1 XOR IntRes1[i]
        			FI
        		ELSE // negate all
        			IntRes2[i] := -1 XOR IntRes1[i]
        		FI
        	ELSE // don't negate
        		IntRes2[i] := IntRes1[i]
        	FI
        ENDFOR
        // output
        IF imm8[6] // most significant bit
        	tmp := UpperBound
        	dst := tmp
        	DO WHILE ((tmp >= 0) AND IntRes2[tmp] == 0)
        		tmp := tmp - 1
        		dst := tmp
        	OD
        	IF tmp < 0 // no bit set
        		dst := UpperBound + 1
        	FI
        ELSE // least significant bit
        	tmp := 0
        	dst := tmp
        	DO WHILE ((tmp <= UpperBound) AND IntRes2[tmp] == 0)
        		tmp := tmp + 1
        		dst := tmp
        	OD
        FI
        	
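As a rough illustration of the index output, the sketch below uses "equal ordered" mode to look for a short needle inside one block of text. The helper name ``find_in_block`` and the ``SSE42_FN`` macro are hypothetical, and the target attribute assumes a GCC- or Clang-style compiler. Following the pseudo-code, the comparison for each start position only covers the characters that fit in the block, so when ``hay_len`` is 16 a needle prefix that matches the tail of the block is also reported and the caller has to verify the rest.

.. code-block:: C

    #include <nmmintrin.h>
    #include <string.h>

    /* Assumption: GCC/Clang need SSE4.2 enabled per function (or -msse4.2). */
    #if defined(__GNUC__) || defined(__clang__)
    #define SSE42_FN __attribute__((target("sse4.2")))
    #else
    #define SSE42_FN
    #endif

    /* Hypothetical helper: index of the first place where needle occurs in
       hay, or 16 when no position matches. Both lengths must be in 0..16. */
    SSE42_FN static int find_in_block(const char *needle, int needle_len,
                                      const char *hay, int hay_len)
    {
        char a_buf[16] = {0};
        char b_buf[16] = {0};
        memcpy(a_buf, needle, (size_t)needle_len);
        memcpy(b_buf, hay, (size_t)hay_len);

        __m128i a = _mm_loadu_si128((const __m128i *)a_buf);
        __m128i b = _mm_loadu_si128((const __m128i *)b_buf);

        /* imm8[6] = 0 returns the least significant set bit of the result. */
        return _mm_cmpestri(a, needle_len, b, hay_len,
                            _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ORDERED |
                            _SIDD_LEAST_SIGNIFICANT);
    }

With these inputs, ``find_in_block("world", 5, "hello world", 11)`` yields 6, and a needle that does not occur yields 16.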

_mm_cmpestrz
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    int la, 
    __m128i b, 
    int lb, 
    const int imm8
:Param ETypes:
    M128 a, 
    UI32 la, 
    M128 b, 
    UI32 lb, 
    IMM imm8

.. code-block:: C

    int _mm_cmpestrz(__m128i a, int la, __m128i b, int lb,
                     const int imm8);

.. admonition:: Intel Description

    Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and return 1 if any character in "b" was null, and 0 otherwise.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        dst := (lb <= UpperBound)
        	

_mm_cmpestrc
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    int la, 
    __m128i b, 
    int lb, 
    const int imm8
:Param ETypes:
    M128 a, 
    UI32 la, 
    M128 b, 
    UI32 lb, 
    IMM imm8

.. code-block:: C

    int _mm_cmpestrc(__m128i a, int la, __m128i b, int lb,
                     const int imm8);

.. admonition:: Intel Description

    Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and return 1 if the resulting mask was non-zero, and 0 otherwise.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        BoolRes := 0
        // compare all characters
        aInvalid := 0
        bInvalid := 0
        FOR i := 0 to UpperBound
        	m := i*size
        	FOR j := 0 to UpperBound
        		n := j*size
        		BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
        		
        		// invalidate characters after EOS
        		IF i == la
        			aInvalid := 1
        		FI
        		IF j == lb
        			bInvalid := 1
        		FI
        		
        		// override comparisons for invalid characters
        		CASE (imm8[3:2]) OF
        		0:  // equal any
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		1:  // ranges
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		2:  // equal each
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		3:  // equal ordered
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		ESAC
        	ENDFOR
        ENDFOR
        // aggregate results
        CASE (imm8[3:2]) OF
        0:  // equal any
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
        		ENDFOR
        	ENDFOR
        1:  // ranges
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
        			j += 2
        		ENDFOR
        	ENDFOR
        2:  // equal each
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		IntRes1[i] := BoolRes.word[i].bit[i]
        	ENDFOR
        3:  // equal ordered
        	IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
        	FOR i := 0 to UpperBound
        		k := i
        		FOR j := 0 to UpperBound-i
        			IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
        			k := k+1
        		ENDFOR
        	ENDFOR
        ESAC
        // optionally negate results
        FOR i := 0 to UpperBound
        	IF imm8[4]
        		IF imm8[5] // only negate valid
        			IF i >= lb // invalid, don't negate
        				IntRes2[i] := IntRes1[i]
        			ELSE // valid, negate
        				IntRes2[i] := -1 XOR IntRes1[i]
        			FI
        		ELSE // negate all
        			IntRes2[i] := -1 XOR IntRes1[i]
        		FI
        	ELSE // don't negate
        		IntRes2[i] := IntRes1[i]
        	FI
        ENDFOR
        // output
        dst := (IntRes2 != 0)
        	
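A small sketch of the "mask is non-zero" test, using "ranges" mode to ask whether a block contains any ASCII digit. The helper name ``block_has_digit`` and the ``SSE42_FN`` macro are hypothetical, and the target attribute assumes a GCC- or Clang-style compiler. In this mode "a" is read as (low, high) pairs, so a single pair needs ``la`` equal to 2.

.. code-block:: C

    #include <nmmintrin.h>
    #include <string.h>

    /* Assumption: GCC/Clang need SSE4.2 enabled per function (or -msse4.2). */
    #if defined(__GNUC__) || defined(__clang__)
    #define SSE42_FN __attribute__((target("sse4.2")))
    #else
    #define SSE42_FN
    #endif

    /* Hypothetical helper: 1 if any of the first text_len (0..16) bytes of
       text lies in the range '0'..'9', otherwise 0. */
    SSE42_FN static int block_has_digit(const char *text, int text_len)
    {
        const char range[16] = { '0', '9' };   /* one (low, high) pair */
        char b_buf[16] = {0};
        memcpy(b_buf, text, (size_t)text_len);

        __m128i a = _mm_loadu_si128((const __m128i *)range);
        __m128i b = _mm_loadu_si128((const __m128i *)b_buf);

        return _mm_cmpestrc(a, 2, b, text_len,
                            _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES);
    }

Because the lengths are explicit, the zero padding after ``text_len`` is treated as invalid and cannot produce a match.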

_mm_cmpestrs
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    int la, 
    __m128i b, 
    int lb, 
    const int imm8
:Param ETypes:
    M128 a, 
    UI32 la, 
    M128 b, 
    UI32 lb, 
    IMM imm8

.. code-block:: C

    int _mm_cmpestrs(__m128i a, int la, __m128i b, int lb,
                     const int imm8);

.. admonition:: Intel Description

    Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and return 1 if any character in "a" was null, and 0 otherwise.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        dst := (la <= UpperBound)
        	
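Since the pseudo-code reduces to a comparison of the explicit length against the element count, the result depends only on ``la`` (and, for ``_mm_cmpestrz``, only on ``lb``). The sketch below shows this for byte-sized characters; the helper names and the ``SSE42_FN`` macro are hypothetical, and the target attribute assumes a GCC- or Clang-style compiler.

.. code-block:: C

    #include <nmmintrin.h>

    /* Assumption: GCC/Clang need SSE4.2 enabled per function (or -msse4.2). */
    #if defined(__GNUC__) || defined(__clang__)
    #define SSE42_FN __attribute__((target("sse4.2")))
    #else
    #define SSE42_FN
    #endif

    /* Hypothetical helper: 1 when "a" ends inside the block (la <= 15). */
    SSE42_FN static int a_ends_in_block(int la)
    {
        __m128i zero = _mm_setzero_si128();
        return _mm_cmpestrs(zero, la, zero, 0, _SIDD_UBYTE_OPS);
    }

    /* Hypothetical helper: 1 when "b" ends inside the block (lb <= 15). */
    SSE42_FN static int b_ends_in_block(int lb)
    {
        __m128i zero = _mm_setzero_si128();
        return _mm_cmpestrz(zero, 0, zero, lb, _SIDD_UBYTE_OPS);
    }

A scanning loop might use ``b_ends_in_block(remaining)`` to detect that the current 16-byte block is the last one.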

_mm_cmpestro
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    int la, 
    __m128i b, 
    int lb, 
    const int imm8
:Param ETypes:
    M128 a, 
    UI32 la, 
    M128 b, 
    UI32 lb, 
    IMM imm8

.. code-block:: C

    int _mm_cmpestro(__m128i a, int la, __m128i b, int lb,
                     const int imm8);

.. admonition:: Intel Description

    Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and return bit 0 of the resulting bit mask.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        BoolRes := 0
        // compare all characters
        aInvalid := 0
        bInvalid := 0
        FOR i := 0 to UpperBound
        	m := i*size
        	FOR j := 0 to UpperBound
        		n := j*size
        		BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
        		
        		// invalidate characters after EOS
        		IF i == la
        			aInvalid := 1
        		FI
        		IF j == lb
        			bInvalid := 1
        		FI
        		
        		// override comparisons for invalid characters
        		CASE (imm8[3:2]) OF
        		0:  // equal any
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		1:  // ranges
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		2:  // equal each
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		3:  // equal ordered
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		ESAC
        	ENDFOR
        ENDFOR
        // aggregate results
        CASE (imm8[3:2]) OF
        0:  // equal any
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
        		ENDFOR
        	ENDFOR
        1:  // ranges
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
        			j += 2
        		ENDFOR
        	ENDFOR
        2:  // equal each
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		IntRes1[i] := BoolRes.word[i].bit[i]
        	ENDFOR
        3:  // equal ordered
        	IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
        	FOR i := 0 to UpperBound
        		k := i
        		FOR j := 0 to UpperBound-i
        			IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
        			k := k+1
        		ENDFOR
        	ENDFOR
        ESAC
        // optionally negate results
        FOR i := 0 to UpperBound
        	IF imm8[4]
        		IF imm8[5] // only negate valid
        			IF i >= lb // invalid, don't negate
        				IntRes2[i] := IntRes1[i]
        			ELSE // valid, negate
        				IntRes2[i] := -1 XOR IntRes1[i]
        			FI
        		ELSE // negate all
        			IntRes2[i] := -1 XOR IntRes1[i]
        		FI
        	ELSE // don't negate
        		IntRes2[i] := IntRes1[i]
        	FI
        ENDFOR
        // output
        dst := IntRes2[0]
        	

_mm_cmpestra
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: String Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-String Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    int la, 
    __m128i b, 
    int lb, 
    const int imm8
:Param ETypes:
    M128 a, 
    UI32 la, 
    M128 b, 
    UI32 lb, 
    IMM imm8

.. code-block:: C

    int _mm_cmpestra(__m128i a, int la, __m128i b, int lb,
                     const int imm8);

.. admonition:: Intel Description

    Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and return 1 if "b" did not contain a null character and the resulting mask was zero, and 0 otherwise.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
        UpperBound := (128 / size) - 1
        BoolRes := 0
        // compare all characters
        aInvalid := 0
        bInvalid := 0
        FOR i := 0 to UpperBound
        	m := i*size
        	FOR j := 0 to UpperBound
        		n := j*size
        		BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
        		
        		// invalidate characters after EOS
        		IF i == la
        			aInvalid := 1
        		FI
        		IF j == lb
        			bInvalid := 1
        		FI
        		
        		// override comparisons for invalid characters
        		CASE (imm8[3:2]) OF
        		0:  // equal any
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		1:  // ranges
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			FI
        		2:  // equal each
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		3:  // equal ordered
        			IF (!aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 0
        			ELSE IF (aInvalid && !bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			ELSE IF (aInvalid && bInvalid)
        				BoolRes.word[i].bit[j] := 1
        			FI
        		ESAC
        	ENDFOR
        ENDFOR
        // aggregate results
        CASE (imm8[3:2]) OF
        0:  // equal any
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
        		ENDFOR
        	ENDFOR
        1:  // ranges
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		FOR j := 0 to UpperBound
        			IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
        			j += 2
        		ENDFOR
        	ENDFOR
        2:  // equal each
        	IntRes1 := 0
        	FOR i := 0 to UpperBound
        		IntRes1[i] := BoolRes.word[i].bit[i]
        	ENDFOR
        3:  // equal ordered
        	IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
        	FOR i := 0 to UpperBound
        		k := i
        		FOR j := 0 to UpperBound-i
        			IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
        			k := k+1
        		ENDFOR
        	ENDFOR
        ESAC
        // optionally negate results
        FOR i := 0 to UpperBound
        	IF imm8[4]
        		IF imm8[5] // only negate valid
        			IF i >= lb // invalid, don't negate
        				IntRes2[i] := IntRes1[i]
        			ELSE // valid, negate
        				IntRes2[i] := -1 XOR IntRes1[i]
        			FI
        		ELSE // negate all
        			IntRes2[i] := -1 XOR IntRes1[i]
        		FI
        	ELSE // don't negate
        		IntRes2[i] := IntRes1[i]
        	FI
        ENDFOR
        // output
        dst := (IntRes2 == 0) AND (lb > UpperBound)
        	
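The final expression above is true only when nothing matched and "b" fills the whole block, which is the "keep scanning" condition of a block-wise search. A minimal sketch under that reading follows; the helper name ``keep_scanning`` and the ``SSE42_FN`` macro are hypothetical, and the target attribute assumes a GCC- or Clang-style compiler.

.. code-block:: C

    #include <nmmintrin.h>
    #include <string.h>

    /* Assumption: GCC/Clang need SSE4.2 enabled per function (or -msse4.2). */
    #if defined(__GNUC__) || defined(__clang__)
    #define SSE42_FN __attribute__((target("sse4.2")))
    #else
    #define SSE42_FN
    #endif

    /* Hypothetical helper: 1 when no byte of text is in set AND text fills
       all 16 bytes of the block, otherwise 0. Lengths must be in 0..16. */
    SSE42_FN static int keep_scanning(const char *set, int set_len,
                                      const char *text, int text_len)
    {
        char a_buf[16] = {0};
        char b_buf[16] = {0};
        memcpy(a_buf, set, (size_t)set_len);
        memcpy(b_buf, text, (size_t)text_len);

        __m128i a = _mm_loadu_si128((const __m128i *)a_buf);
        __m128i b = _mm_loadu_si128((const __m128i *)b_buf);

        return _mm_cmpestra(a, set_len, b, text_len,
                            _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY);
    }

The result drops to 0 either when a character matches or when ``text_len`` is below 16, so one call covers both loop exit conditions.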

General Support
---------------
XMM
~~~
_mm_getcsr
^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: unsigned int

.. code-block:: C

    unsigned int _mm_getcsr(void );

.. admonition:: Intel Description

    Get the unsigned 32-bit value of the MXCSR control and status register.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[31:0] := MXCSR
        	

_mm_setcsr
^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    void _mm_setcsr(unsigned int a);

.. admonition:: Intel Description

    Set the MXCSR control and status register with the value in unsigned 32-bit integer "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MXCSR := a[31:0]
        	
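A common pattern is to read MXCSR, change one control bit, and later restore the saved value. The sketch below does this for the flush-to-zero bit; the helper name ``enable_flush_to_zero`` is hypothetical, and ``_MM_FLUSH_ZERO_ON`` is assumed to be provided by the compiler's SSE header, as it is not described in this section.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: turn on flush-to-zero and return the previous
       MXCSR value so the caller can restore it with _mm_setcsr. */
    static unsigned int enable_flush_to_zero(void)
    {
        unsigned int saved = _mm_getcsr();
        _mm_setcsr(saved | _MM_FLUSH_ZERO_ON);
        return saved;
    }

A caller would typically pair it with ``_mm_setcsr(saved)`` once the numeric kernel has finished.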

_mm_prefetch
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    char const* p, 
    int i
:Param ETypes:
    UI8 p, 
    IMM i

.. code-block:: C

    void _mm_prefetch(char const* p, int i);

.. admonition:: Intel Description

    Fetch the line of data from memory that contains address "p" to a location in the cache hierarchy specified by the locality hint "i", which can be one of:

    * _MM_HINT_T0 (3): move data using the T0 hint. The PREFETCHT0 instruction will be generated.
    * _MM_HINT_T1 (2): move data using the T1 hint. The PREFETCHT1 instruction will be generated.
    * _MM_HINT_T2 (1): move data using the T2 hint. The PREFETCHT2 instruction will be generated.
    * _MM_HINT_NTA (0): move data using the non-temporal access (NTA) hint. The PREFETCHNTA instruction will be generated.
    
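
As an illustration of the hint parameter, the sketch below prefetches a fixed distance ahead while summing an array. The helper name ``sum_with_prefetch`` and the distance ``PREFETCH_AHEAD`` are hypothetical; a useful distance depends on the workload and hardware, and the prefetch is only a hint that the processor may ignore.

.. code-block:: C

    #include <immintrin.h>
    #include <stddef.h>

    enum { PREFETCH_AHEAD = 16 };  /* hypothetical distance, in elements */

    /* Hypothetical helper: sum n ints, hinting upcoming data into cache. */
    static long sum_with_prefetch(const int *data, size_t n)
    {
        long total = 0;
        for (size_t i = 0; i < n; ++i) {
            /* Stay inside the array when forming the prefetch address. */
            if (i + PREFETCH_AHEAD < n)
                _mm_prefetch((const char *)&data[i + PREFETCH_AHEAD],
                             _MM_HINT_T0);
            total += data[i];
        }
        return total;
    }

The result is the same with or without the prefetch; only the memory access timing may differ.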

_mm_sfence
^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: void

.. code-block:: C

    void _mm_sfence(void );

.. admonition:: Intel Description

    Perform a serializing operation on all store-to-memory instructions that were issued prior to this instruction. Guarantees that every store instruction that precedes, in program order, is globally visible before any store instruction which follows the fence in program order.
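
One place a store fence is typically used is after a run of non-temporal stores. The sketch below assumes the SSE2 intrinsic ``_mm_stream_si32`` for those stores; the helper name ``fill_streaming`` is hypothetical.

.. code-block:: C

    #include <immintrin.h>
    #include <stddef.h>

    /* Hypothetical helper: fill dst[0..n) with value using non-temporal
       stores, then fence so they are ordered before any later stores. */
    static void fill_streaming(int *dst, size_t n, int value)
    {
        for (size_t i = 0; i < n; ++i)
            _mm_stream_si32(&dst[i], value);
        _mm_sfence();
    }

The fence matters mainly when another thread is later told, through an ordinary store, that the buffer is ready.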

_mm_malloc
^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: void*
:Param Types:
    size_t size, 
    size_t align
:Param ETypes:
    UI64 size, 
    UI64 align

.. code-block:: C

    void* _mm_malloc(size_t size, size_t align);

.. admonition:: Intel Description

    Allocate "size" bytes of memory, aligned to the alignment specified in "align", and return a pointer to the allocated memory. "_mm_free" should be used to free memory that is allocated with "_mm_malloc".

_mm_free
^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: void

.. code-block:: C

    void _mm_free(void * mem_addr);

.. admonition:: Intel Description

    Free aligned memory that was allocated with "_mm_malloc".
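
A short sketch of the allocate and free pairing described for "_mm_malloc" and "_mm_free". The helper names ``alloc_floats_aligned64`` and ``is_aligned`` are hypothetical, and the 64-byte alignment is just an example value.

.. code-block:: C

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: allocate count floats on a 64-byte boundary.
       The result must be released with _mm_free, not free. */
    static float *alloc_floats_aligned64(size_t count)
    {
        return (float *)_mm_malloc(count * sizeof(float), 64);
    }

    /* Hypothetical helper: 1 when p is a multiple of align. */
    static int is_aligned(const void *p, size_t align)
    {
        return ((uintptr_t)p % align) == 0;
    }

Typical use is ``float *p = alloc_floats_aligned64(n);`` followed, after a null check and the work itself, by ``_mm_free(p);``.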

_mm_undefined_ps
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: __m128

.. code-block:: C

    __m128 _mm_undefined_ps(void );

.. admonition:: Intel Description

    Return vector of type __m128 with undefined elements.

_mm_undefined_pd
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: emmintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: __m128d

.. code-block:: C

    __m128d _mm_undefined_pd(void );

.. admonition:: Intel Description

    Return vector of type __m128d with undefined elements.

_mm_undefined_si128
^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: emmintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: __m128i

.. code-block:: C

    __m128i _mm_undefined_si128(void );

.. admonition:: Intel Description

    Return vector of type __m128i with undefined elements.

_mm_pause
^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: emmintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: void

.. code-block:: C

    void _mm_pause(void );

.. admonition:: Intel Description

    Provide a hint to the processor that the code sequence is a spin-wait loop. This can help improve the performance and power consumption of spin-wait loops.
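
A minimal spin-wait sketch, assuming C11 ``<stdatomic.h>`` is available for the shared flag; the helper name ``spin_until_set`` is hypothetical. The pause hint sits inside the polling loop.

.. code-block:: C

    #include <immintrin.h>
    #include <stdatomic.h>

    /* Hypothetical helper: poll flag until it becomes non-zero and return
       how many times the loop paused. */
    static unsigned long spin_until_set(atomic_int *flag)
    {
        unsigned long spins = 0;
        while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
            _mm_pause();
            ++spins;
        }
        return spins;
    }

Another thread is expected to set the flag with a release store; if the flag is already set, the loop body never runs.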

_mm_clflush
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: emmintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: void

.. code-block:: C

    void _mm_clflush(void const* p);

.. admonition:: Intel Description

    Invalidate and flush the cache line that contains "p" from all levels of the cache hierarchy.
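
Because the intrinsic acts on one cache line at a time, flushing a buffer means stepping through it line by line. The sketch below assumes 64-byte cache lines, which is common but not guaranteed; the helper name ``flush_range`` and the constant ``CACHE_LINE`` are hypothetical.

.. code-block:: C

    #include <immintrin.h>
    #include <stddef.h>

    enum { CACHE_LINE = 64 };  /* assumption: 64-byte cache lines */

    /* Hypothetical helper: flush every cache line touched by p[0..len). */
    static void flush_range(const void *p, size_t len)
    {
        const char *bytes = (const char *)p;
        for (size_t off = 0; off < len; off += CACHE_LINE)
            _mm_clflush(bytes + off);
        if (len != 0)
            _mm_clflush(bytes + len - 1);  /* line holding the last byte */
        _mm_mfence();  /* order the flushes with later memory accesses */
    }

Flushing does not change the data; later reads simply have to fetch it from memory again.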

_mm_lfence
^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: emmintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: void

.. code-block:: C

    void _mm_lfence(void );

.. admonition:: Intel Description

    Perform a serializing operation on all load-from-memory instructions that were issued prior to this instruction. Guarantees that every load instruction that precedes, in program order, is globally visible before any load instruction which follows the fence in program order.

_mm_mfence
^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: emmintrin.h
:Searchable: SSE_ALL-General Support-XMM
:Register: XMM 128 bit
:Return Type: void

.. code-block:: C

    void _mm_mfence(void );

.. admonition:: Intel Description

    Perform a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to this instruction. Guarantees that every memory access that precedes, in program order, the memory fence instruction is globally visible before any memory instruction which follows the fence in program order.

Other
~~~~~
_MM_GET_EXCEPTION_STATE
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-Other
:Return Type: unsigned int

.. code-block:: C

    unsigned int _MM_GET_EXCEPTION_STATE(void );

.. admonition:: Intel Description

    Macro: Get the exception state bits from the MXCSR control and status register. The exception state may contain any of the following flags: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO, _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW, _MM_EXCEPT_INEXACT

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[31:0] := MXCSR & _MM_EXCEPT_MASK
        	

_MM_SET_EXCEPTION_STATE
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-Other
:Return Type: void
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    void _MM_SET_EXCEPTION_STATE(unsigned int a);

.. admonition:: Intel Description

    Macro: Set the exception state bits of the MXCSR control and status register to the value in unsigned 32-bit integer "a". The exception state may contain any of the following flags: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO, _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW, _MM_EXCEPT_INEXACT
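
The exception state bits are sticky status flags, so a common pattern is to clear them, run an operation, and then test a flag. The following is a minimal sketch of that pattern, assuming the default MXCSR configuration in which all exceptions are masked; the helper name ``division_raises_div_zero`` is hypothetical.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: returns 1 if num / den sets the divide-by-zero
       flag in MXCSR, 0 otherwise. Assumes the exception is masked (the
       default), so the division produces a result instead of trapping. */
    static int division_raises_div_zero(float num, float den)
    {
        unsigned int saved = _MM_GET_EXCEPTION_STATE();
        _MM_SET_EXCEPTION_STATE(0);              /* clear all sticky flags */

        /* volatile keeps the compiler from folding the division away */
        volatile float n = num, d = den;
        volatile float sink = _mm_cvtss_f32(_mm_div_ss(_mm_set_ss(n), _mm_set_ss(d)));
        (void)sink;

        int raised = (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_DIV_ZERO) != 0;
        _MM_SET_EXCEPTION_STATE(saved);          /* restore the caller's flags */
        return raised;
    }

For example, ``division_raises_div_zero(1.0f, 0.0f)`` reports the flag, while ``division_raises_div_zero(1.0f, 2.0f)`` does not.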

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        MXCSR := (MXCSR AND ~_MM_EXCEPT_MASK) OR a[31:0]
        	

_MM_GET_EXCEPTION_MASK
^^^^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-Other
:Return Type: unsigned int

.. code-block:: C

    unsigned int _MM_GET_EXCEPTION_MASK(void);

.. admonition:: Intel Description

    Macro: Get the exception mask bits from the MXCSR control and status register. The exception mask may contain any of the following flags: _MM_MASK_INVALID, _MM_MASK_DIV_ZERO, _MM_MASK_DENORM, _MM_MASK_OVERFLOW, _MM_MASK_UNDERFLOW, _MM_MASK_INEXACT.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[31:0] := MXCSR & _MM_MASK_MASK
        	

_MM_SET_EXCEPTION_MASK
^^^^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-Other
:Return Type: void
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    void _MM_SET_EXCEPTION_MASK(unsigned int a);

.. admonition:: Intel Description

    Macro: Set the exception mask bits of the MXCSR control and status register to the value in unsigned 32-bit integer "a". The exception mask may contain any of the following flags: _MM_MASK_INVALID, _MM_MASK_DIV_ZERO, _MM_MASK_DENORM, _MM_MASK_OVERFLOW, _MM_MASK_UNDERFLOW, _MM_MASK_INEXACT
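
Because the mask is process-visible state, code that changes it usually saves and restores the previous value. The sketch below shows one way to do that; the helper names ``all_exceptions_masked`` and ``mask_all_exceptions`` are hypothetical.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: nonzero when every SSE floating-point exception
       is currently masked. */
    static int all_exceptions_masked(void)
    {
        return (_MM_GET_EXCEPTION_MASK() & _MM_MASK_MASK) == _MM_MASK_MASK;
    }

    /* Hypothetical helper: mask every exception and return the previous
       mask bits so the caller can restore them later. */
    static unsigned int mask_all_exceptions(void)
    {
        unsigned int previous = _MM_GET_EXCEPTION_MASK();
        _MM_SET_EXCEPTION_MASK(_MM_MASK_MASK);
        return previous;
    }

A caller would typically pair ``mask_all_exceptions()`` with a later ``_MM_SET_EXCEPTION_MASK(previous)``.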

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        MXCSR := (MXCSR AND ~_MM_MASK_MASK) OR a[31:0]
        	

_MM_GET_ROUNDING_MODE
^^^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-Other
:Return Type: unsigned int

.. code-block:: C

    unsigned int _MM_GET_ROUNDING_MODE(void);

.. admonition:: Intel Description

    Macro: Get the rounding mode bits from the MXCSR control and status register. The rounding mode is one of the following values: _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[31:0] := MXCSR & _MM_ROUND_MASK
        	

_MM_SET_ROUNDING_MODE
^^^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-Other
:Return Type: void
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    void _MM_SET_ROUNDING_MODE(unsigned int a);

.. admonition:: Intel Description

    Macro: Set the rounding mode bits of the MXCSR control and status register to the value in unsigned 32-bit integer "a". The rounding mode is one of the following values: _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO.
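
The rounding mode affects instructions that round according to MXCSR, such as the float-to-integer conversion behind ``_mm_cvtss_si32``. A minimal sketch, with the hypothetical helper ``convert_with_mode``, that switches the mode for one conversion and then restores it:

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: convert x to int under the given MXCSR rounding
       mode, then restore the caller's rounding mode. */
    static int convert_with_mode(float x, unsigned int mode)
    {
        unsigned int saved = _MM_GET_ROUNDING_MODE();
        _MM_SET_ROUNDING_MODE(mode);

        /* volatile keeps the conversion between the two mode changes */
        volatile float vx = x;
        volatile int result = _mm_cvtss_si32(_mm_set_ss(vx));

        _MM_SET_ROUNDING_MODE(saved);
        return result;
    }

With ``_MM_ROUND_NEAREST`` the value 2.5 converts to 2 (ties go to the even integer), while ``_MM_ROUND_UP`` gives 3.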

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        MXCSR := (MXCSR AND ~_MM_ROUND_MASK) OR a[31:0]
        	

_MM_GET_FLUSH_ZERO_MODE
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-Other
:Return Type: unsigned int

.. code-block:: C

    unsigned int _MM_GET_FLUSH_ZERO_MODE(void);

.. admonition:: Intel Description

    Macro: Get the flush-to-zero bit from the MXCSR control and status register. The flush-to-zero mode is either _MM_FLUSH_ZERO_ON or _MM_FLUSH_ZERO_OFF.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[31:0] := MXCSR & _MM_FLUSH_MASK
        	

_MM_SET_FLUSH_ZERO_MODE
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: SSE_ALL-General Support-Other
:Return Type: void
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    void _MM_SET_FLUSH_ZERO_MODE(unsigned int a);

.. admonition:: Intel Description

    Macro: Set the flush-to-zero bit of the MXCSR control and status register to the value in unsigned 32-bit integer "a". The flush-to-zero mode is either _MM_FLUSH_ZERO_ON or _MM_FLUSH_ZERO_OFF.
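
With flush-to-zero enabled, an SSE operation whose result would be denormal returns zero instead. The sketch below multiplies two floats under a chosen mode to make that visible; ``multiply_with_ftz`` is a hypothetical helper, and the example assumes the underflow exception is masked (the default).

.. code-block:: C

    #include <float.h>      /* FLT_MIN, used by callers of this sketch */
    #include <immintrin.h>

    /* Hypothetical helper: compute a * b with the given flush-to-zero mode
       (_MM_FLUSH_ZERO_ON or _MM_FLUSH_ZERO_OFF), then restore the old mode. */
    static float multiply_with_ftz(float a, float b, unsigned int mode)
    {
        unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();
        _MM_SET_FLUSH_ZERO_MODE(mode);

        /* volatile keeps the multiply between the two mode changes */
        volatile float va = a, vb = b;
        volatile float product = _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(va), _mm_set_ss(vb)));

        _MM_SET_FLUSH_ZERO_MODE(saved);
        return product;
    }

For instance, ``FLT_MIN * 0.3f`` is a small nonzero denormal with the mode off and zero with the mode on.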

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        MXCSR := (MXCSR AND ~_MM_FLUSH_MASK) OR a[31:0]
        	

Probability/Statistics
----------------------
XMM
~~~
_mm_avg_pu8
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Probability/Statistics
:Header: xmmintrin.h
:Searchable: SSE_ALL-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _mm_avg_pu8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
        ENDFOR
        	

_mm_avg_pu16
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Probability/Statistics
:Header: xmmintrin.h
:Searchable: SSE_ALL-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m64 _mm_avg_pu16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
        ENDFOR
        	

_mm_avg_epu8
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Probability/Statistics
:Header: emmintrin.h
:Searchable: SSE_ALL-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_avg_epu8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".
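
The average rounds up on ties, because 1 is added before the shift. A short sketch of averaging two 16-byte buffers follows; the helper name ``average_u8x16`` is hypothetical, and unaligned loads are used so the caller's arrays need no special alignment.

.. code-block:: C

    #include <emmintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: out[i] = (a[i] + b[i] + 1) >> 1 for 16 bytes,
       computed without 8-bit overflow. */
    static void average_u8x16(const uint8_t *a, const uint8_t *b, uint8_t *out)
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        _mm_storeu_si128((__m128i *)out, _mm_avg_epu8(va, vb));
    }

For example, averaging 1 and 2 gives 2, and averaging 255 and 255 gives 255 rather than wrapping.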

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
        ENDFOR
        	

_mm_avg_epu16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Probability/Statistics
:Header: emmintrin.h
:Searchable: SSE_ALL-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_avg_epu16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
        ENDFOR
        	

MMX
~~~
_m_pavgb
^^^^^^^^
:Tech: SSE_ALL
:Category: Probability/Statistics
:Header: xmmintrin.h
:Searchable: SSE_ALL-Probability/Statistics-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _m_pavgb(__m64 a, __m64 b);

.. admonition:: Intel Description

    Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
        ENDFOR
        	

_m_pavgw
^^^^^^^^
:Tech: SSE_ALL
:Category: Probability/Statistics
:Header: xmmintrin.h
:Searchable: SSE_ALL-Probability/Statistics-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m64 _m_pavgw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
        ENDFOR
        	

Special Math Functions
----------------------
XMM
~~~
_mm_max_pi16
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_max_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        ENDFOR
        	

_mm_max_pu8
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _mm_max_pu8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        ENDFOR
        	

_mm_min_pi16
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_min_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        ENDFOR
        	

_mm_min_pu8
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _mm_min_pu8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        ENDFOR
        	

_mm_min_ss
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_min_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := MIN(a[31:0], b[31:0])
        dst[127:32] := a[127:32]
        	

_mm_min_ps
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_min_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

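    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.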
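
The departure from IEEE 754 is easiest to see with NaN inputs: the underlying instruction returns the second operand whenever either input is NaN, so the result depends on argument order. A small sketch, using the hypothetical helper ``sse_min`` and assuming the code is not built with fast-math options:

.. code-block:: C

    #include <math.h>       /* NAN, used by callers of this sketch */
    #include <xmmintrin.h>

    /* Hypothetical helper: minimum of two floats as computed by _mm_min_ps.
       If either input is NaN, the second argument (b) is returned. */
    static float sse_min(float a, float b)
    {
        return _mm_cvtss_f32(_mm_min_ps(_mm_set1_ps(a), _mm_set1_ps(b)));
    }

So ``sse_min(NAN, 2.0f)`` yields 2.0, whereas ``sse_min(2.0f, NAN)`` yields NaN. Code that needs a particular NaN policy can choose the operand order accordingly.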

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        ENDFOR
        	

_mm_max_ss
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_max_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := MAX(a[31:0], b[31:0])
        dst[127:32] := a[127:32]
        	

_mm_max_ps
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_max_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

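    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code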

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        ENDFOR
        	

_mm_max_epi16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: emmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_max_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        ENDFOR
        	

_mm_max_epu8
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: emmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_max_epu8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        ENDFOR
        	

_mm_min_epi16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: emmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_min_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".
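
A typical use of the signed 16-bit minimum together with ``_mm_max_epi16`` is clamping samples to a range. The sketch below assumes ``lo <= hi``; the helper name ``clamp_i16x8`` is hypothetical.

.. code-block:: C

    #include <emmintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: clamp eight int16_t values to [lo, hi]. */
    static void clamp_i16x8(const int16_t *in, int16_t lo, int16_t hi, int16_t *out)
    {
        __m128i v = _mm_loadu_si128((const __m128i *)in);
        v = _mm_max_epi16(v, _mm_set1_epi16(lo));   /* raise values below lo */
        v = _mm_min_epi16(v, _mm_set1_epi16(hi));   /* lower values above hi */
        _mm_storeu_si128((__m128i *)out, v);
    }

Clamping to ``[-100, 100]`` leaves in-range values untouched and pins everything else to the nearest bound.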

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        ENDFOR
        	

_mm_min_epu8
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: emmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_min_epu8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".
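
SSE2 has no unsigned byte comparison, and the unsigned minimum is one common way to build it: ``a <= b`` holds exactly where ``min(a, b) == a``. The following sketch applies that idea; the helper name ``all_less_equal_u8`` is hypothetical.

.. code-block:: C

    #include <emmintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: returns 1 if a[i] <= b[i] (unsigned) for all 16
       bytes, 0 otherwise. */
    static int all_less_equal_u8(const uint8_t *a, const uint8_t *b)
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        /* 0xFF in every lane where min(a, b) == a, i.e. where a <= b */
        __m128i le = _mm_cmpeq_epi8(_mm_min_epu8(va, vb), va);
        return _mm_movemask_epi8(le) == 0xFFFF;
    }

The comparison treats values such as 200 as larger than 100, which a signed byte comparison would get wrong.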

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        ENDFOR
        	

_mm_max_sd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: emmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_max_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := MAX(a[63:0], b[63:0])
        dst[127:64] := a[127:64]
        	

_mm_max_pd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: emmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_max_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

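    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code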

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        ENDFOR
        	

_mm_min_sd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: emmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_min_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := MIN(a[63:0], b[63:0])
        dst[127:64] := a[127:64]
        	

_mm_min_pd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: emmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_min_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

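    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.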
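
Combined with ``_mm_max_pd``, the packed minimum gives a branch-free clamp for two doubles at a time. This is a sketch under the assumption that ``lo <= hi``; the helper name ``clamp_f64x2`` is hypothetical.

.. code-block:: C

    #include <emmintrin.h>

    /* Hypothetical helper: clamp two doubles to [lo, hi]. The data is passed
       as the first operand, so a NaN input is replaced by lo, because these
       instructions return the second operand when either input is NaN. */
    static void clamp_f64x2(const double *in, double lo, double hi, double *out)
    {
        __m128d v = _mm_loadu_pd(in);
        v = _mm_max_pd(v, _mm_set1_pd(lo));
        v = _mm_min_pd(v, _mm_set1_pd(hi));
        _mm_storeu_pd(out, v);
    }

Swapping the operand order would instead let a NaN input pass through, so the order is a design choice rather than a detail.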

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        ENDFOR
        	

_mm_max_epi8
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_max_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        ENDFOR
        	

_mm_max_epi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_max_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst".
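
One use of the lane-wise maximum is a horizontal reduction: shuffle the vector against itself and take the maximum twice. The sketch below is one way to do that. ``horizontal_max_i32x4`` is a hypothetical helper, and ``TARGET_SSE41`` is a convenience macro invented for this example so that GCC and Clang accept SSE4.1 intrinsics without ``-msse4.1``; on other compilers it expands to nothing.

.. code-block:: C

    #include <smmintrin.h>
    #include <stdint.h>

    #if defined(__GNUC__) || defined(__clang__)
    #define TARGET_SSE41 __attribute__((target("sse4.1")))
    #else
    #define TARGET_SSE41
    #endif

    /* Hypothetical helper: largest of four signed 32-bit integers. */
    TARGET_SSE41
    static int32_t horizontal_max_i32x4(const int32_t *p)
    {
        __m128i v = _mm_loadu_si128((const __m128i *)p);
        /* Swap the two 64-bit halves and keep the lane-wise maxima. */
        v = _mm_max_epi32(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2)));
        /* Swap neighbouring lanes; every lane now holds the overall maximum. */
        v = _mm_max_epi32(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 3, 0, 1)));
        return _mm_cvtsi128_si32(v);
    }

The same shape works for ``_mm_min_epi32`` and for the unsigned variants.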

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        ENDFOR
        	

_mm_max_epu32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_max_epu32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        ENDFOR
        	

_mm_max_epu16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_max_epu16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        ENDFOR
        	

_mm_min_epi8
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_min_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        ENDFOR
        	

_mm_min_epi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_min_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        ENDFOR
        	

_mm_min_epu32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_min_epu32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        ENDFOR
        	

_mm_min_epu16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_min_epu16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        ENDFOR
        	

_mm_round_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m128d _mm_round_pd(__m128d a, int rounding);

.. admonition:: Intel Description

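    Round the packed double-precision (64-bit) floating-point elements in "a" using the "rounding" parameter, and store the results as packed double-precision floating-point elements in "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest and suppress exceptions, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down and suppress exceptions, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up and suppress exceptions, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate and suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC (see _MM_SET_ROUNDING_MODE).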
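
The "rounding" argument must be a compile-time constant, so each rounding behaviour is usually wrapped in its own function. The sketch below shows two such wrappers. The helper names are hypothetical, and ``TARGET_SSE41`` is a convenience macro invented for this example so that GCC and Clang accept SSE4.1 intrinsics without ``-msse4.1``.

.. code-block:: C

    #include <smmintrin.h>

    #if defined(__GNUC__) || defined(__clang__)
    #define TARGET_SSE41 __attribute__((target("sse4.1")))
    #else
    #define TARGET_SSE41
    #endif

    /* Hypothetical helper: round two doubles to nearest, ties to even. */
    TARGET_SSE41
    static void round_nearest_f64x2(const double *in, double *out)
    {
        __m128d v = _mm_loadu_pd(in);
        _mm_storeu_pd(out, _mm_round_pd(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC));
    }

    /* Hypothetical helper: truncate two doubles toward zero. */
    TARGET_SSE41
    static void truncate_f64x2(const double *in, double *out)
    {
        __m128d v = _mm_loadu_pd(in);
        _mm_storeu_pd(out, _mm_round_pd(v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC));
    }

For the inputs 2.5 and -1.5, rounding to nearest gives 2.0 and -2.0, while truncation gives 2.0 and -1.0.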

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ROUND(a[i+63:i], rounding)
        ENDFOR
        	

_mm_floor_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_floor_pd(__m128d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := FLOOR(a[i+63:i])
        ENDFOR
        	

_mm_ceil_pd
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_ceil_pd(__m128d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst".
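
Floor and ceiling differ only for non-integral inputs, and the difference is most visible with negative values. A minimal sketch computing both for the same pair of doubles follows. ``floor_ceil_f64x2`` is a hypothetical helper, and ``TARGET_SSE41`` is a convenience macro invented for this example so that GCC and Clang accept SSE4.1 intrinsics without ``-msse4.1``.

.. code-block:: C

    #include <smmintrin.h>

    #if defined(__GNUC__) || defined(__clang__)
    #define TARGET_SSE41 __attribute__((target("sse4.1")))
    #else
    #define TARGET_SSE41
    #endif

    /* Hypothetical helper: store floor(in[i]) in lo[i] and ceil(in[i]) in
       hi[i] for two doubles. */
    TARGET_SSE41
    static void floor_ceil_f64x2(const double *in, double *lo, double *hi)
    {
        __m128d v = _mm_loadu_pd(in);
        _mm_storeu_pd(lo, _mm_floor_pd(v));
        _mm_storeu_pd(hi, _mm_ceil_pd(v));
    }

For -1.5 the floor is -2.0 and the ceiling is -1.0; for 2.25 they are 2.0 and 3.0.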

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := CEIL(a[i+63:i])
        ENDFOR
        	

_mm_round_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m128 _mm_round_ps(__m128 a, int rounding);

.. admonition:: Intel Description

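    Round the packed single-precision (32-bit) floating-point elements in "a" using the "rounding" parameter, and store the results as packed single-precision floating-point elements in "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest and suppress exceptions, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down and suppress exceptions, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up and suppress exceptions, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate and suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC (see _MM_SET_ROUNDING_MODE).

.. admonition:: Intel Implementation Pseudo-Code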

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ROUND(a[i+31:i], rounding)
        ENDFOR
        	

_mm_floor_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_floor_ps(__m128 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst".
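
A frequent use of the packed floor is extracting the fractional part as ``x - floor(x)``, which always lies in ``[0, 1)`` for finite inputs. The sketch below does this for four floats. ``fractional_part_f32x4`` is a hypothetical helper, and ``TARGET_SSE41`` is a convenience macro invented for this example so that GCC and Clang accept SSE4.1 intrinsics without ``-msse4.1``.

.. code-block:: C

    #include <smmintrin.h>

    #if defined(__GNUC__) || defined(__clang__)
    #define TARGET_SSE41 __attribute__((target("sse4.1")))
    #else
    #define TARGET_SSE41
    #endif

    /* Hypothetical helper: out[i] = in[i] - floor(in[i]) for four floats. */
    TARGET_SSE41
    static void fractional_part_f32x4(const float *in, float *out)
    {
        __m128 v = _mm_loadu_ps(in);
        _mm_storeu_ps(out, _mm_sub_ps(v, _mm_floor_ps(v)));
    }

Note that for negative inputs this differs from truncation-based fractions: -0.75 maps to 0.25, not -0.75.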

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := FLOOR(a[i+31:i])
        ENDFOR
        	

_mm_ceil_ps
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_ceil_ps(__m128 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := CEIL(a[i+31:i])
        ENDFOR
        	

_mm_round_sd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_round_sd(__m128d a, __m128d b, int rounding);

.. admonition:: Intel Description

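    Round the lower double-precision (64-bit) floating-point element in "b" using the "rounding" parameter, store the result as a double-precision floating-point element in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest and suppress exceptions, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down and suppress exceptions, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up and suppress exceptions, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate and suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC (see _MM_SET_ROUNDING_MODE).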
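
The scalar form takes two vectors: only the low element of "b" is rounded, and the upper element comes from "a". The sketch below makes that split visible. ``round_low_keep_high`` is a hypothetical helper, and ``TARGET_SSE41`` is a convenience macro invented for this example so that GCC and Clang accept SSE4.1 intrinsics without ``-msse4.1``.

.. code-block:: C

    #include <smmintrin.h>

    #if defined(__GNUC__) || defined(__clang__)
    #define TARGET_SSE41 __attribute__((target("sse4.1")))
    #else
    #define TARGET_SSE41
    #endif

    /* Hypothetical helper: out[0] = b[0] rounded toward negative infinity,
       out[1] = a[1] copied unchanged. */
    TARGET_SSE41
    static void round_low_keep_high(const double *a, const double *b, double *out)
    {
        __m128d r = _mm_round_sd(_mm_loadu_pd(a), _mm_loadu_pd(b),
                                 _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
        _mm_storeu_pd(out, r);
    }

With ``a = {10.5, 20.5}`` and ``b = {-3.7, 99.9}``, the result is ``{-4.0, 20.5}``: the low lane of "a" and the high lane of "b" are ignored.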

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := ROUND(b[63:0], rounding)
        dst[127:64] := a[127:64]
        	

_mm_floor_sd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_floor_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Round the lower double-precision (64-bit) floating-point element in "b" down to an integer value, store the result as a double-precision floating-point element in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := FLOOR(b[63:0])
        dst[127:64] := a[127:64]
        	

_mm_ceil_sd
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_ceil_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Round the lower double-precision (64-bit) floating-point element in "b" up to an integer value, store the result as a double-precision floating-point element in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := CEIL(b[63:0])
        dst[127:64] := a[127:64]
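
The lane handling of ``_mm_floor_sd`` and ``_mm_ceil_sd`` (low lane rounded from "b", high lane copied from "a") can be sketched as below. This assumes GCC or Clang for the ``target`` attribute; the helper names are hypothetical.

.. code-block:: C

    #include <smmintrin.h>  /* SSE4.1: _mm_floor_sd, _mm_ceil_sd */

    /* Hypothetical helper: out[0] = floor(b_lo), out[1] = a_hi. */
    __attribute__((target("sse4.1")))
    void floor_low_keep_high(double a_hi, double b_lo, double out[2])
    {
        __m128d a = _mm_set_pd(a_hi, 0.0);  /* high lane = a_hi, low lane = 0.0 */
        __m128d b = _mm_set_sd(b_lo);       /* low lane = b_lo */
        _mm_storeu_pd(out, _mm_floor_sd(a, b));
    }

    /* Hypothetical helper: out[0] = ceil(b_lo), out[1] = a_hi. */
    __attribute__((target("sse4.1")))
    void ceil_low_keep_high(double a_hi, double b_lo, double out[2])
    {
        __m128d a = _mm_set_pd(a_hi, 0.0);
        __m128d b = _mm_set_sd(b_lo);
        _mm_storeu_pd(out, _mm_ceil_sd(a, b));
    }

Passing the same vector for both arguments is a common way to round the low lane in place while leaving the high lane untouched.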
        	

_mm_round_ss
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_round_ss(__m128 a, __m128 b, int rounding);

.. admonition:: Intel Description

    Round the lower single-precision (32-bit) floating-point element in "b" using the "rounding" parameter, store the result as a single-precision floating-point element in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
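    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC``, ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC``, ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC``, ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC``, or ``_MM_FROUND_CUR_DIRECTION`` (use the mode in MXCSR.RC).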

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := ROUND(b[31:0], rounding)
        dst[127:32] := a[127:32]
        	

_mm_floor_ss
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_floor_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Round the lower single-precision (32-bit) floating-point element in "b" down to an integer value, store the result as a single-precision floating-point element in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := FLOOR(b[31:0])
        dst[127:32] := a[127:32]
        	

_mm_ceil_ss
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: smmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_ceil_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Round the lower single-precision (32-bit) floating-point element in "b" up to an integer value, store the result as a single-precision floating-point element in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := CEIL(b[31:0])
        dst[127:32] := a[127:32]
        	

_mm_abs_pi8
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: tmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m64 _mm_abs_pi8(__m64 a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := ABS(Int(a[i+7:i]))
        ENDFOR
        	

_mm_abs_epi8
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: tmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m128i _mm_abs_epi8(__m128i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := ABS(a[i+7:i])
        ENDFOR
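
Why the description speaks of "unsigned results" can be seen in a short sketch. It assumes GCC or Clang for the ``target`` attribute, and the helper name ``abs_i8x16`` is hypothetical.

.. code-block:: C

    #include <stdint.h>
    #include <tmmintrin.h>  /* SSSE3: _mm_abs_epi8 */

    /* Hypothetical helper: absolute value of 16 signed bytes, stored as unsigned. */
    __attribute__((target("ssse3")))
    void abs_i8x16(const int8_t in[16], uint8_t out[16])
    {
        __m128i v = _mm_loadu_si128((const __m128i *)in);
        _mm_storeu_si128((__m128i *)out, _mm_abs_epi8(v));
    }

An input of -128 has no positive counterpart in a signed byte, so its result keeps the bit pattern 0x80 and is only correct when read as the unsigned value 128.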
        	

_mm_abs_pi16
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: tmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m64 _mm_abs_pi16(__m64 a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := ABS(Int(a[i+15:i]))
        ENDFOR
        	

_mm_abs_epi16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: tmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m128i _mm_abs_epi16(__m128i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ABS(a[i+15:i])
        ENDFOR
        	

_mm_abs_pi32
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: tmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m64 _mm_abs_pi32(__m64 a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	dst[i+31:i] := ABS(a[i+31:i])
        ENDFOR
        	

_mm_abs_epi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: tmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m128i _mm_abs_epi32(__m128i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ABS(a[i+31:i])
        ENDFOR
        	

MMX
~~~
_m_pmaxsw
^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _m_pmaxsw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        ENDFOR
        	

_m_pmaxub
^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _m_pmaxub(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        ENDFOR
        	

_m_pminsw
^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _m_pminsw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        ENDFOR
        	

_m_pminub
^^^^^^^^^
:Tech: SSE_ALL
:Category: Special Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Special Math Functions-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _m_pminub(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        ENDFOR
        	

Logical
-------
XMM
~~~
_mm_and_ps
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: xmmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_and_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
        ENDFOR
        	

_mm_andnot_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: xmmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_andnot_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
        ENDFOR
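
The operand order matters here: the NOT applies to "a", not to "b". A minimal sketch, using the common sign-mask idiom to compute absolute values, is shown below; the helper name ``abs_f32x4`` is hypothetical.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: _mm_andnot_ps */

    /* Hypothetical helper: clear the sign bit of four floats. */
    void abs_f32x4(const float in[4], float out[4])
    {
        const __m128 sign_mask = _mm_set1_ps(-0.0f);      /* only bit 31 set per lane */
        __m128 v = _mm_loadu_ps(in);
        _mm_storeu_ps(out, _mm_andnot_ps(sign_mask, v));  /* (NOT mask) AND v */
    }

The mask goes in the first argument because that is the operand the instruction inverts.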
        	

_mm_or_ps
^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: xmmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_or_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        ENDFOR
        	

_mm_xor_ps
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: xmmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_xor_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        ENDFOR
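
One common use of a bitwise XOR on floating-point lanes is flipping the sign bit. The sketch below illustrates that idiom; the helper name ``negate_f32x4`` is hypothetical.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: _mm_xor_ps */

    /* Hypothetical helper: flip the sign bit of four floats. */
    void negate_f32x4(const float in[4], float out[4])
    {
        const __m128 sign_mask = _mm_set1_ps(-0.0f);  /* only bit 31 set per lane */
        _mm_storeu_ps(out, _mm_xor_ps(_mm_loadu_ps(in), sign_mask));
    }

Because XOR is symmetric, the argument order does not matter here, unlike with ``_mm_andnot_ps``.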
        	

_mm_and_si128
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: emmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    M128 a, 
    M128 b

.. code-block:: C

    __m128i _mm_and_si128(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := (a[127:0] AND b[127:0])
        	

_mm_andnot_si128
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: emmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    M128 a, 
    M128 b

.. code-block:: C

    __m128i _mm_andnot_si128(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 128 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := ((NOT a[127:0]) AND b[127:0])
        	

_mm_or_si128
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: emmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    M128 a, 
    M128 b

.. code-block:: C

    __m128i _mm_or_si128(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise OR of 128 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := (a[127:0] OR b[127:0])
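
The three integer operations above combine into a bitwise select, ``(mask AND x) OR ((NOT mask) AND y)``. The following is a sketch of that idiom, not something prescribed by the reference; the helper name ``select_u32x4`` is hypothetical.

.. code-block:: C

    #include <stdint.h>
    #include <emmintrin.h>  /* SSE2: _mm_and_si128, _mm_andnot_si128, _mm_or_si128 */

    /* Hypothetical helper: take bits from x where mask is 1, from y where it is 0. */
    void select_u32x4(const uint32_t mask[4], const uint32_t x[4],
                      const uint32_t y[4], uint32_t out[4])
    {
        __m128i m  = _mm_loadu_si128((const __m128i *)mask);
        __m128i vx = _mm_loadu_si128((const __m128i *)x);
        __m128i vy = _mm_loadu_si128((const __m128i *)y);
        __m128i r  = _mm_or_si128(_mm_and_si128(m, vx),
                                  _mm_andnot_si128(m, vy));  /* (NOT m) AND y */
        _mm_storeu_si128((__m128i *)out, r);
    }

The select works at bit granularity, so a partial mask such as ``0xFFFF0000`` mixes the two sources within a single lane.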
        	

_mm_xor_si128
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: emmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    M128 a, 
    M128 b

.. code-block:: C

    __m128i _mm_xor_si128(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of 128 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := (a[127:0] XOR b[127:0])
        	

_mm_and_pd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: emmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_and_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
        ENDFOR
        	

_mm_andnot_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: emmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_andnot_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
        ENDFOR
        	

_mm_or_pd
^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: emmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_or_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        ENDFOR
        	

_mm_xor_pd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: emmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_xor_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        ENDFOR
        	

_mm_testz_si128
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: smmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    M128 a, 
    M128 b

.. code-block:: C

    int _mm_testz_si128(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "ZF" value.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF ((a[127:0] AND b[127:0]) == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        IF (((NOT a[127:0]) AND b[127:0]) == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        RETURN ZF
        	

_mm_testc_si128
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: smmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    M128 a, 
    M128 b

.. code-block:: C

    int _mm_testc_si128(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "CF" value.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF ((a[127:0] AND b[127:0]) == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        IF (((NOT a[127:0]) AND b[127:0]) == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        RETURN CF
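
In practice the two flags answer two different questions: "ZF" reports whether "a" and "b" share no set bits, and "CF" reports whether every bit set in "b" is also set in "a". A minimal sketch follows, assuming GCC or Clang for the ``target`` attribute; the helper names are hypothetical.

.. code-block:: C

    #include <stdint.h>
    #include <smmintrin.h>  /* SSE4.1: _mm_testz_si128, _mm_testc_si128 */

    /* Hypothetical helper: 1 if all 128 bits are zero, else 0. */
    __attribute__((target("sse4.1")))
    int is_all_zero(const uint8_t bytes[16])
    {
        __m128i v = _mm_loadu_si128((const __m128i *)bytes);
        return _mm_testz_si128(v, v);    /* (v AND v) == 0 */
    }

    /* Hypothetical helper: 1 if every bit set in b is also set in a, else 0. */
    __attribute__((target("sse4.1")))
    int has_all_bits(const uint8_t a[16], const uint8_t b[16])
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        return _mm_testc_si128(va, vb);  /* ((NOT a) AND b) == 0 */
    }

Both helpers return a plain ``int``, so they can be used directly in a branch condition without extracting a mask first.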
        	

_mm_testnzc_si128
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: smmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    M128 a, 
    M128 b

.. code-block:: C

    int _mm_testnzc_si128(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF ((a[127:0] AND b[127:0]) == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        IF (((NOT a[127:0]) AND b[127:0]) == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        IF (ZF == 0 && CF == 0)
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_mm_test_all_zeros
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: smmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i mask, 
    __m128i a
:Param ETypes:
    M128 mask, 
    M128 a

.. code-block:: C

    int _mm_test_all_zeros(__m128i mask, __m128i a);

.. admonition:: Intel Description

    Compute the bitwise AND of 128 bits (representing integer data) in "a" and "mask", and return 1 if the result is zero, otherwise return 0.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF ((a[127:0] AND mask[127:0]) == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        dst := ZF
        	

_mm_test_mix_ones_zeros
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: smmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i mask, 
    __m128i a
:Param ETypes:
    M128 mask, 
    M128 a

.. code-block:: C

    int _mm_test_mix_ones_zeros(__m128i mask, __m128i a);

.. admonition:: Intel Description

    Compute the bitwise AND of 128 bits (representing integer data) in "a" and "mask", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "mask", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF ((a[127:0] AND mask[127:0]) == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        IF (((NOT a[127:0]) AND mask[127:0]) == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        IF (ZF == 0 && CF == 0)
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_mm_test_all_ones
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Logical
:Header: smmintrin.h
:Searchable: SSE_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a
:Param ETypes:
    M128 a

.. code-block:: C

    int _mm_test_all_ones(__m128i a);

.. admonition:: Intel Description

    Compute the bitwise NOT of "a" and then AND with a 128-bit vector containing all 1's, and return 1 if the result is zero, otherwise return 0.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 127
        	tmp[j] := 1
        ENDFOR
        IF (((NOT a[127:0]) AND tmp[127:0]) == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        dst := CF
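
A typical use is checking that a vector comparison succeeded in every lane. The sketch below compares two 16-byte blocks this way; it assumes GCC or Clang for the ``target`` attribute, and the helper name ``equal_16_bytes`` is hypothetical.

.. code-block:: C

    #include <stdint.h>
    #include <smmintrin.h>  /* SSE4.1: _mm_test_all_ones */

    /* Hypothetical helper: 1 if the two 16-byte blocks are identical, else 0. */
    __attribute__((target("sse4.1")))
    int equal_16_bytes(const uint8_t p[16], const uint8_t q[16])
    {
        __m128i a  = _mm_loadu_si128((const __m128i *)p);
        __m128i b  = _mm_loadu_si128((const __m128i *)q);
        __m128i eq = _mm_cmpeq_epi8(a, b);  /* 0xFF where bytes match, 0x00 elsewhere */
        return _mm_test_all_ones(eq);
    }

A single mismatching byte leaves zero bits in the comparison result, so the function returns 0.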
        	

Swizzle
-------
XMM
~~~
_mm_extract_pi16
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: xmmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    int _mm_extract_pi16(__m64 a, int imm8);

.. admonition:: Intel Description

    Extract a 16-bit integer from "a", selected with "imm8", and store the result in the lower element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := (a[63:0] >> (imm8[1:0] * 16))[15:0]
        dst[31:16] := 0
        	

_mm_insert_pi16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: xmmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int i, 
    int imm8
:Param ETypes:
    UI16 a, 
    UI16 i, 
    IMM imm8

.. code-block:: C

    __m64 _mm_insert_pi16(__m64 a, int i, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", and insert the 16-bit integer "i" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        sel := imm8[1:0]*16
        dst[sel+15:sel] := i[15:0]
        	

_mm_shuffle_pi16
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: xmmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m64 _mm_shuffle_pi16(__m64 a, int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[15:0] := src[15:0]
        	1:	tmp[15:0] := src[31:16]
        	2:	tmp[15:0] := src[47:32]
        	3:	tmp[15:0] := src[63:48]
        	ESAC
        	RETURN tmp[15:0]
        }
        dst[15:0] := SELECT4(a[63:0], imm8[1:0])
        dst[31:16] := SELECT4(a[63:0], imm8[3:2])
        dst[47:32] := SELECT4(a[63:0], imm8[5:4])
        dst[63:48] := SELECT4(a[63:0], imm8[7:6])
        	

_mm_shuffle_ps
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: xmmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    unsigned int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_shuffle_ps(__m128 a, __m128 b,
                          unsigned int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements using the control in "imm8", and store the results in "dst": the two lower elements of "dst" are selected from "a" and the two upper elements from "b".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        dst[95:64] := SELECT4(b[127:0], imm8[5:4])
        dst[127:96] := SELECT4(b[127:0], imm8[7:6])
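
The control byte is usually built with the ``_MM_SHUFFLE`` macro from ``xmmintrin.h``. The sketch below picks the upper two lanes of "a" and the lower two lanes of "b"; the helper name ``high_of_a_low_of_b`` is hypothetical.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: _mm_shuffle_ps, _MM_SHUFFLE */

    /* Hypothetical helper: out = { a[2], a[3], b[0], b[1] }. */
    void high_of_a_low_of_b(const float a[4], const float b[4], float out[4])
    {
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        /* _MM_SHUFFLE lists selectors from the highest result lane to the lowest:
           lane 3 <- b[1], lane 2 <- b[0], lane 1 <- a[3], lane 0 <- a[2]. */
        _mm_storeu_ps(out, _mm_shuffle_ps(va, vb, _MM_SHUFFLE(1, 0, 3, 2)));
    }

Passing the same vector for both arguments turns this into a general four-lane permutation of a single register.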
        	

_mm_unpackhi_ps
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: xmmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_unpackhi_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        	

_mm_unpacklo_ps
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: xmmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_unpacklo_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
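
Used together, ``_mm_unpacklo_ps`` and ``_mm_unpackhi_ps`` interleave two four-element arrays into one eight-element array. A minimal sketch, with the hypothetical helper name ``interleave_f32x4``:

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: _mm_unpacklo_ps, _mm_unpackhi_ps */

    /* Hypothetical helper: out = { a0, b0, a1, b1, a2, b2, a3, b3 }. */
    void interleave_f32x4(const float a[4], const float b[4], float out[8])
    {
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        _mm_storeu_ps(out,     _mm_unpacklo_ps(va, vb));  /* a0 b0 a1 b1 */
        _mm_storeu_ps(out + 4, _mm_unpackhi_ps(va, vb));  /* a2 b2 a3 b3 */
    }

This pattern is one way to turn separate coordinate arrays into interleaved pairs.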
        	

_mm_extract_epi16
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    int _mm_extract_epi16(__m128i a, int imm8);

.. admonition:: Intel Description

    Extract a 16-bit integer from "a", selected with "imm8", and store the result in the lower element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := (a[127:0] >> (imm8[2:0] * 16))[15:0]
        dst[31:16] := 0
        	

_mm_insert_epi16
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int i, 
    int imm8
:Param ETypes:
    UI16 a, 
    UI16 i, 
    IMM imm8

.. code-block:: C

    __m128i _mm_insert_epi16(__m128i a, int i, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", and insert the 16-bit integer "i" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := a[127:0]
        sel := imm8[2:0]*16
        dst[sel+15:sel] := i[15:0]
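
The sketch below pairs ``_mm_extract_epi16`` with ``_mm_insert_epi16`` to read and then overwrite one lane. The helper name ``replace_lane3`` is hypothetical, and the lane index is fixed at 3 because "imm8" has to be a compile-time constant.

.. code-block:: C

    #include <stdint.h>
    #include <emmintrin.h>  /* SSE2: _mm_extract_epi16, _mm_insert_epi16 */

    /* Hypothetical helper: write value into lane 3 and return the old lane 3. */
    int replace_lane3(const int16_t in[8], int16_t value, int16_t out[8])
    {
        __m128i v = _mm_loadu_si128((const __m128i *)in);
        int old = _mm_extract_epi16(v, 3);  /* zero-extended to 32 bits */
        v = _mm_insert_epi16(v, value, 3);  /* other lanes are left unchanged */
        _mm_storeu_si128((__m128i *)out, v);
        return old;
    }

Because the extracted value is zero-extended, a lane holding -1 comes back as 65535; cast the result to ``int16_t`` if the signed value is needed.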
        	

_mm_shuffle_epi32
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_shuffle_epi32(__m128i a, int imm8);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        dst[127:96] := SELECT4(a[127:0], imm8[7:6])
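
As a worked case of the selector layout, the control ``_MM_SHUFFLE(0, 1, 2, 3)`` (the value 0x1B) reverses the four lanes. A minimal sketch, with the hypothetical helper name ``reverse_u32x4``:

.. code-block:: C

    #include <stdint.h>
    #include <emmintrin.h>  /* SSE2: _mm_shuffle_epi32 (also provides _MM_SHUFFLE) */

    /* Hypothetical helper: out = { in[3], in[2], in[1], in[0] }. */
    void reverse_u32x4(const uint32_t in[4], uint32_t out[4])
    {
        __m128i v = _mm_loadu_si128((const __m128i *)in);
        /* imm8[1:0] = 3, imm8[3:2] = 2, imm8[5:4] = 1, imm8[7:6] = 0 */
        _mm_storeu_si128((__m128i *)out,
                         _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)));
    }

Using the same selector in all four positions, for example ``_MM_SHUFFLE(2, 2, 2, 2)``, broadcasts one lane to the whole vector.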
        	

_mm_shufflehi_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_shufflehi_epi16(__m128i a, int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the high 64 bits of "a" using the control in "imm8". Store the results in the high 64 bits of "dst", with the low 64 bits being copied from "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
        dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
        dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
        dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
        	

_mm_shufflelo_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_shufflelo_epi16(__m128i a, int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the low 64 bits of "a" using the control in "imm8". Store the results in the low 64 bits of "dst", with the high 64 bits being copied from "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
        dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
        dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
        dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
        dst[127:64] := a[127:64]
        	

_mm_unpackhi_epi8
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_unpackhi_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[71:64] 
        	dst[15:8] := src2[71:64] 
        	dst[23:16] := src1[79:72] 
        	dst[31:24] := src2[79:72] 
        	dst[39:32] := src1[87:80] 
        	dst[47:40] := src2[87:80] 
        	dst[55:48] := src1[95:88] 
        	dst[63:56] := src2[95:88] 
        	dst[71:64] := src1[103:96] 
        	dst[79:72] := src2[103:96] 
        	dst[87:80] := src1[111:104] 
        	dst[95:88] := src2[111:104] 
        	dst[103:96] := src1[119:112] 
        	dst[111:104] := src2[119:112] 
        	dst[119:112] := src1[127:120] 
        	dst[127:120] := src2[127:120] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
        	
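One common use of this interleave is widening unsigned bytes to 16 bits by pairing each byte with a zero byte. The sketch below assumes that use case; ``widen_high_u8_to_u16`` is an illustrative name, not a library function.

.. code-block:: C

    #include <assert.h>
    #include <emmintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: zero-extend bytes 8..15 of v to eight 16-bit lanes.
       Each result lane is (byte from v) in the low 8 bits and 0 in the high 8. */
    static __m128i widen_high_u8_to_u16(__m128i v) {
        return _mm_unpackhi_epi8(v, _mm_setzero_si128());
    }

    static void widen_high_demo(void) {
        uint8_t in[16];
        uint16_t out[8];
        for (int i = 0; i < 16; ++i) {
            in[i] = (uint8_t)(200 + i);
        }
        __m128i v = _mm_loadu_si128((const __m128i *)in);
        _mm_storeu_si128((__m128i *)out, widen_high_u8_to_u16(v));
        for (int i = 0; i < 8; ++i) {
            assert(out[i] == 208 + i);  /* in[8 + i], zero-extended */
        }
    }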

_mm_unpackhi_epi16
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_unpackhi_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[79:64]
        	dst[31:16] := src2[79:64] 
        	dst[47:32] := src1[95:80] 
        	dst[63:48] := src2[95:80] 
        	dst[79:64] := src1[111:96] 
        	dst[95:80] := src2[111:96] 
        	dst[111:96] := src1[127:112] 
        	dst[127:112] := src2[127:112] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
        	

_mm_unpackhi_epi32
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_unpackhi_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        	

_mm_unpackhi_epi64
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_unpackhi_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the high half of "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        	

_mm_unpacklo_epi8
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_unpacklo_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[7:0] 
        	dst[15:8] := src2[7:0] 
        	dst[23:16] := src1[15:8] 
        	dst[31:24] := src2[15:8] 
        	dst[39:32] := src1[23:16] 
        	dst[47:40] := src2[23:16] 
        	dst[55:48] := src1[31:24] 
        	dst[63:56] := src2[31:24] 
        	dst[71:64] := src1[39:32]
        	dst[79:72] := src2[39:32] 
        	dst[87:80] := src1[47:40] 
        	dst[95:88] := src2[47:40] 
        	dst[103:96] := src1[55:48] 
        	dst[111:104] := src2[55:48] 
        	dst[119:112] := src1[63:56] 
        	dst[127:120] := src2[63:56] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
        	

_mm_unpacklo_epi16
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_unpacklo_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[15:0] 
        	dst[31:16] := src2[15:0] 
        	dst[47:32] := src1[31:16] 
        	dst[63:48] := src2[31:16] 
        	dst[79:64] := src1[47:32] 
        	dst[95:80] := src2[47:32] 
        	dst[111:96] := src1[63:48] 
        	dst[127:112] := src2[63:48] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
        	

_mm_unpacklo_epi32
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_unpacklo_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        	

_mm_unpacklo_epi64
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_unpacklo_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the low half of "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        	

_mm_unpackhi_pd
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_unpackhi_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        	

_mm_unpacklo_pd
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_unpacklo_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        	

_mm_shuffle_pd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: emmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_shuffle_pd(__m128d a, __m128d b, int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
        dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
        	
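As a minimal sketch of the selection rule above, passing the same vector for both operands with ``imm8 = 1`` swaps the two lanes. ``swap_halves`` is an illustrative name.

.. code-block:: C

    #include <assert.h>
    #include <emmintrin.h>

    /* Hypothetical helper: swap the low and high double.
       imm8 bit 0 = 1 takes the high lane of the first operand for dst[63:0];
       imm8 bit 1 = 0 takes the low lane of the second operand for dst[127:64]. */
    static __m128d swap_halves(__m128d v) {
        return _mm_shuffle_pd(v, v, 0x1);
    }

    static void swap_halves_demo(void) {
        double out[2];
        __m128d v = _mm_setr_pd(1.0, 2.0);  /* low = 1.0, high = 2.0 */
        _mm_storeu_pd(out, swap_halves(v));
        assert(out[0] == 2.0 && out[1] == 1.0);
    }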

_mm_blend_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_blend_pd(__m128d a, __m128d b, const int imm8);

.. admonition:: Intel Description

    Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF imm8[j]
        		dst[i+63:i] := b[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm_blend_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_blend_ps(__m128 a, __m128 b, const int imm8);

.. admonition:: Intel Description

    Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF imm8[j]
        		dst[i+31:i] := b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        	
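The following sketch shows how the bits of ``imm8`` map to lanes. The ``SSE41_FN`` macro is an assumption of this example: it uses the GCC/Clang ``target`` attribute so the function builds without ``-msse4.1``, and expands to nothing elsewhere. ``blend_odd_from_b`` is an illustrative name.

.. code-block:: C

    #include <assert.h>
    #include <smmintrin.h>

    #if defined(__GNUC__)
    #define SSE41_FN __attribute__((target("sse4.1")))
    #else
    #define SSE41_FN
    #endif

    /* Hypothetical helper: mask 0xA has bits 1 and 3 set, so lanes 1 and 3
       come from b while lanes 0 and 2 come from a. */
    static SSE41_FN __m128 blend_odd_from_b(__m128 a, __m128 b) {
        return _mm_blend_ps(a, b, 0xA);
    }

    static void blend_demo(void) {
        float out[4];
        __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
        __m128 b = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);
        _mm_storeu_ps(out, blend_odd_from_b(a, b));
        assert(out[0] == 1.0f && out[1] == 20.0f);
        assert(out[2] == 3.0f && out[3] == 40.0f);
    }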

_mm_blendv_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d mask
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 mask

.. code-block:: C

    __m128d _mm_blendv_pd(__m128d a, __m128d b, __m128d mask);

.. admonition:: Intel Description

    Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF mask[i+63]
        		dst[i+63:i] := b[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm_blendv_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 mask
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 mask

.. code-block:: C

    __m128 _mm_blendv_ps(__m128 a, __m128 b, __m128 mask);

.. admonition:: Intel Description

    Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF mask[i+31]
        		dst[i+31:i] := b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        	
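Because only bit 31 of each mask lane is consulted, a vector can serve as its own mask when the selection depends on sign. The sketch below relies on that property. ``zero_negative_lanes`` is an illustrative name, and ``SSE41_FN`` is an assumed convenience macro built on the GCC/Clang ``target`` attribute.

.. code-block:: C

    #include <assert.h>
    #include <smmintrin.h>

    #if defined(__GNUC__)
    #define SSE41_FN __attribute__((target("sse4.1")))
    #else
    #define SSE41_FN
    #endif

    /* Hypothetical helper: lanes whose sign bit is set are replaced by +0.0f;
       all other lanes are kept. v is passed as both the first source and the mask. */
    static SSE41_FN __m128 zero_negative_lanes(__m128 v) {
        return _mm_blendv_ps(v, _mm_setzero_ps(), v);
    }

    static void zero_negative_demo(void) {
        float out[4];
        __m128 v = _mm_setr_ps(-1.5f, 2.0f, -3.0f, 4.0f);
        _mm_storeu_ps(out, zero_negative_lanes(v));
        assert(out[0] == 0.0f && out[1] == 2.0f);
        assert(out[2] == 0.0f && out[3] == 4.0f);
    }

Note that the test is on the sign bit alone, so a lane holding ``-0.0f`` is also replaced.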

_mm_blendv_epi8
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    __m128i mask
:Param ETypes:
    UI8 a, 
    UI8 b, 
    UI8 mask

.. code-block:: C

    __m128i _mm_blendv_epi8(__m128i a, __m128i b, __m128i mask);

.. admonition:: Intel Description

    Blend packed 8-bit integers from "a" and "b" using "mask", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF mask[i+7]
        		dst[i+7:i] := b[i+7:i]
        	ELSE
        		dst[i+7:i] := a[i+7:i]
        	FI
        ENDFOR
        	

_mm_blend_epi16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_blend_epi16(__m128i a, __m128i b,
                            const int imm8);

.. admonition:: Intel Description

    Blend packed 16-bit integers from "a" and "b" using control mask "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF imm8[j]
        		dst[i+15:i] := b[i+15:i]
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        	

_mm_extract_ps
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    const int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    int _mm_extract_ps(__m128 a, const int imm8);

.. admonition:: Intel Description

    Extract a single-precision (32-bit) floating-point element from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := (a[127:0] >> (imm8[1:0] * 32))[31:0]
        	
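The return type is ``int``, so the result is the raw bit pattern of the selected element rather than a converted value. The sketch below shows one way to recover a ``float`` from it; ``extract_lane2`` and the ``SSE41_FN`` macro (GCC/Clang ``target`` attribute) are assumptions of this example.

.. code-block:: C

    #include <assert.h>
    #include <smmintrin.h>
    #include <string.h>

    #if defined(__GNUC__)
    #define SSE41_FN __attribute__((target("sse4.1")))
    #else
    #define SSE41_FN
    #endif

    /* Hypothetical helper: fetch lane 2 and reinterpret its bits as a float. */
    static SSE41_FN float extract_lane2(__m128 v) {
        int bits = _mm_extract_ps(v, 2);  /* IEEE-754 bits, not a rounded integer */
        float f;
        memcpy(&f, &bits, sizeof f);      /* bit copy that respects aliasing rules */
        return f;
    }

    static void extract_demo(void) {
        __m128 v = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
        assert(extract_lane2(v) == 3.0f);
    }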

_mm_extract_epi8
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    const int imm8
:Param ETypes:
    UI8 a, 
    IMM imm8

.. code-block:: C

    int _mm_extract_epi8(__m128i a, const int imm8);

.. admonition:: Intel Description

    Extract an 8-bit integer from "a", selected with "imm8", and store the result in the lower element of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[7:0] := (a[127:0] >> (imm8[3:0] * 8))[7:0]
        dst[31:8] := 0
        	

_mm_extract_epi32
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a, 
    const int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    int _mm_extract_epi32(__m128i a, const int imm8);

.. admonition:: Intel Description

    Extract a 32-bit integer from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := (a[127:0] >> (imm8[1:0] * 32))[31:0]
        	

_mm_extract_epi64
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128i a, 
    const int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __int64 _mm_extract_epi64(__m128i a, const int imm8);

.. admonition:: Intel Description

    Extract a 64-bit integer from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[63:0] := (a[127:0] >> (imm8[0] * 64))[63:0]
        	

_mm_insert_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_insert_ps(__m128 a, __m128 b, const int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert a single-precision (32-bit) floating-point element from "b" into "tmp" using the control in "imm8". Store "tmp" to "dst" using the mask in "imm8" (elements are zeroed out when the corresponding bit is set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp2[127:0] := a[127:0]
        CASE (imm8[7:6]) OF
        0: tmp1[31:0] := b[31:0]
        1: tmp1[31:0] := b[63:32]
        2: tmp1[31:0] := b[95:64]
        3: tmp1[31:0] := b[127:96]
        ESAC
        CASE (imm8[5:4]) OF
        0: tmp2[31:0] := tmp1[31:0]
        1: tmp2[63:32] := tmp1[31:0]
        2: tmp2[95:64] := tmp1[31:0]
        3: tmp2[127:96] := tmp1[31:0]
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	IF imm8[j%8]
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := tmp2[i+31:i]
        	FI
        ENDFOR
        	
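The three fields of ``imm8`` in the pseudo-code above can be made explicit with a small macro. ``INSERTPS_IMM`` and ``insert_b3_into_a1_zero0`` are hypothetical names introduced for illustration, and ``SSE41_FN`` is an assumed convenience macro built on the GCC/Clang ``target`` attribute.

.. code-block:: C

    #include <assert.h>
    #include <smmintrin.h>

    #if defined(__GNUC__)
    #define SSE41_FN __attribute__((target("sse4.1")))
    #else
    #define SSE41_FN
    #endif

    /* Hypothetical macro: bits 7:6 pick the source lane of b, bits 5:4 pick the
       destination lane, and bits 3:0 are the per-lane zero mask. */
    #define INSERTPS_IMM(src, dst, zmask) (((src) << 6) | ((dst) << 4) | (zmask))

    /* Copy lane 3 of b into lane 1 of a, then zero lane 0 of the result. */
    static SSE41_FN __m128 insert_b3_into_a1_zero0(__m128 a, __m128 b) {
        return _mm_insert_ps(a, b, INSERTPS_IMM(3, 1, 0x1));
    }

    static void insert_ps_demo(void) {
        float out[4];
        __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
        __m128 b = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);
        _mm_storeu_ps(out, insert_b3_into_a1_zero0(a, b));
        assert(out[0] == 0.0f && out[1] == 40.0f);
        assert(out[2] == 3.0f && out[3] == 4.0f);
    }

The macro arguments must be compile-time constants, since ``imm8`` is encoded as an immediate.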

_mm_insert_epi8
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int i, 
    const int imm8
:Param ETypes:
    UI8 a, 
    UI8 i, 
    IMM imm8

.. code-block:: C

    __m128i _mm_insert_epi8(__m128i a, int i, const int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", and insert the lower 8-bit integer from "i" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[127:0] := a[127:0]
        sel := imm8[3:0]*8
        dst[sel+7:sel] := i[7:0]
        	

_mm_insert_epi32
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int i, 
    const int imm8
:Param ETypes:
    UI32 a, 
    UI32 i, 
    IMM imm8

.. code-block:: C

    __m128i _mm_insert_epi32(__m128i a, int i, const int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", and insert the 32-bit integer "i" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[127:0] := a[127:0]
        sel := imm8[1:0]*32
        dst[sel+31:sel] := i[31:0]
        	

_mm_insert_epi64
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: smmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __int64 i, 
    const int imm8
:Param ETypes:
    UI64 a, 
    UI64 i, 
    IMM imm8

.. code-block:: C

    __m128i _mm_insert_epi64(__m128i a, __int64 i,
                             const int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", and insert the 64-bit integer "i" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[127:0] := a[127:0]
        sel := imm8[0]*64
        dst[sel+63:sel] := i[63:0]
        	

_mm_shuffle_epi8
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: tmmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_shuffle_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF b[i+7] == 1
        		dst[i+7:i] := 0
        	ELSE
        		index[3:0] := b[i+3:i]
        		dst[i+7:i] := a[index*8+7:index*8]
        	FI
        ENDFOR
        	
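A byte reversal is a compact way to see the indexing rule in action. In this sketch ``reverse_bytes`` is an illustrative name and ``SSSE3_FN`` is an assumed convenience macro using the GCC/Clang ``target`` attribute so that ``-mssse3`` is not required.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <tmmintrin.h>

    #if defined(__GNUC__)
    #define SSSE3_FN __attribute__((target("ssse3")))
    #else
    #define SSSE3_FN
    #endif

    /* Hypothetical helper: control byte j holds 15 - j, so dst byte j is
       taken from source byte 15 - j. No control byte has bit 7 set, so no
       lane is zeroed. */
    static SSSE3_FN __m128i reverse_bytes(__m128i v) {
        const __m128i idx = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                          7, 6, 5, 4, 3, 2, 1, 0);
        return _mm_shuffle_epi8(v, idx);
    }

    static void reverse_bytes_demo(void) {
        uint8_t in[16], out[16];
        for (int i = 0; i < 16; ++i) {
            in[i] = (uint8_t)i;
        }
        __m128i v = _mm_loadu_si128((const __m128i *)in);
        _mm_storeu_si128((__m128i *)out, reverse_bytes(v));
        for (int i = 0; i < 16; ++i) {
            assert(out[i] == 15 - i);
        }
    }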

_mm_shuffle_pi8
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: tmmintrin.h
:Searchable: SSE_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _mm_shuffle_pi8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	IF b[i+7] == 1
        		dst[i+7:i] := 0
        	ELSE
        		index[2:0] := b[i+2:i]
        		dst[i+7:i] := a[index*8+7:index*8]
        	FI
        ENDFOR
        	

Other
~~~~~
_MM_TRANSPOSE4_PS
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: xmmintrin.h
:Searchable: SSE_ALL-Swizzle-Other
:Return Type: void
:Param Types:
    __m128 row0, 
    __m128 row1, 
    __m128 row2, 
    __m128 row3
:Param ETypes:
    FP32 row0, 
    FP32 row1, 
    FP32 row2, 
    FP32 row3

.. code-block:: C

    void _MM_TRANSPOSE4_PS(__m128 row0, __m128 row1,
                           __m128 row2, __m128 row3);

.. admonition:: Intel Description

    Macro: Transpose the 4x4 matrix formed by the 4 rows of single-precision (32-bit) floating-point elements in "row0", "row1", "row2", and "row3", and store the transposed matrix in these vectors ("row0" now contains column 0, etc.).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        __m128 tmp3, tmp2, tmp1, tmp0;
        tmp0 := _mm_unpacklo_ps(row0, row1);
        tmp2 := _mm_unpacklo_ps(row2, row3);
        tmp1 := _mm_unpackhi_ps(row0, row1);
        tmp3 := _mm_unpackhi_ps(row2, row3);
        row0 := _mm_movelh_ps(tmp0, tmp2);
        row1 := _mm_movehl_ps(tmp2, tmp0);
        row2 := _mm_movelh_ps(tmp1, tmp3);
        row3 := _mm_movehl_ps(tmp3, tmp1);
        	
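Since the macro assigns to its arguments, they must be modifiable variables rather than expressions. The sketch below wraps it for a row-major ``float[16]`` matrix; ``transpose4x4`` is an illustrative name.

.. code-block:: C

    #include <assert.h>
    #include <xmmintrin.h>

    /* Hypothetical helper: transpose a row-major 4x4 matrix in place. */
    static void transpose4x4(float m[16]) {
        __m128 r0 = _mm_loadu_ps(m + 0);
        __m128 r1 = _mm_loadu_ps(m + 4);
        __m128 r2 = _mm_loadu_ps(m + 8);
        __m128 r3 = _mm_loadu_ps(m + 12);
        _MM_TRANSPOSE4_PS(r0, r1, r2, r3);  /* rewrites r0..r3 in place */
        _mm_storeu_ps(m + 0, r0);
        _mm_storeu_ps(m + 4, r1);
        _mm_storeu_ps(m + 8, r2);
        _mm_storeu_ps(m + 12, r3);
    }

    static void transpose_demo(void) {
        float m[16];
        for (int i = 0; i < 16; ++i) {
            m[i] = (float)i;
        }
        transpose4x4(m);
        for (int r = 0; r < 4; ++r) {
            for (int c = 0; c < 4; ++c) {
                assert(m[r * 4 + c] == (float)(c * 4 + r));
            }
        }
    }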

MMX
~~~
_m_pextrw
^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: xmmintrin.h
:Searchable: SSE_ALL-Swizzle-MMX
:Register: MMX 64 bit
:Return Type: int
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    int _m_pextrw(__m64 a, int imm8);

.. admonition:: Intel Description

    Extract a 16-bit integer from "a", selected with "imm8", and store the result in the lower element of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[15:0] := (a[63:0] >> (imm8[1:0] * 16))[15:0]
        dst[31:16] := 0
        	

_m_pinsrw
^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: xmmintrin.h
:Searchable: SSE_ALL-Swizzle-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int i, 
    int imm8
:Param ETypes:
    UI16 a, 
    UI16 i, 
    IMM imm8

.. code-block:: C

    __m64 _m_pinsrw(__m64 a, int i, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", and insert the 16-bit integer "i" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        sel := imm8[1:0]*16
        dst[sel+15:sel] := i[15:0]
        	

_m_pshufw
^^^^^^^^^
:Tech: SSE_ALL
:Category: Swizzle
:Header: xmmintrin.h
:Searchable: SSE_ALL-Swizzle-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m64 _m_pshufw(__m64 a, int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[15:0] := src[15:0]
        	1:	tmp[15:0] := src[31:16]
        	2:	tmp[15:0] := src[47:32]
        	3:	tmp[15:0] := src[63:48]
        	ESAC
        	RETURN tmp[15:0]
        }
        dst[15:0] := SELECT4(a[63:0], imm8[1:0])
        dst[31:16] := SELECT4(a[63:0], imm8[3:2])
        dst[47:32] := SELECT4(a[63:0], imm8[5:4])
        dst[63:48] := SELECT4(a[63:0], imm8[7:6])
        	

Store
-----
XMM
~~~
_mm_stream_pi
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m64 a
:Param ETypes:
    FP32 mem_addr, 
    UI64 a

.. code-block:: C

    void _mm_stream_pi(void* mem_addr, __m64 a);

.. admonition:: Intel Description

    Store 64-bits of integer data from "a" into memory using a non-temporal memory hint.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        MEM[mem_addr+63:mem_addr] := a[63:0]
        	

_mm_maskmove_si64
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    __m64 a, 
    __m64 mask, 
    char* mem_addr
:Param ETypes:
    UI8 a, 
    UI8 mask, 
    UI8 mem_addr

.. code-block:: C

    void _mm_maskmove_si64(__m64 a, __m64 mask, char* mem_addr);

.. admonition:: Intel Description

    Conditionally store 8-bit integer elements from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element) and a non-temporal memory hint.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	IF mask[i+7]
        		MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
        	FI
        ENDFOR
        	

_mm_stream_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm_stream_ps(void* mem_addr, __m128 a);

.. admonition:: Intel Description

    Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	
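A minimal sketch of a streaming fill follows, assuming the buffer is 16-byte aligned and its length is a multiple of four floats. ``stream_fill`` is an illustrative name. The trailing ``_mm_sfence`` orders the weakly ordered non-temporal stores ahead of any later stores.

.. code-block:: C

    #include <assert.h>
    #include <stdalign.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <xmmintrin.h>

    /* Hypothetical helper: fill n floats with x using non-temporal stores.
       Assumes dst is 16-byte aligned and n is a multiple of 4. */
    static void stream_fill(float *dst, size_t n, float x) {
        assert(((uintptr_t)dst & 15u) == 0);
        assert(n % 4 == 0);
        const __m128 v = _mm_set1_ps(x);
        for (size_t i = 0; i < n; i += 4) {
            _mm_stream_ps(dst + i, v);
        }
        _mm_sfence();
    }

    static void stream_fill_demo(void) {
        static alignas(16) float buf[64];
        stream_fill(buf, 64, 2.5f);
        for (size_t i = 0; i < 64; ++i) {
            assert(buf[i] == 2.5f);
        }
    }

Non-temporal stores tend to pay off only for large buffers that will not be read again soon; for small or reused data, ordinary stores are usually the safer default.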

_mm_storeh_pi
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    __m64* mem_addr, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm_storeh_pi(__m64* mem_addr, __m128 a);

.. admonition:: Intel Description

    Store the upper 2 single-precision (32-bit) floating-point elements from "a" into memory.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        MEM[mem_addr+31:mem_addr] := a[95:64]
        MEM[mem_addr+63:mem_addr+32] := a[127:96]
        	

_mm_storel_pi
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    __m64* mem_addr, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm_storel_pi(__m64* mem_addr, __m128 a);

.. admonition:: Intel Description

    Store the lower 2 single-precision (32-bit) floating-point elements from "a" into memory.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        MEM[mem_addr+31:mem_addr] := a[31:0]
        MEM[mem_addr+63:mem_addr+32] := a[63:32]
        	
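Used together, ``_mm_storel_pi`` and ``_mm_storeh_pi`` split a vector into two pairs of floats. The sketch below assumes the destinations are plain ``float`` arrays cast to ``__m64*``; ``split_pairs`` is an illustrative name.

.. code-block:: C

    #include <assert.h>
    #include <xmmintrin.h>

    /* Hypothetical helper: write lanes 0..1 to lo and lanes 2..3 to hi. */
    static void split_pairs(__m128 v, float lo[2], float hi[2]) {
        _mm_storel_pi((__m64 *)lo, v);
        _mm_storeh_pi((__m64 *)hi, v);
    }

    static void split_pairs_demo(void) {
        float lo[2], hi[2];
        split_pairs(_mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f), lo, hi);
        assert(lo[0] == 1.0f && lo[1] == 2.0f);
        assert(hi[0] == 3.0f && hi[1] == 4.0f);
    }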

_mm_store_ss
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    float* mem_addr, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm_store_ss(float* mem_addr, __m128 a);

.. admonition:: Intel Description

    Store the lower single-precision (32-bit) floating-point element from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        MEM[mem_addr+31:mem_addr] := a[31:0]
        	

_mm_store1_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    float* mem_addr, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm_store1_ps(float* mem_addr, __m128 a);

.. admonition:: Intel Description

    Store the lower single-precision (32-bit) floating-point element from "a" into 4 contiguous elements in memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        MEM[mem_addr+31:mem_addr] := a[31:0]
        MEM[mem_addr+63:mem_addr+32] := a[31:0]
        MEM[mem_addr+95:mem_addr+64] := a[31:0]
        MEM[mem_addr+127:mem_addr+96] := a[31:0]
        	
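The alignment requirement can be met with C11 ``alignas``, as in this sketch. ``broadcast_store`` is an illustrative name, and the ``assert`` is an assumed debugging aid that fails cleanly instead of faulting on a misaligned pointer.

.. code-block:: C

    #include <assert.h>
    #include <stdalign.h>
    #include <stdint.h>
    #include <xmmintrin.h>

    /* Hypothetical helper: write x to dst[0..3]; dst must be 16-byte aligned. */
    static void broadcast_store(float *dst, float x) {
        assert(((uintptr_t)dst & 15u) == 0);
        /* _mm_set_ss puts x in the lowest lane, which is the lane replicated. */
        _mm_store1_ps(dst, _mm_set_ss(x));
    }

    static void broadcast_store_demo(void) {
        alignas(16) float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        broadcast_store(out, 7.0f);
        assert(out[0] == 7.0f && out[1] == 7.0f);
        assert(out[2] == 7.0f && out[3] == 7.0f);
    }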

_mm_store_ps1
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    float* mem_addr, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm_store_ps1(float* mem_addr, __m128 a);

.. admonition:: Intel Description

    Store the lower single-precision (32-bit) floating-point element from "a" into 4 contiguous elements in memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        MEM[mem_addr+31:mem_addr] := a[31:0]
        MEM[mem_addr+63:mem_addr+32] := a[31:0]
        MEM[mem_addr+95:mem_addr+64] := a[31:0]
        MEM[mem_addr+127:mem_addr+96] := a[31:0]
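
The broadcast store described above can be sketched as follows, assuming a C11 compiler targeting x86 with SSE enabled. The function name ``demo_store_ps1`` and the sample values are illustrative only and are not part of the Intel reference. ``_mm_store1_ps`` shares the same description and pseudo-code, so it could be substituted for the call shown here.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: __m128, _mm_set_ps, _mm_store_ps1 */

    /* Illustrative helper (not an Intel API): returns 1 when all four
       stored floats equal the lowest lane of the source vector. */
    int demo_store_ps1(void)
    {
        /* The destination must be 16-byte aligned for this intrinsic. */
        _Alignas(16) float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};

        /* _mm_set_ps takes lanes from highest to lowest: lane 0 is 1.0f. */
        __m128 v = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);

        _mm_store_ps1(out, v);

        return out[0] == 1.0f && out[1] == 1.0f &&
               out[2] == 1.0f && out[3] == 1.0f;
    }

The ``_Alignas(16)`` qualifier is one way to meet the alignment requirement for a stack buffer; heap buffers would need an aligned allocator instead.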
        	

_mm_store_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    float* mem_addr, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm_store_ps(float* mem_addr, __m128 a);

.. admonition:: Intel Description

    Store 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_storeu_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    float* mem_addr, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm_storeu_ps(float* mem_addr, __m128 a);

.. admonition:: Intel Description

    Store 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
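
To illustrate the relaxed alignment rule, the sketch below stores four floats starting at the second element of a plain array, an address that is generally not 16-byte aligned. It assumes an x86 target with SSE enabled; ``demo_storeu_ps`` and the buffer layout are hypothetical names chosen for the example.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: __m128, _mm_set_ps, _mm_storeu_ps */

    /* Illustrative helper (not an Intel API): returns 1 when the four
       lanes land in buf[1..4] and the neighbours stay untouched. */
    int demo_storeu_ps(void)
    {
        float buf[6] = {0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};

        /* Lanes 0..3 hold 1.0f, 2.0f, 3.0f, 4.0f. */
        __m128 v = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);

        /* buf + 1 is offset by 4 bytes, so the aligned store would not be safe. */
        _mm_storeu_ps(buf + 1, v);

        return buf[0] == 0.0f && buf[1] == 1.0f && buf[2] == 2.0f &&
               buf[3] == 3.0f && buf[4] == 4.0f && buf[5] == 0.0f;
    }

When the destination is known to be 16-byte aligned, ``_mm_store_ps`` can be used in the same position.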
        	

_mm_storer_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    float* mem_addr, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm_storer_ps(float* mem_addr, __m128 a);

.. admonition:: Intel Description

    Store 4 single-precision (32-bit) floating-point elements from "a" into memory in reverse order. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+31:mem_addr] := a[127:96]
        MEM[mem_addr+63:mem_addr+32] := a[95:64]
        MEM[mem_addr+95:mem_addr+64] := a[63:32]
        MEM[mem_addr+127:mem_addr+96] := a[31:0]
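
The reversal in the pseudo-code above can be checked with a short sketch. It assumes a C11 compiler with SSE enabled; ``demo_storer_ps`` is an illustrative name, not an Intel API.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: __m128, _mm_set_ps, _mm_storer_ps */

    /* Illustrative helper: returns 1 when the highest lane is written
       first and the lowest lane last. */
    int demo_storer_ps(void)
    {
        _Alignas(16) float out[4];

        /* Lanes 0..3 hold 1.0f, 2.0f, 3.0f, 4.0f. */
        __m128 v = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);

        _mm_storer_ps(out, v);

        return out[0] == 4.0f && out[1] == 3.0f &&
               out[2] == 2.0f && out[3] == 1.0f;
    }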
        	

_mm_storeu_si16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m128i a
:Param ETypes:
    UI16 mem_addr, 
    UI16 a

.. code-block:: C

    void _mm_storeu_si16(void* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 16-bit integer from the first element of "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+15:mem_addr] := a[15:0]
        	

_mm_storeu_si64
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m128i a
:Param ETypes:
    UI64 mem_addr, 
    UI64 a

.. code-block:: C

    void _mm_storeu_si64(void* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 64-bit integer from the first element of "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+63:mem_addr] := a[63:0]
        	

_mm_storeu_si32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m128i a
:Param ETypes:
    UI32 mem_addr, 
    UI32 a

.. code-block:: C

    void _mm_storeu_si32(void* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 32-bit integer from the first element of "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+31:mem_addr] := a[31:0]
        	

_mm_maskmoveu_si128
^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    __m128i a, 
    __m128i mask, 
    char* mem_addr
:Param ETypes:
    UI8 a, 
    UI8 mask, 
    UI8 mem_addr

.. code-block:: C

    void _mm_maskmoveu_si128(__m128i a, __m128i mask,
                             char* mem_addr);

.. admonition:: Intel Description

    Conditionally store 8-bit integer elements from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element) and a non-temporal memory hint. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF mask[i+7]
        		MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
        	FI
        ENDFOR
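
A small sketch of the byte-mask behaviour in the loop above, assuming an x86 target with SSE2 enabled. The name ``demo_maskmoveu`` and the choice of mask are hypothetical. The ``_mm_sfence`` call is a conservative addition of this example and is not required by the description.

.. code-block:: C

    #include <emmintrin.h>  /* SSE2: __m128i, _mm_set1_epi8, _mm_maskmoveu_si128 */
    #include <string.h>     /* memset */

    /* Illustrative helper: returns 1 when only the even-indexed bytes
       of the destination were overwritten. */
    int demo_maskmoveu(void)
    {
        char out[16];
        memset(out, '.', sizeof out);

        __m128i data = _mm_set1_epi8('x');

        /* Each 16-bit lane is 0x0080: the low (even-indexed) byte has its
           highest bit set, the high (odd-indexed) byte does not. */
        __m128i mask = _mm_set1_epi16(0x0080);

        _mm_maskmoveu_si128(data, mask, out);
        _mm_sfence();  /* order the non-temporal store before later stores */

        for (int i = 0; i < 16; i++) {
            char expected = (i % 2 == 0) ? 'x' : '.';
            if (out[i] != expected) {
                return 0;
            }
        }
        return 1;
    }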
        	

_mm_store_si128
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    __m128i* mem_addr, 
    __m128i a
:Param ETypes:
    M128 mem_addr, 
    M128 a

.. code-block:: C

    void _mm_store_si128(__m128i* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 128 bits of integer data from "a" into memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_storeu_si128
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    __m128i* mem_addr, 
    __m128i a
:Param ETypes:
    M128 mem_addr, 
    M128 a

.. code-block:: C

    void _mm_storeu_si128(__m128i* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 128 bits of integer data from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_storel_epi64
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    __m128i* mem_addr, 
    __m128i a
:Param ETypes:
    UI64 mem_addr, 
    UI64 a

.. code-block:: C

    void _mm_storel_epi64(__m128i* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 64-bit integer from the first element of "a" into memory.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+63:mem_addr] := a[63:0]
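
The sketch below shows that only the low 64 bits reach memory, assuming an x86 target with SSE2 enabled and a compiler that provides ``_mm_set_epi64x``. ``demo_storel_epi64`` and the sentinel value ``-1`` are illustrative choices.

.. code-block:: C

    #include <emmintrin.h>  /* SSE2: __m128i, _mm_set_epi64x, _mm_storel_epi64 */
    #include <stdint.h>     /* int64_t */

    /* Illustrative helper: returns 1 when the low element is stored and
       the following 8 bytes of the buffer are left unchanged. */
    int demo_storel_epi64(void)
    {
        int64_t out[2] = {-1, -1};

        /* Arguments are (high, low): the low element is 111. */
        __m128i v = _mm_set_epi64x(222, 111);

        /* The prototype takes an __m128i pointer, hence the cast. */
        _mm_storel_epi64((__m128i *)out, v);

        return out[0] == 111 && out[1] == -1;
    }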
        	

_mm_stream_si128
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m128i a
:Param ETypes:
    M128 mem_addr, 
    M128 a

.. code-block:: C

    void _mm_stream_si128(void* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 128 bits of integer data from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
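
A minimal sketch of a non-temporal store to an aligned buffer, assuming a C11 compiler with SSE2 enabled. ``demo_stream_si128`` is a hypothetical name, and pairing the store with ``_mm_sfence`` is a common convention assumed here rather than something the description mandates.

.. code-block:: C

    #include <emmintrin.h>  /* SSE2: __m128i, _mm_set_epi32, _mm_stream_si128 */
    #include <stdint.h>     /* int32_t */

    /* Illustrative helper: returns 1 when the streamed data is visible
       in the destination buffer. */
    int demo_stream_si128(void)
    {
        /* The destination must be 16-byte aligned for this intrinsic. */
        _Alignas(16) int32_t out[4] = {0, 0, 0, 0};

        /* Lanes 0..3 hold 10, 20, 30, 40. */
        __m128i v = _mm_set_epi32(40, 30, 20, 10);

        _mm_stream_si128((__m128i *)out, v);
        _mm_sfence();  /* order the streaming store before later stores */

        return out[0] == 10 && out[1] == 20 && out[2] == 30 && out[3] == 40;
    }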
        	

_mm_stream_si32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    int a
:Param ETypes:
    UI32 mem_addr, 
    UI32 a

.. code-block:: C

    void _mm_stream_si32(void* mem_addr, int a);

.. admonition:: Intel Description

    Store 32-bit integer "a" into memory using a non-temporal hint to minimize cache pollution. If the cache line containing address "mem_addr" is already in the cache, the cache will be updated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+31:mem_addr] := a[31:0]
        	

_mm_stream_si64
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __int64 a
:Param ETypes:
    UI64 mem_addr, 
    UI64 a

.. code-block:: C

    void _mm_stream_si64(void* mem_addr, __int64 a);

.. admonition:: Intel Description

    Store 64-bit integer "a" into memory using a non-temporal hint to minimize cache pollution. If the cache line containing address "mem_addr" is already in the cache, the cache will be updated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+63:mem_addr] := a[63:0]
        	

_mm_stream_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm_stream_pd(void* mem_addr, __m128d a);

.. admonition:: Intel Description

    Store 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_store_sd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    double* mem_addr, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm_store_sd(double* mem_addr, __m128d a);

.. admonition:: Intel Description

    Store the lower double-precision (64-bit) floating-point element from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+63:mem_addr] := a[63:0]
        	

_mm_store1_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    double* mem_addr, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm_store1_pd(double* mem_addr, __m128d a);

.. admonition:: Intel Description

    Store the lower double-precision (64-bit) floating-point element from "a" into 2 contiguous elements in memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+63:mem_addr] := a[63:0]
        MEM[mem_addr+127:mem_addr+64] := a[63:0]
        	

_mm_store_pd1
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    double* mem_addr, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm_store_pd1(double* mem_addr, __m128d a);

.. admonition:: Intel Description

    Store the lower double-precision (64-bit) floating-point element from "a" into 2 contiguous elements in memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+63:mem_addr] := a[63:0]
        MEM[mem_addr+127:mem_addr+64] := a[63:0]
        	

_mm_store_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    double* mem_addr, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm_store_pd(double* mem_addr, __m128d a);

.. admonition:: Intel Description

    Store 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a" into memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_storeu_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    double* mem_addr, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm_storeu_pd(double* mem_addr, __m128d a);

.. admonition:: Intel Description

    Store 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_storer_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    double* mem_addr, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm_storer_pd(double* mem_addr, __m128d a);

.. admonition:: Intel Description

    Store 2 double-precision (64-bit) floating-point elements from "a" into memory in reverse order. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+63:mem_addr] := a[127:64]
        MEM[mem_addr+127:mem_addr+64] := a[63:0]
        	

_mm_storeh_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    double* mem_addr, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm_storeh_pd(double* mem_addr, __m128d a);

.. admonition:: Intel Description

    Store the upper double-precision (64-bit) floating-point element from "a" into memory.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+63:mem_addr] := a[127:64]
        	

_mm_storel_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: emmintrin.h
:Searchable: SSE_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    double* mem_addr, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm_storel_pd(double* mem_addr, __m128d a);

.. admonition:: Intel Description

    Store the lower double-precision (64-bit) floating-point element from "a" into memory.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+63:mem_addr] := a[63:0]
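
As a sketch of how ``_mm_storel_pd`` pairs with ``_mm_storeh_pd``, the example below splits one vector into two scalars. It assumes an x86 target with SSE2 enabled; ``demo_split_pd`` is an illustrative name.

.. code-block:: C

    #include <emmintrin.h>  /* SSE2: __m128d, _mm_set_pd, _mm_storeh_pd, _mm_storel_pd */

    /* Illustrative helper: returns 1 when the upper and lower elements
       are written to their separate destinations. */
    int demo_split_pd(void)
    {
        double hi = 0.0;
        double lo = 0.0;

        /* Arguments are (upper, lower). */
        __m128d v = _mm_set_pd(2.5, 1.5);

        _mm_storeh_pd(&hi, v);  /* writes the upper element */
        _mm_storel_pd(&lo, v);  /* writes the lower element */

        return hi == 2.5 && lo == 1.5;
    }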
        	

MMX
~~~
_m_maskmovq
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Store
:Header: immintrin.h
:Searchable: SSE_ALL-Store-MMX
:Register: MMX 64 bit
:Return Type: void
:Param Types:
    __m64 a, 
    __m64 mask, 
    char* mem_addr
:Param ETypes:
    UI8 a, 
    UI8 mask, 
    UI8 mem_addr

.. code-block:: C

    void _m_maskmovq(__m64 a, __m64 mask, char* mem_addr);

.. admonition:: Intel Description

    Conditionally store 8-bit integer elements from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	IF mask[i+7]
        		MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
        	FI
        ENDFOR
        	

Load
----
XMM
~~~
_mm_loadh_pi
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: immintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m64 const* mem_addr
:Param ETypes:
    FP32 a, 
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_loadh_pi(__m128 a, __m64 const* mem_addr);

.. admonition:: Intel Description

    Load 2 single-precision (32-bit) floating-point elements from memory into the upper 2 elements of "dst", and copy the lower 2 elements from "a" to "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        dst[63:32] := a[63:32]
        dst[95:64] := MEM[mem_addr+31:mem_addr]
        dst[127:96] := MEM[mem_addr+63:mem_addr+32]
        	

_mm_loadl_pi
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: immintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m64 const* mem_addr
:Param ETypes:
    FP32 a, 
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_loadl_pi(__m128 a, __m64 const* mem_addr);

.. admonition:: Intel Description

    Load 2 single-precision (32-bit) floating-point elements from memory into the lower 2 elements of "dst", and copy the upper 2 elements from "a" to "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := MEM[mem_addr+31:mem_addr]
        dst[63:32] := MEM[mem_addr+63:mem_addr+32]
        dst[95:64] := a[95:64]
        dst[127:96] := a[127:96]
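
The two half-register loads can be combined to assemble a full vector from two separate float pairs. The sketch assumes a C11 compiler with SSE enabled; ``demo_load_halves_ps`` and the 8-byte alignment of the source arrays are choices made for this example, not requirements stated above.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: __m128, __m64, _mm_loadl_pi, _mm_loadh_pi */

    /* Illustrative helper: returns 1 when the low pair fills lanes 0..1
       and the high pair fills lanes 2..3. */
    int demo_load_halves_ps(void)
    {
        _Alignas(8) const float lo[2] = {1.0f, 2.0f};
        _Alignas(8) const float hi[2] = {3.0f, 4.0f};
        _Alignas(16) float out[4];

        __m128 v = _mm_setzero_ps();

        /* The prototypes take __m64 pointers, hence the casts. */
        v = _mm_loadl_pi(v, (const __m64 *)lo);  /* lanes 0..1, upper lanes kept */
        v = _mm_loadh_pi(v, (const __m64 *)hi);  /* lanes 2..3, lower lanes kept */

        _mm_store_ps(out, v);

        return out[0] == 1.0f && out[1] == 2.0f &&
               out[2] == 3.0f && out[3] == 4.0f;
    }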
        	

_mm_load_ss
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: immintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float const* mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_load_ss(float const* mem_addr);

.. admonition:: Intel Description

    Load a single-precision (32-bit) floating-point element from memory into the lower element of "dst", and zero the upper 3 elements. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := MEM[mem_addr+31:mem_addr]
        dst[127:32] := 0
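
The zeroing of the upper elements can be observed by storing the result back to memory. This sketch assumes a C11 compiler with SSE enabled; ``demo_load_ss`` is a hypothetical name.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: __m128, _mm_load_ss, _mm_store_ps */

    /* Illustrative helper: returns 1 when lane 0 holds the scalar and
       lanes 1..3 are zero. */
    int demo_load_ss(void)
    {
        float x = 7.0f;
        _Alignas(16) float out[4];

        __m128 v = _mm_load_ss(&x);
        _mm_store_ps(out, v);

        return out[0] == 7.0f && out[1] == 0.0f &&
               out[2] == 0.0f && out[3] == 0.0f;
    }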
        	

_mm_load1_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: immintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float const* mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_load1_ps(float const* mem_addr);

.. admonition:: Intel Description

    Load a single-precision (32-bit) floating-point element from memory into all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := MEM[mem_addr+31:mem_addr]
        dst[63:32] := MEM[mem_addr+31:mem_addr]
        dst[95:64] := MEM[mem_addr+31:mem_addr]
        dst[127:96] := MEM[mem_addr+31:mem_addr]
        	

_mm_load_ps1
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: immintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float const* mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_load_ps1(float const* mem_addr);

.. admonition:: Intel Description

    Load a single-precision (32-bit) floating-point element from memory into all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := MEM[mem_addr+31:mem_addr]
        dst[63:32] := MEM[mem_addr+31:mem_addr]
        dst[95:64] := MEM[mem_addr+31:mem_addr]
        dst[127:96] := MEM[mem_addr+31:mem_addr]
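
A short sketch of the broadcast load, assuming a C11 compiler with SSE enabled. ``demo_load_ps1`` is an illustrative name. ``_mm_load1_ps`` has the same description and pseudo-code, so it could be used in place of the call shown.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps1, _mm_store_ps */

    /* Illustrative helper: returns 1 when every lane equals the scalar
       read from memory. */
    int demo_load_ps1(void)
    {
        float x = 2.5f;
        _Alignas(16) float out[4];

        __m128 v = _mm_load_ps1(&x);
        _mm_store_ps(out, v);

        return out[0] == 2.5f && out[1] == 2.5f &&
               out[2] == 2.5f && out[3] == 2.5f;
    }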
        	

_mm_load_ps
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: immintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float const* mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_load_ps(float const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        	

_mm_loadu_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: immintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float const* mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_loadu_ps(float const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        	

_mm_loadr_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: immintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float const* mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_loadr_ps(float const* mem_addr);

.. admonition:: Intel Description

    Load 4 single-precision (32-bit) floating-point elements from memory into "dst" in reverse order. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := MEM[mem_addr+127:mem_addr+96]
        dst[63:32] := MEM[mem_addr+95:mem_addr+64]
        dst[95:64] := MEM[mem_addr+63:mem_addr+32]
        dst[127:96] := MEM[mem_addr+31:mem_addr]
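
The reversed lane order can be verified by loading with ``_mm_loadr_ps`` and storing back in normal order. The sketch assumes a C11 compiler with SSE enabled; ``demo_loadr_ps`` is an illustrative name.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: __m128, _mm_loadr_ps, _mm_store_ps */

    /* Illustrative helper: returns 1 when the round trip reverses the
       element order. */
    int demo_loadr_ps(void)
    {
        /* Both buffers must be 16-byte aligned for these intrinsics. */
        _Alignas(16) const float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        _Alignas(16) float out[4];

        __m128 v = _mm_loadr_ps(in);  /* lane 0 takes in[3], lane 3 takes in[0] */
        _mm_store_ps(out, v);

        return out[0] == 4.0f && out[1] == 3.0f &&
               out[2] == 2.0f && out[3] == 1.0f;
    }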
        	

_mm_loadu_si64
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: immintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI64 mem_addr

.. code-block:: C

    __m128i _mm_loadu_si64(void const* mem_addr);

.. admonition:: Intel Description

    Load unaligned 64-bit integer from memory into the first element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := MEM[mem_addr+63:mem_addr]
        dst[MAX:64] := 0
        	

_mm_loadu_si16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: immintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI16 mem_addr

.. code-block:: C

    __m128i _mm_loadu_si16(void const* mem_addr);

.. admonition:: Intel Description

    Load unaligned 16-bit integer from memory into the first element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := MEM[mem_addr+15:mem_addr]
        dst[MAX:16] := 0
        	

_mm_loadu_si32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: emmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI32 mem_addr

.. code-block:: C

    __m128i _mm_loadu_si32(void const* mem_addr);

.. admonition:: Intel Description

    Load unaligned 32-bit integer from memory into the first element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := MEM[mem_addr+31:mem_addr]
        dst[MAX:32] := 0
        	

_mm_loadl_epi64
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: emmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i const* mem_addr
:Param ETypes:
    UI64 mem_addr

.. code-block:: C

    __m128i _mm_loadl_epi64(__m128i const* mem_addr);

.. admonition:: Intel Description

    Load 64-bit integer from memory into the first element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := MEM[mem_addr+63:mem_addr]
        dst[MAX:64] := 0
        	

_mm_load_si128
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: emmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i const* mem_addr
:Param ETypes:
    M128 mem_addr

.. code-block:: C

    __m128i _mm_load_si128(__m128i const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits of integer data from memory into "dst". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        	

_mm_loadu_si128
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: emmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i const* mem_addr
:Param ETypes:
    M128 mem_addr

.. code-block:: C

    __m128i _mm_loadu_si128(__m128i const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits of integer data from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
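
A sketch of an unaligned integer load from the middle of a byte buffer, assuming an x86 target with SSE2 enabled. ``demo_loadu_si128`` and the 3-byte offset are arbitrary choices for illustration.

.. code-block:: C

    #include <emmintrin.h>  /* SSE2: __m128i, _mm_loadu_si128, _mm_storeu_si128 */
    #include <stdint.h>     /* uint8_t */

    /* Illustrative helper: returns 1 when the 16 bytes starting at
       offset 3 are loaded and stored back unchanged. */
    int demo_loadu_si128(void)
    {
        uint8_t bytes[32];
        uint8_t out[16];

        for (int i = 0; i < 32; i++) {
            bytes[i] = (uint8_t)i;
        }

        /* bytes + 3 is not 16-byte aligned in general, so the unaligned
           variants are used for both the load and the store. */
        __m128i v = _mm_loadu_si128((const __m128i *)(bytes + 3));
        _mm_storeu_si128((__m128i *)out, v);

        for (int i = 0; i < 16; i++) {
            if (out[i] != (uint8_t)(i + 3)) {
                return 0;
            }
        }
        return 1;
    }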
        	

_mm_load_pd
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: emmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double const* mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_load_pd(double const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        	

_mm_load1_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: emmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double const* mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_load1_pd(double const* mem_addr);

.. admonition:: Intel Description

    Load a double-precision (64-bit) floating-point element from memory into both elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := MEM[mem_addr+63:mem_addr]
        dst[127:64] := MEM[mem_addr+63:mem_addr]
        	

_mm_load_pd1
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: emmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double const* mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_load_pd1(double const* mem_addr);

.. admonition:: Intel Description

    Load a double-precision (64-bit) floating-point element from memory into both elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := MEM[mem_addr+63:mem_addr]
        dst[127:64] := MEM[mem_addr+63:mem_addr]
        	

_mm_loadr_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: emmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double const* mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_loadr_pd(double const* mem_addr);

.. admonition:: Intel Description

    Load 2 double-precision (64-bit) floating-point elements from memory into "dst" in reverse order. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := MEM[mem_addr+127:mem_addr+64]
        dst[127:64] := MEM[mem_addr+63:mem_addr]
        	

_mm_loadu_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: emmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double const* mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_loadu_pd(double const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        	

_mm_load_sd
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: emmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double const* mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_load_sd(double const* mem_addr);

.. admonition:: Intel Description

    Load a double-precision (64-bit) floating-point element from memory into the lower element of "dst", and zero the upper element. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := MEM[mem_addr+63:mem_addr]
        dst[127:64] := 0
        	

_mm_loadh_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: emmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    double const* mem_addr
:Param ETypes:
    FP64 a, 
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_loadh_pd(__m128d a, double const* mem_addr);

.. admonition:: Intel Description

    Load a double-precision (64-bit) floating-point element from memory into the upper element of "dst", and copy the lower element from "a" to "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        dst[127:64] := MEM[mem_addr+63:mem_addr]
        	

_mm_loadl_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: emmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    double const* mem_addr
:Param ETypes:
    FP64 a, 
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_loadl_pd(__m128d a, double const* mem_addr);

.. admonition:: Intel Description

    Load a double-precision (64-bit) floating-point element from memory into the lower element of "dst", and copy the upper element from "a" to "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := MEM[mem_addr+63:mem_addr]
        dst[127:64] := a[127:64]
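
``_mm_loadl_pd`` and ``_mm_loadh_pd`` can be combined to build a vector from two unrelated doubles, as in the sketch below. It assumes an x86 target with SSE2 enabled; ``demo_load_halves_pd`` is an illustrative name.

.. code-block:: C

    #include <emmintrin.h>  /* SSE2: __m128d, _mm_loadl_pd, _mm_loadh_pd, _mm_storeu_pd */

    /* Illustrative helper: returns 1 when each scalar lands in the
       expected half of the vector. */
    int demo_load_halves_pd(void)
    {
        const double lo = 1.5;
        const double hi = 2.5;
        double out[2];

        __m128d v = _mm_setzero_pd();
        v = _mm_loadl_pd(v, &lo);  /* replaces the lower element, keeps the upper */
        v = _mm_loadh_pd(v, &hi);  /* replaces the upper element, keeps the lower */

        _mm_storeu_pd(out, v);

        return out[0] == 1.5 && out[1] == 2.5;
    }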
        	

_mm_lddqu_si128
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: pmmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i const* mem_addr
:Param ETypes:
    M128 mem_addr

.. code-block:: C

    __m128i _mm_lddqu_si128(__m128i const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits of integer data from unaligned memory into "dst". This intrinsic may perform better than "_mm_loadu_si128" when the data crosses a cache line boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        	

_mm_loaddup_pd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: pmmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double const* mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_loaddup_pd(double const* mem_addr);

.. admonition:: Intel Description

    Load a double-precision (64-bit) floating-point element from memory into both elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := MEM[mem_addr+63:mem_addr]
        dst[127:64] := MEM[mem_addr+63:mem_addr]
        	

_mm_stream_load_si128
^^^^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Load
:Header: smmintrin.h
:Searchable: SSE_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    void* mem_addr
:Param ETypes:
    M128 mem_addr

.. code-block:: C

    __m128i _mm_stream_load_si128(void* mem_addr);

.. admonition:: Intel Description

    Load 128 bits of integer data from memory into "dst" using a non-temporal memory hint. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        	

Elementary Math Functions
-------------------------
XMM
~~~
_mm_sqrt_ss
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Elementary Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_sqrt_ss(__m128 a);

.. admonition:: Intel Description

    Compute the square root of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := SQRT(a[31:0])
        dst[127:32] := a[127:32]
        	

_mm_sqrt_ps
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Elementary Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_sqrt_ps(__m128 a);

.. admonition:: Intel Description

    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := SQRT(a[i+31:i])
        ENDFOR
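
One way the packed and scalar square roots might be combined is a loop that handles four floats per step and finishes the remainder one element at a time. This is a sketch only; ``sqrt_array`` is a hypothetical helper, and the unaligned load/store intrinsics are chosen so that no alignment is assumed.

.. code-block:: C

    #include <stddef.h>
    #include <xmmintrin.h>  /* SSE: _mm_sqrt_ps, _mm_sqrt_ss */

    /* Hypothetical helper: out[i] = sqrt(in[i]) for n floats. */
    static void sqrt_array(const float *in, float *out, size_t n)
    {
        size_t i = 0;
        /* Packed form: four square roots per iteration. */
        for (; i + 4 <= n; i += 4) {
            __m128 v = _mm_loadu_ps(in + i);
            _mm_storeu_ps(out + i, _mm_sqrt_ps(v));
        }
        /* Scalar form: only the lowest lane is computed and stored. */
        for (; i < n; ++i) {
            __m128 v = _mm_load_ss(in + i);
            _mm_store_ss(out + i, _mm_sqrt_ss(v));
        }
    }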
        	

_mm_rcp_ss
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Elementary Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_rcp_ss(__m128 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (1.0 / a[31:0])
        dst[127:32] := a[127:32]
        	

_mm_rcp_ps
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Elementary Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_rcp_ps(__m128 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := (1.0 / a[i+31:i])
        ENDFOR
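
Because ``_mm_rcp_ps`` returns an approximation, it can be useful to measure how far the result is from an exact division. The sketch below does that for four positive, finite inputs; ``rcp_max_rel_error`` is a hypothetical helper, and the exact values are assumed to come from ordinary scalar division.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: _mm_rcp_ps */

    /* Hypothetical helper: largest relative error of _mm_rcp_ps over
       four inputs, measured against scalar 1.0f / x. */
    static float rcp_max_rel_error(const float x[4])
    {
        float approx[4];
        float worst = 0.0f;
        _mm_storeu_ps(approx, _mm_rcp_ps(_mm_loadu_ps(x)));
        for (int i = 0; i < 4; ++i) {
            float exact = 1.0f / x[i];
            float err = (approx[i] - exact) / exact;
            if (err < 0.0f)
                err = -err;          /* absolute value without <math.h> */
            if (err > worst)
                worst = err;
        }
        return worst;
    }

For such inputs the returned value should stay below the documented bound of 1.5*2^-12.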
        	

_mm_rsqrt_ss
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Elementary Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_rsqrt_ss(__m128 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (1.0 / SQRT(a[31:0]))
        dst[127:32] := a[127:32]
        	

_mm_rsqrt_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Elementary Math Functions
:Header: xmmintrin.h
:Searchable: SSE_ALL-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_rsqrt_ps(__m128 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
        ENDFOR
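
When the accuracy of ``_mm_rsqrt_ps`` is not enough, a common refinement is one Newton-Raphson step. That step is not part of the intrinsic and is shown here only as an assumption-laden sketch: ``rsqrt_refined`` is a hypothetical helper, and inputs are assumed to be positive and finite.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: _mm_rsqrt_ps */

    /* Hypothetical helper: y1 = y0 * (1.5 - 0.5 * x * y0 * y0),
       where y0 is the hardware approximation of 1/sqrt(x). */
    static __m128 rsqrt_refined(__m128 x)
    {
        const __m128 half = _mm_set1_ps(0.5f);
        const __m128 three_halves = _mm_set1_ps(1.5f);
        __m128 y0 = _mm_rsqrt_ps(x);                      /* rough estimate */
        __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y0), y0);   /* x * y0 * y0 */
        return _mm_mul_ps(y0, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));
    }

The extra multiplies trade some speed for a result much closer to ``1.0f / sqrtf(x)``; whether that trade is worthwhile depends on the target.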
        	

_mm_sqrt_sd
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Elementary Math Functions
:Header: emmintrin.h
:Searchable: SSE_ALL-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_sqrt_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := SQRT(b[63:0])
        dst[127:64] := a[127:64]
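
Unlike ``_mm_sqrt_ss``, this intrinsic takes two operands, which is easy to get backwards. The sketch below names the roles explicitly; ``sqrt_low_keep_high`` is a hypothetical wrapper introduced only for illustration.

.. code-block:: C

    #include <emmintrin.h>  /* SSE2: _mm_sqrt_sd */

    /* Hypothetical wrapper: the upper lane of the result comes from
       keep_high ("a"), the lower lane is sqrt of the lower lane of
       sqrt_low ("b"). */
    static __m128d sqrt_low_keep_high(__m128d keep_high, __m128d sqrt_low)
    {
        return _mm_sqrt_sd(keep_high, sqrt_low);
    }

Passing the same vector for both arguments replaces its lower lane with the square root and leaves the upper lane unchanged.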
        	

_mm_sqrt_pd
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Elementary Math Functions
:Header: emmintrin.h
:Searchable: SSE_ALL-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_sqrt_pd(__m128d a);

.. admonition:: Intel Description

    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := SQRT(a[i+63:i])
        ENDFOR
        	

Arithmetic
----------
XMM
~~~
_mm_mulhi_pu16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: xmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m64 _mm_mulhi_pu16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	tmp[31:0] := a[i+15:i] * b[i+15:i]
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        	

_mm_add_ss
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: xmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_add_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] + b[31:0]
        dst[127:32] := a[127:32]
        	

_mm_add_ps
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: xmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_add_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[i+31:i] + b[i+31:i]
        ENDFOR
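
The difference between the scalar (``_ss``) and packed (``_ps``) forms is easiest to see side by side. In this sketch, ``add_scalar_vs_packed`` is a hypothetical helper that applies both to the same inputs.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: _mm_add_ss, _mm_add_ps */

    /* Hypothetical helper: run the scalar and packed adds on the same
       four-float inputs and store both results. */
    static void add_scalar_vs_packed(const float a[4], const float b[4],
                                     float scalar_out[4], float packed_out[4])
    {
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        /* Lane 0 is a[0] + b[0]; lanes 1..3 are copied from a. */
        _mm_storeu_ps(scalar_out, _mm_add_ss(va, vb));
        /* Every lane is a[i] + b[i]. */
        _mm_storeu_ps(packed_out, _mm_add_ps(va, vb));
    }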
        	

_mm_sub_ss
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: xmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_sub_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] - b[31:0]
        dst[127:32] := a[127:32]
        	

_mm_sub_ps
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: xmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_sub_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[i+31:i] - b[i+31:i]
        ENDFOR
        	

_mm_mul_ss
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: xmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mul_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] * b[31:0]
        dst[127:32] := a[127:32]
        	

_mm_mul_ps
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: xmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mul_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[i+31:i] * b[i+31:i]
        ENDFOR
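
The packed subtract, multiply, and add intrinsics compose directly into larger expressions. As an illustrative sketch, the hypothetical helper ``lerp4`` below evaluates ``a + t * (b - a)`` on four lanes at once.

.. code-block:: C

    #include <xmmintrin.h>  /* SSE: _mm_add_ps, _mm_sub_ps, _mm_mul_ps */

    /* Hypothetical helper: per-lane linear interpolation between a and b. */
    static __m128 lerp4(__m128 a, __m128 b, __m128 t)
    {
        __m128 span = _mm_sub_ps(b, a);          /* b - a */
        __m128 step = _mm_mul_ps(t, span);       /* t * (b - a) */
        return _mm_add_ps(a, step);              /* a + t * (b - a) */
    }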
        	

_mm_div_ss
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: xmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_div_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] / b[31:0]
        dst[127:32] := a[127:32]
        	

_mm_div_ps
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: xmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_div_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := a[i+31:i] / b[i+31:i]
        ENDFOR
        	

_mm_add_epi8
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_add_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed 8-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := a[i+7:i] + b[i+7:i]
        ENDFOR
        	

_mm_add_epi16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_add_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed 16-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := a[i+15:i] + b[i+15:i]
        ENDFOR
        	

_mm_add_epi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_add_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed 32-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[i+31:i] + b[i+31:i]
        ENDFOR
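
A typical use of the packed 32-bit add is accumulating an array four elements at a time. The following sketch assumes the total fits in ``int32_t``; ``sum_i32`` is a hypothetical helper, and the lanes are summed by storing the accumulator to a small array rather than by shuffles.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>
    #include <emmintrin.h>  /* SSE2: _mm_add_epi32 */

    /* Hypothetical helper: sum n 32-bit integers (no overflow assumed). */
    static int32_t sum_i32(const int32_t *p, size_t n)
    {
        __m128i acc = _mm_setzero_si128();
        size_t i = 0;
        /* Four running partial sums, one per lane. */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(p + i)));

        int32_t lanes[4];
        _mm_storeu_si128((__m128i *)lanes, acc);
        int32_t total = lanes[0] + lanes[1] + lanes[2] + lanes[3];

        /* Remaining 0..3 elements. */
        for (; i < n; ++i)
            total += p[i];
        return total;
    }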
        	

_mm_add_si64
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _mm_add_si64(__m64 a, __m64 b);

.. admonition:: Intel Description

    Add 64-bit integers "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0] + b[63:0]
        	

_mm_add_epi64
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_add_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed 64-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[i+63:i] + b[i+63:i]
        ENDFOR
        	

_mm_adds_epi8
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_adds_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
        ENDFOR
        	

_mm_adds_epi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_adds_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
        ENDFOR
        	

_mm_adds_epu8
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_adds_epu8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
        ENDFOR
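
The wrapping add (``_mm_add_epi8``) and the two saturating adds (``_mm_adds_epu8``, ``_mm_adds_epi8``) differ only when a lane overflows. This sketch runs all three on the same bytes; ``add_bytes_three_ways`` is a hypothetical helper, and the outputs are stored as raw bytes so the bit patterns can be compared.

.. code-block:: C

    #include <stdint.h>
    #include <emmintrin.h>  /* SSE2: _mm_add_epi8, _mm_adds_epu8, _mm_adds_epi8 */

    /* Hypothetical helper: add 16 byte pairs with wraparound, unsigned
       saturation, and signed saturation. */
    static void add_bytes_three_ways(const uint8_t a[16], const uint8_t b[16],
                                     uint8_t wrap[16], uint8_t sat_u[16],
                                     uint8_t sat_s[16])
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        _mm_storeu_si128((__m128i *)wrap,  _mm_add_epi8(va, vb));   /* modulo 256 */
        _mm_storeu_si128((__m128i *)sat_u, _mm_adds_epu8(va, vb));  /* clamp to 0..255 */
        _mm_storeu_si128((__m128i *)sat_s, _mm_adds_epi8(va, vb));  /* clamp to -128..127 */
    }

With bytes 200 and 100, the wrapping add gives 44 and the unsigned saturating add gives 255. The signed form reads 200 as -56, so it gives 44 without saturating, while 100 + 100 saturates to 127.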
        	

_mm_adds_epu16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_adds_epu16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
        ENDFOR
        	

_mm_madd_epi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_madd_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
        ENDFOR
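
Because each 32-bit output lane already holds the sum of two products, ``_mm_madd_epi16`` does most of the work of a 16-bit dot product. A minimal sketch, assuming the final total fits in ``int32_t``; ``dot8_i16`` is a hypothetical helper.

.. code-block:: C

    #include <stdint.h>
    #include <emmintrin.h>  /* SSE2: _mm_madd_epi16 */

    /* Hypothetical helper: dot product of two 8-element int16 vectors. */
    static int32_t dot8_i16(const int16_t a[8], const int16_t b[8])
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        /* Lane k holds a[2k]*b[2k] + a[2k+1]*b[2k+1] as a 32-bit value. */
        __m128i pair_sums = _mm_madd_epi16(va, vb);

        int32_t lanes[4];
        _mm_storeu_si128((__m128i *)lanes, pair_sums);
        return lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }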
        	

_mm_mulhi_epi16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mulhi_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        	

_mm_mulhi_epu16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mulhi_epu16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	tmp[31:0] := a[i+15:i] * b[i+15:i]
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
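
Keeping only the high 16 bits of an unsigned product amounts to multiplying by a fraction with denominator 65536. The sketch below uses that to scale eight values at once; ``scale_u16`` and its ``factor_q16`` parameter are hypothetical names, and the cast to ``short`` assumes the usual two's-complement conversion performed by GCC and Clang.

.. code-block:: C

    #include <stdint.h>
    #include <emmintrin.h>  /* SSE2: _mm_mulhi_epu16 */

    /* Hypothetical helper: each lane becomes (v * factor_q16) >> 16,
       i.e. v scaled by factor_q16 / 65536 and rounded toward zero. */
    static __m128i scale_u16(__m128i v, uint16_t factor_q16)
    {
        /* _mm_set1_epi16 takes a signed short; only the bit pattern matters. */
        __m128i f = _mm_set1_epi16((short)factor_q16);
        return _mm_mulhi_epu16(v, f);
    }

For example, a factor of 32768 halves every lane.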
        	

_mm_mullo_epi16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mullo_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        	dst[i+15:i] := tmp[15:0]
        ENDFOR
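
The low and high halves from ``_mm_mullo_epi16`` and ``_mm_mulhi_epi16`` can be interleaved to recover the full signed 32-bit products. This is a sketch of one such arrangement, assuming a little-endian lane layout as on x86; ``mul_i16_to_i32`` is a hypothetical helper.

.. code-block:: C

    #include <stdint.h>
    #include <emmintrin.h>  /* SSE2: _mm_mullo_epi16, _mm_mulhi_epi16 */

    /* Hypothetical helper: out[i] = (int32_t)a[i] * b[i] for 8 lanes. */
    static void mul_i16_to_i32(const int16_t a[8], const int16_t b[8],
                               int32_t out[8])
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        __m128i lo = _mm_mullo_epi16(va, vb);  /* bits 15:0 of each product */
        __m128i hi = _mm_mulhi_epi16(va, vb);  /* bits 31:16 of each product */
        /* Interleave lo/hi 16-bit halves into 32-bit lanes. */
        _mm_storeu_si128((__m128i *)(out + 0), _mm_unpacklo_epi16(lo, hi));
        _mm_storeu_si128((__m128i *)(out + 4), _mm_unpackhi_epi16(lo, hi));
    }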
        	

_mm_mul_su32
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m64 _mm_mul_su32(__m64 a, __m64 b);

.. admonition:: Intel Description

    Multiply the low unsigned 32-bit integers from "a" and "b", and store the unsigned 64-bit result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[31:0] * b[31:0]
        	

_mm_mul_epu32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mul_epu32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[i+31:i] * b[i+31:i]
        ENDFOR
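
Only the low 32 bits of each 64-bit element take part in ``_mm_mul_epu32``, so the operands have to sit in 32-bit positions 0 and 2. The sketch below places them there with ``_mm_set_epi32``; ``mul_u32_pairs`` is a hypothetical helper, and the casts to ``int`` assume the usual two's-complement conversion.

.. code-block:: C

    #include <stdint.h>
    #include <emmintrin.h>  /* SSE2: _mm_mul_epu32 */

    /* Hypothetical helper: out[0] = a0 * b0 and out[1] = a1 * b1,
       each as a full 64-bit unsigned product. */
    static void mul_u32_pairs(uint32_t a0, uint32_t a1,
                              uint32_t b0, uint32_t b1, uint64_t out[2])
    {
        /* _mm_set_epi32 lists lanes from highest (3) to lowest (0);
           lanes 1 and 3 are ignored by the multiply. */
        __m128i va = _mm_set_epi32(0, (int)a1, 0, (int)a0);
        __m128i vb = _mm_set_epi32(0, (int)b1, 0, (int)b0);
        _mm_storeu_si128((__m128i *)out, _mm_mul_epu32(va, vb));
    }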
        	

_mm_sub_epi8
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_sub_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := a[i+7:i] - b[i+7:i]
        ENDFOR
        	

_mm_sub_epi16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_sub_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := a[i+15:i] - b[i+15:i]
        ENDFOR
        	

_mm_sub_epi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_sub_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[i+31:i] - b[i+31:i]
        ENDFOR
        	

_mm_sub_si64
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m64 _mm_sub_si64(__m64 a, __m64 b);

.. admonition:: Intel Description

    Subtract 64-bit integer "b" from 64-bit integer "a", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0] - b[63:0]
        	

_mm_sub_epi64
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_sub_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[i+63:i] - b[i+63:i]
        ENDFOR
        	

_mm_subs_epi8
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_subs_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])	
        ENDFOR
        	

_mm_subs_epi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_subs_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
        ENDFOR
        	

_mm_subs_epu8
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_subs_epu8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])	
        ENDFOR
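
Unsigned saturating subtraction clamps negative results to zero, which makes a per-byte absolute difference possible without widening: one of ``a - b`` and ``b - a`` is the true difference and the other is zero. The sketch below combines them with a bitwise OR; ``absdiff_u8`` is a hypothetical helper.

.. code-block:: C

    #include <stdint.h>
    #include <emmintrin.h>  /* SSE2: _mm_subs_epu8, _mm_or_si128 */

    /* Hypothetical helper: |a[i] - b[i]| for 16 unsigned bytes. */
    static __m128i absdiff_u8(__m128i a, __m128i b)
    {
        __m128i a_minus_b = _mm_subs_epu8(a, b);  /* 0 where a < b */
        __m128i b_minus_a = _mm_subs_epu8(b, a);  /* 0 where b < a */
        return _mm_or_si128(a_minus_b, b_minus_a);
    }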
        	

_mm_subs_epu16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_subs_epu16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])	
        ENDFOR
        	

_mm_add_sd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_add_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0] + b[63:0]
        dst[127:64] := a[127:64]
        	

_mm_add_pd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_add_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[i+63:i] + b[i+63:i]
        ENDFOR
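
The packed and scalar double-precision adds can work together in a reduction: ``_mm_add_pd`` keeps two running sums, and ``_mm_add_sd`` folds them and handles an odd trailing element. This is a sketch only; ``sum_f64`` is a hypothetical helper, and because it adds in a different order than a plain loop, results may differ in the last bits for general inputs.

.. code-block:: C

    #include <stddef.h>
    #include <emmintrin.h>  /* SSE2: _mm_add_pd, _mm_add_sd */

    /* Hypothetical helper: sum n doubles using two lanes. */
    static double sum_f64(const double *p, size_t n)
    {
        __m128d acc = _mm_setzero_pd();
        size_t i = 0;
        /* Lane 0 sums even indices, lane 1 sums odd indices. */
        for (; i + 2 <= n; i += 2)
            acc = _mm_add_pd(acc, _mm_loadu_pd(p + i));

        /* Fold the upper lane onto the lower lane. */
        __m128d upper = _mm_unpackhi_pd(acc, acc);
        acc = _mm_add_sd(acc, upper);

        /* At most one element remains. */
        if (i < n)
            acc = _mm_add_sd(acc, _mm_load_sd(p + i));
        return _mm_cvtsd_f64(acc);
    }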
        	

_mm_div_sd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_div_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0] / b[63:0]
        dst[127:64] := a[127:64]
        	

_mm_div_pd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_div_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	dst[i+63:i] := a[i+63:i] / b[i+63:i]
        ENDFOR
        	

_mm_mul_sd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mul_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0] * b[63:0]
        dst[127:64] := a[127:64]
        	

_mm_mul_pd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mul_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[i+63:i] * b[i+63:i]
        ENDFOR
        	

_mm_sub_sd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_sub_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0] - b[63:0]
        dst[127:64] := a[127:64]
        	

_mm_sub_pd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: emmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_sub_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[i+63:i] - b[i+63:i]
        ENDFOR
        	

_mm_addsub_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: pmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_addsub_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Alternately subtract and add packed single-precision (32-bit) floating-point elements in "b" from/to packed elements in "a" (even-indexed elements are subtracted, odd-indexed elements are added), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF ((j & 1) == 0)
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	FI
        ENDFOR
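
The even/odd alternation in the pseudo-code can be sketched as a scalar loop. The names ``vec4f`` and ``model_addsub_ps`` are hypothetical stand-ins for ``__m128`` and the intrinsic; ``lane[j]`` represents bits [32*j+31 : 32*j].

.. code-block:: C

    /* Hypothetical four-lane stand-in for __m128. */
    typedef struct { float lane[4]; } vec4f;

    /* Scalar model: even lanes compute a - b, odd lanes compute a + b. */
    static vec4f model_addsub_ps(vec4f a, vec4f b)
    {
        vec4f dst;
        for (int j = 0; j < 4; j++) {
            if ((j & 1) == 0)
                dst.lane[j] = a.lane[j] - b.lane[j];
            else
                dst.lane[j] = a.lane[j] + b.lane[j];
        }
        return dst;
    }

With ``a = {1, 2, 3, 4}`` and ``b = {10, 20, 30, 40}`` the model yields ``{-9, 22, -27, 44}``.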
        	

_mm_addsub_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: pmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_addsub_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Alternately subtract and add packed double-precision (64-bit) floating-point elements in "b" from/to packed elements in "a" (the lower element is subtracted, the upper element is added), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF ((j & 1) == 0)
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	FI
        ENDFOR
        	

_mm_hadd_pd
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: pmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_hadd_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[127:64] + a[63:0]
        dst[127:64] := b[127:64] + b[63:0]
        	

_mm_hadd_ps
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: pmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_hadd_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[63:32] + a[31:0]
        dst[63:32] := a[127:96] + a[95:64]
        dst[95:64] := b[63:32] + b[31:0]
        dst[127:96] := b[127:96] + b[95:64]
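
A scalar sketch of the packing order may help here: the two pair sums from "a" land in the lower half of the result and the two pair sums from "b" in the upper half. ``vec4f`` and ``model_hadd_ps`` are hypothetical names used only for illustration.

.. code-block:: C

    /* Hypothetical four-lane stand-in for __m128. */
    typedef struct { float lane[4]; } vec4f;

    /* Scalar model: sums of adjacent pairs, a's pairs first, then b's. */
    static vec4f model_hadd_ps(vec4f a, vec4f b)
    {
        vec4f dst;
        dst.lane[0] = a.lane[1] + a.lane[0];
        dst.lane[1] = a.lane[3] + a.lane[2];
        dst.lane[2] = b.lane[1] + b.lane[0];
        dst.lane[3] = b.lane[3] + b.lane[2];
        return dst;
    }

With ``a = {1, 2, 3, 4}`` and ``b = {10, 20, 30, 40}`` the model yields ``{3, 7, 30, 70}``.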
        	

_mm_hsub_pd
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: pmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_hsub_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0] - a[127:64]
        dst[127:64] := b[63:0] - b[127:64]
        	

_mm_hsub_ps
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: pmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_hsub_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] - a[63:32]
        dst[63:32] := a[95:64] - a[127:96]
        dst[95:64] := b[31:0] - b[63:32]
        dst[127:96] := b[95:64] - b[127:96]
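
Operand order matters for this operation: per the pseudo-code, the higher-indexed element of each pair is subtracted from the lower-indexed one. A scalar sketch, with ``vec4f`` and ``model_hsub_ps`` as hypothetical names:

.. code-block:: C

    /* Hypothetical four-lane stand-in for __m128. */
    typedef struct { float lane[4]; } vec4f;

    /* Scalar model: (even lane) - (odd lane) for each adjacent pair. */
    static vec4f model_hsub_ps(vec4f a, vec4f b)
    {
        vec4f dst;
        dst.lane[0] = a.lane[0] - a.lane[1];
        dst.lane[1] = a.lane[2] - a.lane[3];
        dst.lane[2] = b.lane[0] - b.lane[1];
        dst.lane[3] = b.lane[2] - b.lane[3];
        return dst;
    }

With ``a = {1, 2, 3, 5}`` and ``b = {10, 20, 30, 45}`` the model yields ``{-1, -2, -10, -15}``.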
        	

_mm_dp_pd
^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: smmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_dp_pd(__m128d a, __m128d b, const int imm8);

.. admonition:: Intel Description

    Conditionally multiply the packed double-precision (64-bit) floating-point elements in "a" and "b" using bits [5:4] of "imm8", sum the two products, and conditionally store the sum in "dst" using bits [1:0] of "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE DP(a[127:0], b[127:0], imm8[7:0]) {
        	FOR j := 0 to 1
        		i := j*64
        		IF imm8[(4+j)%8]
        			temp[i+63:i] := a[i+63:i] * b[i+63:i]
        		ELSE
        			temp[i+63:i] := 0.0
        		FI
        	ENDFOR
        	
        	sum[63:0] := temp[127:64] + temp[63:0]
        	
        	FOR j := 0 to 1
        		i := j*64
        		IF imm8[j%8]
        			tmpdst[i+63:i] := sum[63:0]
        		ELSE
        			tmpdst[i+63:i] := 0.0
        		FI
        	ENDFOR
        	RETURN tmpdst[127:0]
        }
        dst[127:0] := DP(a[127:0], b[127:0], imm8[7:0])
        	

_mm_dp_ps
^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: smmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_dp_ps(__m128 a, __m128 b, const int imm8);

.. admonition:: Intel Description

    Conditionally multiply the packed single-precision (32-bit) floating-point elements in "a" and "b" using the high 4 bits in "imm8", sum the four products, and conditionally store the sum in "dst" using the low 4 bits of "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE DP(a[127:0], b[127:0], imm8[7:0]) {
        	FOR j := 0 to 3
        		i := j*32
        		IF imm8[(4+j)%8]
        			temp[i+31:i] := a[i+31:i] * b[i+31:i]
        		ELSE
        			temp[i+31:i] := 0
        		FI
        	ENDFOR
        	
        	sum[31:0] := (temp[127:96] + temp[95:64]) + (temp[63:32] + temp[31:0])
        	
        	FOR j := 0 to 3
        		i := j*32
        		IF imm8[j%8]
        			tmpdst[i+31:i] := sum[31:0]
        		ELSE
        			tmpdst[i+31:i] := 0
        		FI
        	ENDFOR
        	RETURN tmpdst[127:0]
        }
        dst[127:0] := DP(a[127:0], b[127:0], imm8[7:0])
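
The two roles of "imm8" (high nibble selects which products enter the sum, low nibble selects which result lanes receive it) can be sketched in scalar C. ``vec4f`` and ``model_dp_ps`` are hypothetical names; the real intrinsic requires "imm8" to be a compile-time constant, whereas this model takes it as an ordinary argument for convenience.

.. code-block:: C

    /* Hypothetical four-lane stand-in for __m128. */
    typedef struct { float lane[4]; } vec4f;

    /* Scalar model of the DP pseudo-code. */
    static vec4f model_dp_ps(vec4f a, vec4f b, unsigned imm8)
    {
        float temp[4];
        for (int j = 0; j < 4; j++)   /* bits 4..7 gate the products */
            temp[j] = ((imm8 >> (4 + j)) & 1u) ? a.lane[j] * b.lane[j] : 0.0f;

        /* Same association order as the pseudo-code. */
        float sum = (temp[3] + temp[2]) + (temp[1] + temp[0]);

        vec4f dst;
        for (int j = 0; j < 4; j++)   /* bits 0..3 gate the broadcast */
            dst.lane[j] = ((imm8 >> j) & 1u) ? sum : 0.0f;
        return dst;
    }

With ``a = {1, 2, 3, 4}``, ``b = {5, 6, 7, 8}`` and ``imm8 = 0x71``, lanes 0 to 2 are multiplied (5 + 12 + 21 = 38) and only lane 0 of the result is written, giving ``{38, 0, 0, 0}``. With ``imm8 = 0xFF`` every lane holds the full dot product, 70.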
        	

_mm_mul_epi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: smmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_mul_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
        ENDFOR
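
Only the even-indexed 32-bit lanes take part in this multiply; the odd lanes are ignored and each product occupies a full 64-bit lane. A scalar sketch, with ``vec4i32``, ``vec2i64`` and ``model_mul_epi32`` as hypothetical names for two views of ``__m128i`` and the intrinsic:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical views of __m128i: four 32-bit lanes in, two 64-bit lanes out. */
    typedef struct { int32_t lane[4]; } vec4i32;
    typedef struct { int64_t lane[2]; } vec2i64;

    /* Scalar model: sign-extend the low half of each 64-bit element, then multiply. */
    static vec2i64 model_mul_epi32(vec4i32 a, vec4i32 b)
    {
        vec2i64 dst;
        for (int j = 0; j < 2; j++)
            dst.lane[j] = (int64_t)a.lane[2 * j] * (int64_t)b.lane[2 * j];
        return dst;
    }

With ``a = {-2, 99, 100000, 99}`` and ``b = {3, 99, 100000, 99}`` the model yields ``{-6, 10000000000}``; the second product would not fit in 32 bits.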
        	

_mm_mullo_epi32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: smmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mullo_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	tmp[63:0] := a[i+31:i] * b[i+31:i]
        	dst[i+31:i] := tmp[31:0]
        ENDFOR
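
The truncation to the low 32 bits can be sketched as follows. ``vec4u32`` and ``model_mullo_epi32`` are hypothetical names; lanes are modelled as unsigned here to match the listed element type, and the low 32 bits of the product are the same bit pattern for signed inputs.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical four-lane view of __m128i. */
    typedef struct { uint32_t lane[4]; } vec4u32;

    /* Scalar model: form the 64-bit product, keep bits [31:0]. */
    static vec4u32 model_mullo_epi32(vec4u32 a, vec4u32 b)
    {
        vec4u32 dst;
        for (int j = 0; j < 4; j++) {
            uint64_t tmp = (uint64_t)a.lane[j] * (uint64_t)b.lane[j];
            dst.lane[j] = (uint32_t)tmp;   /* discard the high 32 bits */
        }
        return dst;
    }

With ``a = {3, 0x10000, 0xFFFFFFFF, 100000}`` and ``b = {4, 0x10000, 2, 100000}`` the model yields ``{12, 0, 0xFFFFFFFE, 1410065408}``.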
        	

_mm_hadd_epi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_hadd_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := a[31:16] + a[15:0]
        dst[31:16] := a[63:48] + a[47:32]
        dst[47:32] := a[95:80] + a[79:64]
        dst[63:48] := a[127:112] + a[111:96]
        dst[79:64] := b[31:16] + b[15:0]
        dst[95:80] := b[63:48] + b[47:32]
        dst[111:96] := b[95:80] + b[79:64]
        dst[127:112] := b[127:112] + b[111:96]
        	

_mm_hadds_epi16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_hadds_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Saturate16(a[31:16] + a[15:0])
        dst[31:16] := Saturate16(a[63:48] + a[47:32])
        dst[47:32] := Saturate16(a[95:80] + a[79:64])
        dst[63:48] := Saturate16(a[127:112] + a[111:96])
        dst[79:64] := Saturate16(b[31:16] + b[15:0])
        dst[95:80] := Saturate16(b[63:48] + b[47:32])
        dst[111:96] := Saturate16(b[95:80] + b[79:64])
        dst[127:112] := Saturate16(b[127:112] + b[111:96])
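
The saturating variant clamps each pair sum to the signed 16-bit range instead of wrapping. A scalar sketch, where ``vec8i16``, ``saturate16`` and ``model_hadds_epi16`` are hypothetical names:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical eight-lane view of __m128i. */
    typedef struct { int16_t lane[8]; } vec8i16;

    /* Clamp a wider intermediate to [INT16_MIN, INT16_MAX]. */
    static int16_t saturate16(int32_t x)
    {
        if (x > INT16_MAX) return INT16_MAX;
        if (x < INT16_MIN) return INT16_MIN;
        return (int16_t)x;
    }

    /* Scalar model: a's pair sums fill lanes 0..3, b's fill lanes 4..7. */
    static vec8i16 model_hadds_epi16(vec8i16 a, vec8i16 b)
    {
        vec8i16 dst;
        for (int j = 0; j < 4; j++) {
            dst.lane[j]     = saturate16((int32_t)a.lane[2 * j + 1] + a.lane[2 * j]);
            dst.lane[j + 4] = saturate16((int32_t)b.lane[2 * j + 1] + b.lane[2 * j]);
        }
        return dst;
    }

With ``a = {30000, 30000, -30000, -30000, 1, 2, 3, 4}`` and ``b = {10, 20, 30, 40, 50, 60, 70, 80}`` the model yields ``{32767, -32768, 3, 7, 30, 70, 110, 150}``.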
        	

_mm_hadd_epi32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_hadd_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[63:32] + a[31:0]
        dst[63:32] := a[127:96] + a[95:64]
        dst[95:64] := b[63:32] + b[31:0]
        dst[127:96] := b[127:96] + b[95:64]
        	

_mm_hadd_pi16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_hadd_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := a[31:16] + a[15:0]
        dst[31:16] := a[63:48] + a[47:32]
        dst[47:32] := b[31:16] + b[15:0]
        dst[63:48] := b[63:48] + b[47:32]
        	

_mm_hadd_pi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m64 _mm_hadd_pi32(__m64 a, __m64 b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[63:32] + a[31:0]
        dst[63:32] := b[63:32] + b[31:0]
        	

_mm_hadds_pi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_hadds_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Saturate16(a[31:16] + a[15:0])
        dst[31:16] := Saturate16(a[63:48] + a[47:32])
        dst[47:32] := Saturate16(b[31:16] + b[15:0])
        dst[63:48] := Saturate16(b[63:48] + b[47:32])
        	

_mm_hsub_epi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_hsub_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := a[15:0] - a[31:16]
        dst[31:16] := a[47:32] - a[63:48]
        dst[47:32] := a[79:64] - a[95:80]
        dst[63:48] := a[111:96] - a[127:112]
        dst[79:64] := b[15:0] - b[31:16]
        dst[95:80] := b[47:32] - b[63:48]
        dst[111:96] := b[79:64] - b[95:80]
        dst[127:112] := b[111:96] - b[127:112]
        	

_mm_hsubs_epi16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_hsubs_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Saturate16(a[15:0] - a[31:16])
        dst[31:16] := Saturate16(a[47:32] - a[63:48])
        dst[47:32] := Saturate16(a[79:64] - a[95:80])
        dst[63:48] := Saturate16(a[111:96] - a[127:112])
        dst[79:64] := Saturate16(b[15:0] - b[31:16])
        dst[95:80] := Saturate16(b[47:32] - b[63:48])
        dst[111:96] := Saturate16(b[79:64] - b[95:80])
        dst[127:112] := Saturate16(b[111:96] - b[127:112])
        	

_mm_hsub_epi32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_hsub_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] - a[63:32]
        dst[63:32] := a[95:64] - a[127:96]
        dst[95:64] := b[31:0] - b[63:32]
        dst[127:96] := b[95:64] - b[127:96]
        	

_mm_hsub_pi16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_hsub_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := a[15:0] - a[31:16]
        dst[31:16] := a[47:32] - a[63:48]
        dst[47:32] := b[15:0] - b[31:16]
        dst[63:48] := b[47:32] - b[63:48]
        	

_mm_hsub_pi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m64 _mm_hsub_pi32(__m64 a, __m64 b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] - a[63:32]
        dst[63:32] := b[31:0] - b[63:32]
        	

_mm_hsubs_pi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_hsubs_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Saturate16(a[15:0] - a[31:16])
        dst[31:16] := Saturate16(a[47:32] - a[63:48])
        dst[47:32] := Saturate16(b[15:0] - b[31:16])
        dst[63:48] := Saturate16(b[47:32] - b[63:48])
        	

_mm_maddubs_epi16
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_maddubs_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
        ENDFOR
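
The mixed signedness is the subtle part of this operation: bytes of "a" are read as unsigned, bytes of "b" as signed, and the sum of the two products is saturated. A scalar sketch, with all type and function names hypothetical:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical views of __m128i. */
    typedef struct { uint8_t lane[16]; } vec16u8;
    typedef struct { int8_t  lane[16]; } vec16i8;
    typedef struct { int16_t lane[8];  } vec8i16;

    /* Clamp a wider intermediate to [INT16_MIN, INT16_MAX]. */
    static int16_t saturate16(int32_t x)
    {
        if (x > INT16_MAX) return INT16_MAX;
        if (x < INT16_MIN) return INT16_MIN;
        return (int16_t)x;
    }

    /* Scalar model: unsigned-by-signed byte products, summed pairwise, saturated. */
    static vec8i16 model_maddubs_epi16(vec16u8 a, vec16i8 b)
    {
        vec8i16 dst;
        for (int j = 0; j < 8; j++) {
            int32_t hi = (int32_t)a.lane[2 * j + 1] * b.lane[2 * j + 1];
            int32_t lo = (int32_t)a.lane[2 * j]     * b.lane[2 * j];
            dst.lane[j] = saturate16(hi + lo);
        }
        return dst;
    }

For example, the byte pair ``a = {255, 255}``, ``b = {127, 127}`` sums to 64770 and saturates to 32767, while ``a = {2, 4}``, ``b = {3, -5}`` gives 6 - 20 = -14.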
        	

_mm_maddubs_pi16
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    SI8 b

.. code-block:: C

    __m64 _mm_maddubs_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
        ENDFOR
        	

_mm_mulhrs_epi16
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mulhrs_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
        	dst[i+15:i] := tmp[16:1]
        ENDFOR
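
One common reading treats the inputs as Q15 fixed-point values, in which case the shift, add-one, and bit selection amount to a rounded Q15 multiply. A per-lane scalar sketch follows; ``model_mulhrs_lane`` is a hypothetical name, and the code assumes that right-shifting a negative ``int32_t`` is arithmetic and that out-of-range conversions to ``int16_t`` wrap, as mainstream compilers implement them.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of one 16-bit lane of the pseudo-code. */
    static int16_t model_mulhrs_lane(int16_t a, int16_t b)
    {
        /* Assumption: >> on a negative int32_t is an arithmetic shift. */
        int32_t tmp = (((int32_t)a * (int32_t)b) >> 14) + 1;
        /* Select bits [16:1]; assumes the uint16_t to int16_t conversion wraps. */
        return (int16_t)(uint16_t)((uint32_t)tmp >> 1);
    }

For instance, ``model_mulhrs_lane(16384, 16384)`` returns 8192, which is 0.5 * 0.5 = 0.25 in Q15, and ``model_mulhrs_lane(1, 16384)`` returns 1 because the half-unit result is rounded up.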
        	

_mm_mulhrs_pi16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_mulhrs_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
        	dst[i+15:i] := tmp[16:1]
        ENDFOR
        	

_mm_sign_epi8
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_sign_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Negate packed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF b[i+7:i] < 0
        		dst[i+7:i] := -(a[i+7:i])
        	ELSE IF b[i+7:i] == 0
        		dst[i+7:i] := 0
        	ELSE
        		dst[i+7:i] := a[i+7:i]
        	FI
        ENDFOR
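
The three-way branch can be sketched in scalar C. ``vec16i8`` and ``model_sign_epi8`` are hypothetical names; the negation is done in unsigned arithmetic so that negating -128 wraps back to -128 rather than overflowing, which assumes the usual wrapping conversion from ``uint8_t`` to ``int8_t``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical sixteen-lane view of __m128i. */
    typedef struct { int8_t lane[16]; } vec16i8;

    /* Scalar model: negate, zero, or pass through a, depending on the sign of b. */
    static vec16i8 model_sign_epi8(vec16i8 a, vec16i8 b)
    {
        vec16i8 dst;
        for (int j = 0; j < 16; j++) {
            if (b.lane[j] < 0)
                dst.lane[j] = (int8_t)(uint8_t)(0u - (uint8_t)a.lane[j]);
            else if (b.lane[j] == 0)
                dst.lane[j] = 0;
            else
                dst.lane[j] = a.lane[j];
        }
        return dst;
    }

With the first four lanes ``a = {5, 5, 5, -128}`` and ``b = {-1, 0, 1, -7}`` the model yields ``{-5, 0, 5, -128}`` in those lanes.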
        	

_mm_sign_epi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_sign_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Negate packed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF b[i+15:i] < 0
        		dst[i+15:i] := -(a[i+15:i])
        	ELSE IF b[i+15:i] == 0
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        	

_mm_sign_epi32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_sign_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Negate packed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF b[i+31:i] < 0
        		dst[i+31:i] := -(a[i+31:i])
        	ELSE IF b[i+31:i] == 0
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm_sign_pi8
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m64 _mm_sign_pi8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Negate packed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	IF b[i+7:i] < 0
        		dst[i+7:i] := -(a[i+7:i])
        	ELSE IF b[i+7:i] == 0
        		dst[i+7:i] := 0
        	ELSE
        		dst[i+7:i] := a[i+7:i]
        	FI
        ENDFOR
        	

_mm_sign_pi16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m64 _mm_sign_pi16(__m64 a, __m64 b);

.. admonition:: Intel Description

    Negate packed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	IF b[i+15:i] < 0
        		dst[i+15:i] := -(a[i+15:i])
        	ELSE IF b[i+15:i] == 0
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        	

_mm_sign_pi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: tmmintrin.h
:Searchable: SSE_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m64 _mm_sign_pi32(__m64 a, __m64 b);

.. admonition:: Intel Description

    Negate packed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	IF b[i+31:i] < 0
        		dst[i+31:i] := -(a[i+31:i])
        	ELSE IF b[i+31:i] == 0
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        	

MMX
~~~
_m_pmulhuw
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Arithmetic
:Header: xmmintrin.h
:Searchable: SSE_ALL-Arithmetic-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m64 _m_pmulhuw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	tmp[31:0] := a[i+15:i] * b[i+15:i]
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
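
Keeping only the high half of each unsigned product can be sketched as follows; ``vec4u16`` and ``model_pmulhuw`` are hypothetical names standing in for ``__m64`` and the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical four-lane view of __m64. */
    typedef struct { uint16_t lane[4]; } vec4u16;

    /* Scalar model: form the 32-bit unsigned product, keep bits [31:16]. */
    static vec4u16 model_pmulhuw(vec4u16 a, vec4u16 b)
    {
        vec4u16 dst;
        for (int j = 0; j < 4; j++) {
            uint32_t tmp = (uint32_t)a.lane[j] * (uint32_t)b.lane[j];
            dst.lane[j] = (uint16_t)(tmp >> 16);
        }
        return dst;
    }

With ``a = {65535, 256, 1, 40000}`` and ``b = {65535, 256, 1, 2}`` the model yields ``{65534, 1, 0, 1}``.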
        	

Compare
-------
XMM
~~~
_mm_cmpeq_ss
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpeq_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for equality, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := ( a[31:0] == b[31:0] ) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
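
To make the two halves of the pseudo-code above concrete, the sketch below calls ``_mm_cmpeq_ss`` and exposes the raw bits of every result lane. It assumes an x86 target with SSE enabled; ``cmpeq_ss_bits`` is a hypothetical helper name.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>
    #include <xmmintrin.h>

    /* Compare lane 0 of a and b, then return the bit pattern of each result lane. */
    static void cmpeq_ss_bits(const float a[4], const float b[4], uint32_t out[4])
    {
        __m128 r = _mm_cmpeq_ss(_mm_loadu_ps(a), _mm_loadu_ps(b));
        float lanes[4];
        _mm_storeu_ps(lanes, r);
        /* memcpy preserves the exact bits; the all-ones mask is a NaN when
         * read as a float, so it is inspected as an integer instead. */
        memcpy(out, lanes, sizeof lanes);
    }

With ``a = {1, 2, 3, 4}`` and ``b = {1, 9, 9, 9}``, lane 0 becomes ``0xFFFFFFFF`` and lanes 1 to 3 hold the bit patterns of ``2.0f``, ``3.0f`` and ``4.0f`` taken from ``a``.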
        	

_mm_cmpeq_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpeq_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
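
A common way to consume the per-lane masks described above is to collapse them into an integer with ``_mm_movemask_ps``. The sketch below assumes an x86 target with SSE enabled; ``eq_lane_mask`` is a hypothetical helper name.

.. code-block:: C

    #include <xmmintrin.h>

    /* Return a 4-bit value with bit j set when a[j] == b[j]. */
    static int eq_lane_mask(const float a[4], const float b[4])
    {
        __m128 m = _mm_cmpeq_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
        /* _mm_movemask_ps gathers the top bit of each lane into bits 0..3. */
        return _mm_movemask_ps(m);
    }

With ``a = {1, 2, 3, 4}`` and ``b = {1, 0, 3, 0}``, lanes 0 and 2 match, so the result is ``0x5``.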
        	

_mm_cmplt_ss
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmplt_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for less-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := ( a[31:0] < b[31:0] ) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
        	

_mm_cmplt_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmplt_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] < b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
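
Because each result lane is either all ones or all zeros, the mask from ``_mm_cmplt_ps`` can drive a branch-free select. The following is a minimal sketch, assuming an x86 target with SSE enabled; ``select_smaller`` is a hypothetical helper name, and the bitwise intrinsics it uses come from the same ``xmmintrin.h`` header.

.. code-block:: C

    #include <xmmintrin.h>

    /* For each lane, write a[j] when a[j] < b[j], otherwise b[j]. */
    static void select_smaller(const float a[4], const float b[4], float out[4])
    {
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        __m128 lt = _mm_cmplt_ps(va, vb);            /* all ones where a < b */
        __m128 from_a = _mm_and_ps(lt, va);          /* keep a where the mask is set */
        __m128 from_b = _mm_andnot_ps(lt, vb);       /* keep b where it is clear */
        _mm_storeu_ps(out, _mm_or_ps(from_a, from_b));
    }

With ``a = {1, 5, 3, 8}`` and ``b = {2, 4, 3, 7}``, the output is ``{1, 4, 3, 7}``; the tied lane takes its value from ``b`` because the comparison is strict.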
        	

_mm_cmple_ss
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmple_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := ( a[31:0] <= b[31:0] ) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
        	

_mm_cmple_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmple_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] <= b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpgt_ss
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpgt_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := ( a[31:0] > b[31:0] ) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
        	

_mm_cmpgt_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpgt_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpge_ss
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpge_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := ( a[31:0] >= b[31:0] ) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
        	

_mm_cmpge_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpge_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] >= b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
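
One possible use of the packed greater-than-or-equal comparison is counting how many elements of an array reach a threshold, four lanes at a time. The sketch below assumes an x86 target with SSE enabled; ``count_ge`` is a hypothetical helper name, and the scalar tail loop is one of several ways to handle lengths that are not a multiple of four.

.. code-block:: C

    #include <stddef.h>
    #include <xmmintrin.h>

    /* Count the elements of x[0..n) that are >= threshold. */
    static size_t count_ge(const float *x, size_t n, float threshold)
    {
        const __m128 t = _mm_set1_ps(threshold);
        size_t count = 0;
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            int mask = _mm_movemask_ps(_mm_cmpge_ps(_mm_loadu_ps(x + i), t));
            /* Add up the four mask bits, one per lane that passed the test. */
            count += (size_t)((mask & 1) + ((mask >> 1) & 1) +
                              ((mask >> 2) & 1) + ((mask >> 3) & 1));
        }
        for (; i < n; ++i) {
            count += (size_t)(x[i] >= threshold);   /* scalar tail */
        }
        return count;
    }

For the seven values ``1`` through ``7`` and a threshold of ``3``, the function returns ``5``.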
        	

_mm_cmpneq_ss
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpneq_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := ( a[31:0] != b[31:0] ) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
        	

_mm_cmpneq_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpneq_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] != b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
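
Since a NaN compares not-equal to every value, including itself, comparing a vector against itself with ``_mm_cmpneq_ps`` flags the NaN lanes. The sketch below assumes an x86 target with SSE enabled and a build without fast-math style options; ``nan_lane_mask`` is a hypothetical helper name.

.. code-block:: C

    #include <xmmintrin.h>

    /* Return a 4-bit value with bit j set when v[j] is NaN. */
    static int nan_lane_mask(const float v[4])
    {
        __m128 x = _mm_loadu_ps(v);
        /* x != x holds only in the lanes that contain a NaN. */
        return _mm_movemask_ps(_mm_cmpneq_ps(x, x));
    }

A vector with NaN in lanes 1 and 3 produces ``0xA``.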
        	

_mm_cmpnlt_ss
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpnlt_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (!( a[31:0] < b[31:0] )) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
        	

_mm_cmpnlt_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpnlt_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := (!( a[i+31:i] < b[i+31:i] )) ? 0xFFFFFFFF : 0
        ENDFOR
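
Not-less-than and greater-than-or-equal agree on ordinary numbers but differ when a lane holds a NaN: the negated form is true there, while the direct form is false. The sketch below puts the two side by side, assuming an x86 target with SSE enabled; ``ge_mask`` and ``nlt_mask`` are hypothetical helper names.

.. code-block:: C

    #include <xmmintrin.h>

    /* Bit j is set when a[j] >= b[j]; false for any lane involving a NaN. */
    static int ge_mask(const float a[4], const float b[4])
    {
        return _mm_movemask_ps(_mm_cmpge_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    }

    /* Bit j is set when !(a[j] < b[j]); true for any lane involving a NaN. */
    static int nlt_mask(const float a[4], const float b[4])
    {
        return _mm_movemask_ps(_mm_cmpnlt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    }

With ``a = {1, NaN, 3, 2}`` and ``b = {2, 2, NaN, 2}``, ``ge_mask`` gives ``0x8`` and ``nlt_mask`` gives ``0xE``.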
        	

_mm_cmpnle_ss
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpnle_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (!( a[31:0] <= b[31:0] )) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
        	

_mm_cmpnle_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpnle_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := (!( a[i+31:i] <= b[i+31:i] )) ? 0xFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpngt_ss
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpngt_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (!( a[31:0] > b[31:0] )) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
        	

_mm_cmpngt_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpngt_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := (!( a[i+31:i] > b[i+31:i] )) ? 0xFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpnge_ss
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpnge_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (!( a[31:0] >= b[31:0] )) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
        	

_mm_cmpnge_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpnge_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := (!( a[i+31:i] >= b[i+31:i] )) ? 0xFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpord_ss
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpord_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[31:0] := ( a[31:0] != NaN AND b[31:0] != NaN ) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
        	

_mm_cmpord_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpord_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] != NaN AND b[i+31:i] != NaN ) ? 0xFFFFFFFF : 0
        ENDFOR
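
The ordered mask can be used to scrub NaNs from a vector: comparing a vector with itself marks the lanes that are not NaN, and ANDing with that mask zeroes the rest. This is a minimal sketch, assuming an x86 target with SSE enabled; ``zero_nans`` is a hypothetical helper name.

.. code-block:: C

    #include <xmmintrin.h>

    /* Copy in[] to out[], replacing every NaN lane with +0.0f. */
    static void zero_nans(const float in[4], float out[4])
    {
        __m128 x = _mm_loadu_ps(in);
        __m128 ordered = _mm_cmpord_ps(x, x);   /* all ones where x is not NaN */
        _mm_storeu_ps(out, _mm_and_ps(x, ordered));
    }

An input of ``{1, NaN, -2, NaN}`` comes back as ``{1, 0, -2, 0}``.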
        	

_mm_cmpunord_ss
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpunord_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[31:0] := ( a[31:0] == NaN OR b[31:0] == NaN ) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
        	

_mm_cmpunord_ps
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_cmpunord_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] == NaN OR b[i+31:i] == NaN ) ? 0xFFFFFFFF : 0
        ENDFOR
        	

_mm_comieq_ss
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_comieq_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0
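
Unlike the mask-producing comparisons, this intrinsic returns a plain ``int`` that can be used directly in a condition. The sketch below wraps it for two scalar floats, assuming an x86 target with SSE enabled; ``lower_lanes_equal`` is a hypothetical helper name. Only ordered inputs are exercised here, and NaN operands are left out of the sketch.

.. code-block:: C

    #include <xmmintrin.h>

    /* Return 1 when x == y and 0 otherwise, for non-NaN inputs. */
    static int lower_lanes_equal(float x, float y)
    {
        /* _mm_set_ss places the value in lane 0 and zeroes lanes 1..3. */
        return _mm_comieq_ss(_mm_set_ss(x), _mm_set_ss(y));
    }

For instance, ``lower_lanes_equal(1.5f, 1.5f)`` yields ``1`` and ``lower_lanes_equal(1.5f, 2.5f)`` yields ``0``.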
        	

_mm_comilt_ss
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_comilt_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] < b[31:0] ) ? 1 : 0
        	

_mm_comile_ss
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_comile_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] <= b[31:0] ) ? 1 : 0
        	

_mm_comigt_ss
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_comigt_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] > b[31:0] ) ? 1 : 0
        	

_mm_comige_ss
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_comige_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] >= b[31:0] ) ? 1 : 0
        	

_mm_comineq_ss
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_comineq_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[31:0] == NaN OR b[31:0] == NaN OR a[31:0] != b[31:0] ) ? 1 : 0
        	

_mm_ucomieq_ss
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_ucomieq_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0
        	

_mm_ucomilt_ss
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_ucomilt_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] < b[31:0] ) ? 1 : 0
        	

_mm_ucomile_ss
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_ucomile_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] <= b[31:0] ) ? 1 : 0
        	

_mm_ucomigt_ss
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_ucomigt_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] > b[31:0] ) ? 1 : 0
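
Because the ``ucomi`` family returns 0 or 1, two of its results can be combined arithmetically. The sketch below builds a three-way comparison from ``_mm_ucomigt_ss`` and ``_mm_ucomilt_ss``, assuming an x86 target with SSE enabled; ``three_way`` is a hypothetical helper name, and only ordered inputs are exercised.

.. code-block:: C

    #include <xmmintrin.h>

    /* Return 1 when x > y, -1 when x < y, and 0 when neither holds. */
    static int three_way(float x, float y)
    {
        __m128 a = _mm_set_ss(x);
        __m128 b = _mm_set_ss(y);
        /* Each call returns 0 or 1, so the difference is -1, 0 or 1. */
        return _mm_ucomigt_ss(a, b) - _mm_ucomilt_ss(a, b);
    }

For example, ``three_way(1.0f, 2.0f)`` gives ``-1`` and ``three_way(2.0f, 2.0f)`` gives ``0``.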
        	

_mm_ucomige_ss
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_ucomige_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] >= b[31:0] ) ? 1 : 0
        	

_mm_ucomineq_ss
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: xmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_ucomineq_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[31:0] == NaN OR b[31:0] == NaN OR a[31:0] != b[31:0] ) ? 1 : 0
        	

_mm_cmpeq_epi8
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_cmpeq_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := ( a[i+7:i] == b[i+7:i] ) ? 0xFF : 0
        ENDFOR
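
A typical application of the byte-wise equality mask is searching a 16-byte block for a given value. The following is a minimal sketch, assuming an x86 target with SSE2 enabled; ``find_byte16`` is a hypothetical helper name, and the linear scan over the mask bits is chosen for portability rather than speed.

.. code-block:: C

    #include <emmintrin.h>

    /* Return the index of the first byte in block[0..15] equal to needle, or -1. */
    static int find_byte16(const unsigned char *block, unsigned char needle)
    {
        __m128i data = _mm_loadu_si128((const __m128i *)block);
        __m128i hits = _mm_cmpeq_epi8(data, _mm_set1_epi8((char)needle));
        int mask = _mm_movemask_epi8(hits);   /* bit j is set when byte j matched */
        for (int j = 0; j < 16; ++j) {
            if (mask & (1 << j)) {
                return j;
            }
        }
        return -1;
    }

Searching the 16 bytes ``"hello, world!xyz"`` for ``'w'`` returns ``7``; a byte that does not occur returns ``-1``.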
        	

_mm_cmpeq_epi16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_cmpeq_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ( a[i+15:i] == b[i+15:i] ) ? 0xFFFF : 0
        ENDFOR
        	

_mm_cmpeq_epi32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_cmpeq_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpgt_epi8
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_cmpgt_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := ( a[i+7:i] > b[i+7:i] ) ? 0xFF : 0
        ENDFOR
        	

_mm_cmpgt_epi16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_cmpgt_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ( a[i+15:i] > b[i+15:i] ) ? 0xFFFF : 0
        ENDFOR
        	

_mm_cmpgt_epi32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_cmpgt_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
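
The comparison above is signed, so values with the top bit set sort below small positive values. One widely used workaround for unsigned data is to flip the top bit of both operands before comparing. The sketch below shows both variants, assuming an x86 target with SSE2 enabled; ``gt_mask_s32`` and ``gt_mask_u32`` are hypothetical helper names.

.. code-block:: C

    #include <stdint.h>
    #include <emmintrin.h>

    /* Bit j is set when a[j] > b[j] with the lanes read as signed integers. */
    static int gt_mask_s32(const uint32_t a[4], const uint32_t b[4])
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        return _mm_movemask_ps(_mm_castsi128_ps(_mm_cmpgt_epi32(va, vb)));
    }

    /* Bit j is set when a[j] > b[j] with the lanes read as unsigned integers. */
    static int gt_mask_u32(const uint32_t a[4], const uint32_t b[4])
    {
        /* XOR with 0x80000000 maps unsigned order onto signed order. */
        const __m128i bias = _mm_set1_epi32(INT32_MIN);
        __m128i va = _mm_xor_si128(_mm_loadu_si128((const __m128i *)a), bias);
        __m128i vb = _mm_xor_si128(_mm_loadu_si128((const __m128i *)b), bias);
        return _mm_movemask_ps(_mm_castsi128_ps(_mm_cmpgt_epi32(va, vb)));
    }

With ``a = {0x80000000, 1, 5, 0xFFFFFFFF}`` and ``b = {1, 0x80000000, 5, 0}``, the signed variant returns ``0x2`` while the unsigned variant returns ``0x9``.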
        	

_mm_cmplt_epi8
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_cmplt_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtb instruction with the order of the operands switched.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := ( a[i+7:i] < b[i+7:i] ) ? 0xFF : 0
        ENDFOR
        	

_mm_cmplt_epi16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_cmplt_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtw instruction with the order of the operands switched.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ( a[i+15:i] < b[i+15:i] ) ? 0xFFFF : 0
        ENDFOR
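
The note about swapped operands implies that ``_mm_cmplt_epi16(a, b)`` and ``_mm_cmpgt_epi16(b, a)`` should produce identical masks. The sketch below checks that equivalence and also reduces the result to one bit per lane, assuming an x86 target with SSE2 enabled; both function names are hypothetical helpers.

.. code-block:: C

    #include <stdint.h>
    #include <emmintrin.h>

    /* Return 1 when cmplt(a, b) and cmpgt(b, a) agree in every byte. */
    static int lt_matches_swapped_gt(const int16_t a[8], const int16_t b[8])
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        __m128i lt = _mm_cmplt_epi16(va, vb);
        __m128i gt_swapped = _mm_cmpgt_epi16(vb, va);
        /* Sixteen equal bytes give a movemask of 0xFFFF. */
        return _mm_movemask_epi8(_mm_cmpeq_epi8(lt, gt_swapped)) == 0xFFFF;
    }

    /* Return an 8-bit value with bit j set when a[j] < b[j] (signed). */
    static int lt_lane_mask(const int16_t a[8], const int16_t b[8])
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        __m128i lt = _mm_cmplt_epi16(va, vb);
        /* Narrow each 16-bit mask to 8 bits so movemask yields one bit per lane. */
        return _mm_movemask_epi8(_mm_packs_epi16(lt, _mm_setzero_si128()));
    }

With ``a = {-3, 0, 5, 100, -32768, 32767, 7, -1}`` and ``b = {2, 0, 4, 200, 32767, -32768, 7, 0}``, lanes 0, 3, 4 and 7 satisfy the comparison, so ``lt_lane_mask`` returns ``0x99``.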
        	

_mm_cmplt_epi32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_cmplt_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtd instruction with the order of the operands switched.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] < b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpeq_sd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpeq_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for equality, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] == b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        	
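The two halves of the result behave differently: the lower element is a mask, while the upper element is passed through from "a". A minimal sketch that reads both halves back, assuming the hypothetical helpers ``low_equal`` and ``high_lane``:

.. code-block:: C

    #include <emmintrin.h>  /* SSE2 double-precision intrinsics */

    /* Hypothetical helper: 1 if the lower elements compare equal, else 0. */
    int low_equal(__m128d a, __m128d b)
    {
        __m128d r = _mm_cmpeq_sd(a, b);  /* lower: mask, upper: copy of a */
        return _mm_movemask_pd(r) & 1;   /* bit 0 is the sign bit of the lower element */
    }

    /* Hypothetical helper: the upper element of the result, which comes from a. */
    double high_lane(__m128d a, __m128d b)
    {
        __m128d r = _mm_cmpeq_sd(a, b);
        return _mm_cvtsd_f64(_mm_unpackhi_pd(r, r));  /* move upper element down and extract */
    }
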

_mm_cmplt_sd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmplt_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for less-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] < b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        	

_mm_cmple_sd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmple_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] <= b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        	

_mm_cmpgt_sd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpgt_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for greater-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] > b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        	

_mm_cmpge_sd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpge_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for greater-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] >= b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        	

_mm_cmpord_sd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpord_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[63:0] := (a[63:0] != NaN AND b[63:0] != NaN) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        	

_mm_cmpunord_sd
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpunord_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[63:0] := (a[63:0] == NaN OR b[63:0] == NaN) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        	

_mm_cmpneq_sd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpneq_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] != b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        	

_mm_cmpnlt_sd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpnlt_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (!(a[63:0] < b[63:0])) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        	
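Not-less-than is not the same predicate as greater-than-or-equal once NaN is involved: ``!(a < b)`` holds when either operand is NaN, whereas ``a >= b`` does not. The sketch below contrasts ``_mm_cmpnlt_sd`` with ``_mm_cmpge_sd``; the wrapper names are hypothetical and only the lower element is inspected.

.. code-block:: C

    #include <emmintrin.h>  /* SSE2 double-precision intrinsics */
    #include <math.h>       /* NAN, used by callers of these helpers */

    /* Hypothetical helper: 1 if !(x < y), which is also the case when x or y is NaN. */
    int low_not_less(double x, double y)
    {
        __m128d r = _mm_cmpnlt_sd(_mm_set_sd(x), _mm_set_sd(y));
        return _mm_movemask_pd(r) & 1;
    }

    /* Hypothetical helper: 1 if x >= y, which is never the case when x or y is NaN. */
    int low_greater_equal(double x, double y)
    {
        __m128d r = _mm_cmpge_sd(_mm_set_sd(x), _mm_set_sd(y));
        return _mm_movemask_pd(r) & 1;
    }

For ordered inputs the two helpers agree; for example, both return 1 for ``(2.0, 1.0)``. They differ for ``(NAN, 1.0)``, where the first returns 1 and the second returns 0.
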

_mm_cmpnle_sd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpnle_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (!(a[63:0] <= b[63:0])) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        	

_mm_cmpngt_sd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpngt_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (!(a[63:0] > b[63:0])) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        	

_mm_cmpnge_sd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpnge_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (!(a[63:0] >= b[63:0])) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        	

_mm_cmpeq_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpeq_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] == b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	

_mm_cmplt_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmplt_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] < b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	
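A packed compare mask combines naturally with the bitwise intrinsics to filter lanes without branching. As a sketch (``zero_negatives`` is a hypothetical helper, not an Intel API), negative elements can be replaced with ``+0.0`` like this:

.. code-block:: C

    #include <emmintrin.h>  /* SSE2 double-precision intrinsics */

    /* Hypothetical helper: replace negative elements with +0.0, keep the rest. */
    __m128d zero_negatives(__m128d v)
    {
        __m128d neg = _mm_cmplt_pd(v, _mm_setzero_pd());  /* all ones where v < 0 */
        return _mm_andnot_pd(neg, v);                     /* (~neg) & v           */
    }
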

_mm_cmple_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmple_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] <= b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpgt_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpgt_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] > b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpge_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpge_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] >= b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpord_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpord_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] != NaN AND b[i+63:i] != NaN) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	
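Comparing a vector with itself turns the ordered test into a per-element "is not NaN" mask. The following sketch uses that mask to clear NaN elements; ``nan_to_zero`` is a hypothetical helper introduced only for this example.

.. code-block:: C

    #include <emmintrin.h>  /* SSE2 double-precision intrinsics */

    /* Hypothetical helper: replace NaN elements with +0.0, keep all other values. */
    __m128d nan_to_zero(__m128d v)
    {
        __m128d ordered = _mm_cmpord_pd(v, v);  /* all ones where the element is not NaN */
        return _mm_and_pd(ordered, v);          /* NaN elements are masked to zero bits  */
    }
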

_mm_cmpunord_pd
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpunord_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] == NaN OR b[i+63:i] == NaN) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	
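A common use of the unordered compare is a quick NaN check over a whole vector. This is a minimal sketch, assuming a hypothetical helper ``has_nan`` that compares the vector with itself and collapses the mask with ``_mm_movemask_pd``:

.. code-block:: C

    #include <emmintrin.h>  /* SSE2 double-precision intrinsics */

    /* Hypothetical helper: 1 if either element of v is NaN, else 0. */
    int has_nan(__m128d v)
    {
        __m128d unord = _mm_cmpunord_pd(v, v);  /* all ones where the element is NaN */
        return _mm_movemask_pd(unord) != 0;     /* any mask bit set means a NaN exists */
    }
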

_mm_cmpneq_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpneq_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] != b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpnlt_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpnlt_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (!(a[i+63:i] < b[i+63:i])) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpnle_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpnle_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (!(a[i+63:i] <= b[i+63:i])) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpngt_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpngt_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (!(a[i+63:i] > b[i+63:i])) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpnge_pd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_cmpnge_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (!(a[i+63:i] >= b[i+63:i])) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	

_mm_comieq_sd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_comieq_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] == b[63:0] ) ? 1 : 0
        	

_mm_comilt_sd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_comilt_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] < b[63:0] ) ? 1 : 0
        	

_mm_comile_sd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_comile_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] <= b[63:0] ) ? 1 : 0
        	

_mm_comigt_sd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_comigt_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] > b[63:0] ) ? 1 : 0
        	
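Unlike the ``_mm_cmp*_sd`` family, the ``_mm_comi*_sd`` intrinsics return a plain ``int``, so they can be used directly in scalar control flow. The sketch below combines ``_mm_comigt_sd`` and ``_mm_comilt_sd`` into a three-way comparison of the lower elements. ``compare_low`` is a hypothetical helper, and the sketch assumes ordered (non-NaN) inputs; going by the pseudo-code above, a NaN operand would make both calls return 0, which this helper cannot distinguish from equality.

.. code-block:: C

    #include <emmintrin.h>  /* SSE2 double-precision intrinsics */

    /* Hypothetical helper: -1, 0 or 1 depending on how the lower elements compare.
       Assumes neither lower element is NaN. */
    int compare_low(__m128d a, __m128d b)
    {
        return _mm_comigt_sd(a, b) - _mm_comilt_sd(a, b);
    }
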

_mm_comige_sd
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_comige_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] >= b[63:0] ) ? 1 : 0
        	

_mm_comineq_sd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_comineq_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[63:0] == NaN OR b[63:0] == NaN OR a[63:0] != b[63:0] ) ? 1 : 0
        	

_mm_ucomieq_sd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_ucomieq_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] == b[63:0] ) ? 1 : 0
        	

_mm_ucomilt_sd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_ucomilt_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] < b[63:0] ) ? 1 : 0
        	

_mm_ucomile_sd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_ucomile_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] <= b[63:0] ) ? 1 : 0
        	

_mm_ucomigt_sd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_ucomigt_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] > b[63:0] ) ? 1 : 0
        	

_mm_ucomige_sd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_ucomige_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] >= b[63:0] ) ? 1 : 0
        	

_mm_ucomineq_sd
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: emmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_ucomineq_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a[63:0] == NaN OR b[63:0] == NaN OR a[63:0] != b[63:0] ) ? 1 : 0
        	

_mm_cmpeq_epi64
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: smmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_cmpeq_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed 64-bit integers in "a" and "b" for equality, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ( a[i+63:i] == b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	

_mm_cmpgt_epi64
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Compare
:Header: nmmintrin.h
:Searchable: SSE_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m128i _mm_cmpgt_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ( a[i+63:i] > b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        	

Set
---
XMM
~~~
_mm_set_ss
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: xmmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_set_ss(float a);

.. admonition:: Intel Description

    Copy single-precision (32-bit) floating-point element "a" to the lower element of "dst", and zero the upper 3 elements.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        dst[127:32] := 0
        	
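Storing the result to memory makes the layout visible: the scalar lands in element 0 and the remaining three elements are zero. A minimal sketch, with ``set_ss_to_array`` as a hypothetical helper:

.. code-block:: C

    #include <xmmintrin.h>  /* SSE single-precision intrinsics */

    /* Hypothetical helper: write the vector built by _mm_set_ss(a) into out[0..3]. */
    void set_ss_to_array(float a, float out[4])
    {
        _mm_storeu_ps(out, _mm_set_ss(a));  /* out = { a, 0, 0, 0 } */
    }
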

_mm_set1_ps
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: xmmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_set1_ps(float a);

.. admonition:: Intel Description

    Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        	
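Broadcasting is typically done once, outside a loop, so that the same scalar can be applied to every group of four floats. The sketch below scales an array in place; ``scale_inplace`` is a hypothetical helper, and the element count is assumed to be a multiple of 4.

.. code-block:: C

    #include <stddef.h>
    #include <xmmintrin.h>  /* SSE single-precision intrinsics */

    /* Hypothetical helper: multiply n floats by s. Assumes n is a multiple of 4. */
    void scale_inplace(float *data, size_t n, float s)
    {
        __m128 factor = _mm_set1_ps(s);  /* { s, s, s, s } */
        for (size_t i = 0; i < n; i += 4) {
            __m128 v = _mm_loadu_ps(data + i);
            _mm_storeu_ps(data + i, _mm_mul_ps(v, factor));
        }
    }
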

_mm_set_ps1
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: xmmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_set_ps1(float a);

.. admonition:: Intel Description

    Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        	

_mm_set_ps
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: xmmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float e3, 
    float e2, 
    float e1, 
    float e0
:Param ETypes:
    FP32 e3, 
    FP32 e2, 
    FP32 e1, 
    FP32 e0

.. code-block:: C

    __m128 _mm_set_ps(float e3, float e2, float e1, float e0);

.. admonition:: Intel Description

    Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := e0
        dst[63:32] := e1
        dst[95:64] := e2
        dst[127:96] := e3
        	

_mm_setr_ps
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: xmmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float e3, 
    float e2, 
    float e1, 
    float e0
:Param ETypes:
    FP32 e3, 
    FP32 e2, 
    FP32 e1, 
    FP32 e0

.. code-block:: C

    __m128 _mm_setr_ps(float e3, float e2, float e1, float e0);

.. admonition:: Intel Description

    Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := e3
        dst[63:32] := e2
        dst[95:64] := e1
        dst[127:96] := e0
        	
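The argument order of ``_mm_set_ps`` and ``_mm_setr_ps`` is easy to mix up: the former takes the highest element first, the latter takes the lowest element first. As a sketch (the two helper names are hypothetical), both calls below produce the same vector, which stores to memory as ``{0, 1, 2, 3}``:

.. code-block:: C

    #include <xmmintrin.h>  /* SSE single-precision intrinsics */

    /* Hypothetical helper: highest element first, so 0.0f ends up in element 0. */
    void store_set(float out[4])
    {
        _mm_storeu_ps(out, _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f));
    }

    /* Hypothetical helper: lowest element first, matching memory order. */
    void store_setr(float out[4])
    {
        _mm_storeu_ps(out, _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f));
    }
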

_mm_setzero_ps
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: xmmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128

.. code-block:: C

    __m128 _mm_setzero_ps(void);

.. admonition:: Intel Description

    Return vector of type __m128 with all elements set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	
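A zeroed vector is the usual starting value for an accumulator. The following sketch sums an array four floats at a time; ``sum_floats`` is a hypothetical helper, the element count is assumed to be a multiple of 4, and the summation order differs from a plain scalar loop, so rounding may differ for general inputs.

.. code-block:: C

    #include <stddef.h>
    #include <xmmintrin.h>  /* SSE single-precision intrinsics */

    /* Hypothetical helper: sum n floats. Assumes n is a multiple of 4. */
    float sum_floats(const float *data, size_t n)
    {
        __m128 acc = _mm_setzero_ps();  /* four partial sums, all starting at 0 */
        for (size_t i = 0; i < n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));

        float lanes[4];
        _mm_storeu_ps(lanes, acc);      /* spill the partial sums and add them up */
        return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    }
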

_mm_set_epi64
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m64 e1, 
    __m64 e0
:Param ETypes:
    UI64 e1, 
    UI64 e0

.. code-block:: C

    __m128i _mm_set_epi64(__m64 e1, __m64 e0);

.. admonition:: Intel Description

    Set packed 64-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := e0
        dst[127:64] := e1
        	

_mm_set_epi64x
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __int64 e1, 
    __int64 e0
:Param ETypes:
    UI64 e1, 
    UI64 e0

.. code-block:: C

    __m128i _mm_set_epi64x(__int64 e1, __int64 e0);

.. admonition:: Intel Description

    Set packed 64-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := e0
        dst[127:64] := e1
        	

_mm_set_epi32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    int e3, 
    int e2, 
    int e1, 
    int e0
:Param ETypes:
    UI32 e3, 
    UI32 e2, 
    UI32 e1, 
    UI32 e0

.. code-block:: C

    __m128i _mm_set_epi32(int e3, int e2, int e1, int e0);

.. admonition:: Intel Description

    Set packed 32-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := e0
        dst[63:32] := e1
        dst[95:64] := e2
        dst[127:96] := e3
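
As a rough illustration of the lane layout, the sketch below stores the result to an array and returns one lane. The helper name ``set_epi32_lane`` is a hypothetical name chosen for this example.

.. code-block:: C

    #include <emmintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: returns lane idx (0 = bits 31:0) of
       _mm_set_epi32(e3, e2, e1, e0). */
    int32_t set_epi32_lane(int e3, int e2, int e1, int e0, int idx)
    {
        int32_t out[4];
        __m128i v = _mm_set_epi32(e3, e2, e1, e0);
        _mm_storeu_si128((__m128i *)out, v);   /* out[0] receives bits 31:0 */
        return out[idx];
    }

The last argument (``e0``) ends up at the lowest address, so the in-memory order is the reverse of the argument order.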
        	

_mm_set_epi16
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    short e7, 
    short e6, 
    short e5, 
    short e4, 
    short e3, 
    short e2, 
    short e1, 
    short e0
:Param ETypes:
    UI16 e7, 
    UI16 e6, 
    UI16 e5, 
    UI16 e4, 
    UI16 e3, 
    UI16 e2, 
    UI16 e1, 
    UI16 e0

.. code-block:: C

    __m128i _mm_set_epi16(short e7, short e6, short e5,
                          short e4, short e3, short e2,
                          short e1, short e0);

.. admonition:: Intel Description

    Set packed 16-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[15:0] := e0
        dst[31:16] := e1
        dst[47:32] := e2
        dst[63:48] := e3
        dst[79:64] := e4
        dst[95:80] := e5
        dst[111:96] := e6
        dst[127:112] := e7
        	

_mm_set_epi8
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    char e15, 
    char e14, 
    char e13, 
    char e12, 
    char e11, 
    char e10, 
    char e9, 
    char e8, 
    char e7, 
    char e6, 
    char e5, 
    char e4, 
    char e3, 
    char e2, 
    char e1, 
    char e0
:Param ETypes:
    UI8 e15, 
    UI8 e14, 
    UI8 e13, 
    UI8 e12, 
    UI8 e11, 
    UI8 e10, 
    UI8 e9, 
    UI8 e8, 
    UI8 e7, 
    UI8 e6, 
    UI8 e5, 
    UI8 e4, 
    UI8 e3, 
    UI8 e2, 
    UI8 e1, 
    UI8 e0

.. code-block:: C

    __m128i _mm_set_epi8(char e15, char e14, char e13, char e12,
                         char e11, char e10, char e9, char e8,
                         char e7, char e6, char e5, char e4,
                         char e3, char e2, char e1, char e0);

.. admonition:: Intel Description

    Set packed 8-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[7:0] := e0
        dst[15:8] := e1
        dst[23:16] := e2
        dst[31:24] := e3
        dst[39:32] := e4
        dst[47:40] := e5
        dst[55:48] := e6
        dst[63:56] := e7
        dst[71:64] := e8
        dst[79:72] := e9
        dst[87:80] := e10
        dst[95:88] := e11
        dst[103:96] := e12
        dst[111:104] := e13
        dst[119:112] := e14
        dst[127:120] := e15
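
A small sketch of the byte layout, assuming the hypothetical helper name ``set_epi8_byte``: each argument is given the value of its own index, so the stored byte at offset ``idx`` should equal ``idx``.

.. code-block:: C

    #include <emmintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: returns the byte at memory offset idx after
       storing _mm_set_epi8(15, 14, ..., 1, 0). */
    uint8_t set_epi8_byte(int idx)
    {
        uint8_t out[16];
        __m128i v = _mm_set_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                 7, 6, 5, 4, 3, 2, 1, 0);
        _mm_storeu_si128((__m128i *)out, v);   /* out[0] receives bits 7:0 */
        return out[idx];
    }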
        	

_mm_set1_epi64
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_set1_epi64(__m64 a);

.. admonition:: Intel Description

    Broadcast 64-bit integer "a" to all elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        	

_mm_set1_epi64x
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_set1_epi64x(__int64 a);

.. admonition:: Intel Description

    Broadcast 64-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastq".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        	

_mm_set1_epi32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm_set1_epi32(int a);

.. admonition:: Intel Description

    Broadcast 32-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastd".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
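
The broadcast can be sketched as follows; ``set1_epi32_all_lanes_equal`` is a hypothetical helper written only for this illustration.

.. code-block:: C

    #include <emmintrin.h>

    /* Hypothetical helper: returns 1 when all four 32-bit lanes of
       _mm_set1_epi32(a) hold the value a. */
    int set1_epi32_all_lanes_equal(int a)
    {
        int out[4];
        _mm_storeu_si128((__m128i *)out, _mm_set1_epi32(a));
        return out[0] == a && out[1] == a && out[2] == a && out[3] == a;
    }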
        	

_mm_set1_epi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    short a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128i _mm_set1_epi16(short a);

.. admonition:: Intel Description

    Broadcast 16-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastw".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := a[15:0]
        ENDFOR
        	

_mm_set1_epi8
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    char a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m128i _mm_set1_epi8(char a);

.. admonition:: Intel Description

    Broadcast 8-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastb".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := a[7:0]
        ENDFOR
        	

_mm_setr_epi64
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m64 e1, 
    __m64 e0
:Param ETypes:
    UI64 e1, 
    UI64 e0

.. code-block:: C

    __m128i _mm_setr_epi64(__m64 e1, __m64 e0);

.. admonition:: Intel Description

    Set packed 64-bit integers in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[63:0] := e1
        dst[127:64] := e0
        	

_mm_setr_epi32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    int e3, 
    int e2, 
    int e1, 
    int e0
:Param ETypes:
    UI32 e3, 
    UI32 e2, 
    UI32 e1, 
    UI32 e0

.. code-block:: C

    __m128i _mm_setr_epi32(int e3, int e2, int e1, int e0);

.. admonition:: Intel Description

    Set packed 32-bit integers in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := e3
        dst[63:32] := e2
        dst[95:64] := e1
        dst[127:96] := e0
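
One way to read "reverse order" is that ``_mm_setr_epi32(a, b, c, d)`` should produce the same vector as ``_mm_set_epi32(d, c, b, a)``. The sketch below checks that relationship; the helper name ``setr_matches_reversed_set`` is an assumption made for this example.

.. code-block:: C

    #include <emmintrin.h>

    /* Hypothetical helper: returns 1 when _mm_setr_epi32(a, b, c, d)
       and _mm_set_epi32(d, c, b, a) are bitwise identical. */
    int setr_matches_reversed_set(int a, int b, int c, int d)
    {
        __m128i r = _mm_setr_epi32(a, b, c, d);
        __m128i s = _mm_set_epi32(d, c, b, a);
        /* cmpeq fills each equal lane with 0xFF bytes; movemask gathers
           the top bit of all 16 bytes into a 16-bit mask. */
        return _mm_movemask_epi8(_mm_cmpeq_epi32(r, s)) == 0xFFFF;
    }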
        	

_mm_setr_epi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    short e7, 
    short e6, 
    short e5, 
    short e4, 
    short e3, 
    short e2, 
    short e1, 
    short e0
:Param ETypes:
    UI16 e7, 
    UI16 e6, 
    UI16 e5, 
    UI16 e4, 
    UI16 e3, 
    UI16 e2, 
    UI16 e1, 
    UI16 e0

.. code-block:: C

    __m128i _mm_setr_epi16(short e7, short e6, short e5,
                           short e4, short e3, short e2,
                           short e1, short e0);

.. admonition:: Intel Description

    Set packed 16-bit integers in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[15:0] := e7
        dst[31:16] := e6
        dst[47:32] := e5
        dst[63:48] := e4
        dst[79:64] := e3
        dst[95:80] := e2
        dst[111:96] := e1
        dst[127:112] := e0
        	

_mm_setr_epi8
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    char e15, 
    char e14, 
    char e13, 
    char e12, 
    char e11, 
    char e10, 
    char e9, 
    char e8, 
    char e7, 
    char e6, 
    char e5, 
    char e4, 
    char e3, 
    char e2, 
    char e1, 
    char e0
:Param ETypes:
    UI8 e15, 
    UI8 e14, 
    UI8 e13, 
    UI8 e12, 
    UI8 e11, 
    UI8 e10, 
    UI8 e9, 
    UI8 e8, 
    UI8 e7, 
    UI8 e6, 
    UI8 e5, 
    UI8 e4, 
    UI8 e3, 
    UI8 e2, 
    UI8 e1, 
    UI8 e0

.. code-block:: C

    __m128i _mm_setr_epi8(char e15, char e14, char e13,
                          char e12, char e11, char e10, char e9,
                          char e8, char e7, char e6, char e5,
                          char e4, char e3, char e2, char e1,
                          char e0);

.. admonition:: Intel Description

    Set packed 8-bit integers in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[7:0] := e15
        dst[15:8] := e14
        dst[23:16] := e13
        dst[31:24] := e12
        dst[39:32] := e11
        dst[47:40] := e10
        dst[55:48] := e9
        dst[63:56] := e8
        dst[71:64] := e7
        dst[79:72] := e6
        dst[87:80] := e5
        dst[95:88] := e4
        dst[103:96] := e3
        dst[111:104] := e2
        dst[119:112] := e1
        dst[127:120] := e0
        	

_mm_setzero_si128
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i

.. code-block:: C

    __m128i _mm_setzero_si128(void );

.. admonition:: Intel Description

    Return vector of type __m128i with all elements set to zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
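
A zeroed vector is commonly used as the starting value of an accumulator. The sketch below shows that pattern under stated assumptions: ``sum_epi32``, ``data`` and ``n_vec`` are hypothetical names, the input holds ``4 * n_vec`` values, and integer overflow is not handled.

.. code-block:: C

    #include <emmintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: adds n_vec groups of four int32 values
       lane by lane, then sums the four lane totals. */
    int32_t sum_epi32(const int32_t *data, size_t n_vec)
    {
        int32_t lanes[4];
        __m128i acc = _mm_setzero_si128();   /* all four lanes start at 0 */
        for (size_t k = 0; k < n_vec; ++k) {
            __m128i v = _mm_loadu_si128((const __m128i *)(data + 4 * k));
            acc = _mm_add_epi32(acc, v);
        }
        _mm_storeu_si128((__m128i *)lanes, acc);
        return lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }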
        	

_mm_set_sd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_set_sd(double a);

.. admonition:: Intel Description

    Copy double-precision (64-bit) floating-point element "a" to the lower element of "dst", and zero the upper element.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        dst[127:64] := 0
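
A minimal sketch of the resulting layout; the helper name ``set_sd_lane`` is hypothetical.

.. code-block:: C

    #include <emmintrin.h>

    /* Hypothetical helper: returns lane idx (0 = bits 63:0) of
       _mm_set_sd(a). */
    double set_sd_lane(double a, int idx)
    {
        double out[2];
        _mm_storeu_pd(out, _mm_set_sd(a));
        return out[idx];
    }

Lane 0 carries the argument and lane 1 reads back as zero, whatever value was passed.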
        	

_mm_set1_pd
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_set1_pd(double a);

.. admonition:: Intel Description

    Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        	

_mm_set_pd1
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_set_pd1(double a);

.. admonition:: Intel Description

    Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        	

_mm_set_pd
^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double e1, 
    double e0
:Param ETypes:
    FP64 e1, 
    FP64 e0

.. code-block:: C

    __m128d _mm_set_pd(double e1, double e0);

.. admonition:: Intel Description

    Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[63:0] := e0
        dst[127:64] := e1
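
The following sketch shows which argument reaches which lane; ``set_pd_lane`` is a hypothetical helper name.

.. code-block:: C

    #include <emmintrin.h>

    /* Hypothetical helper: returns lane idx (0 = bits 63:0) of
       _mm_set_pd(e1, e0). */
    double set_pd_lane(double e1, double e0, int idx)
    {
        double out[2];
        _mm_storeu_pd(out, _mm_set_pd(e1, e0));   /* out[0] receives bits 63:0 */
        return out[idx];
    }

The second argument (``e0``) occupies the low lane, so it is the first element in memory after a store.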
        	

_mm_setr_pd
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double e1, 
    double e0
:Param ETypes:
    FP64 e1, 
    FP64 e0

.. code-block:: C

    __m128d _mm_setr_pd(double e1, double e0);

.. admonition:: Intel Description

    Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[63:0] := e1
        dst[127:64] := e0
        	

_mm_setzero_pd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Set
:Header: emmintrin.h
:Searchable: SSE_ALL-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128d

.. code-block:: C

    __m128d _mm_setzero_pd(void );

.. admonition:: Intel Description

    Return vector of type __m128d with all elements set to zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	

Convert
-------
XMM
~~~
_mm_cvtsi32_ss
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    int b
:Param ETypes:
    FP32 a, 
    SI32 b

.. code-block:: C

    __m128 _mm_cvtsi32_ss(__m128 a, int b);

.. admonition:: Intel Description

    Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_Int32_To_FP32(b[31:0])
        dst[127:32] := a[127:32]
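
As an illustration, the sketch below converts an integer into the low lane of a fixed vector and reads one lane back. The helper name ``cvtsi32_ss_lane`` and the contents of ``a`` are assumptions chosen for the example.

.. code-block:: C

    #include <xmmintrin.h>

    /* Hypothetical helper: returns lane idx of _mm_cvtsi32_ss(a, b),
       where a holds {10, 20, 30, 40} listed lowest lane first. */
    float cvtsi32_ss_lane(int b, int idx)
    {
        float out[4];
        __m128 a = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);
        __m128 r = _mm_cvtsi32_ss(a, b);   /* only lane 0 is replaced */
        _mm_storeu_ps(out, r);
        return out[idx];
    }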
        	

_mm_cvt_si2ss
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    int b
:Param ETypes:
    FP32 a, 
    SI32 b

.. code-block:: C

    __m128 _mm_cvt_si2ss(__m128 a, int b);

.. admonition:: Intel Description

    Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_Int32_To_FP32(b[31:0])
        dst[127:32] := a[127:32]
        	

_mm_cvtsi64_ss
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __int64 b
:Param ETypes:
    FP32 a, 
    SI64 b

.. code-block:: C

    __m128 _mm_cvtsi64_ss(__m128 a, __int64 b);

.. admonition:: Intel Description

    Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_Int64_To_FP32(b[63:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvtpi32_ps
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m64 b
:Param ETypes:
    FP32 a, 
    SI32 b

.. code-block:: C

    __m128 _mm_cvtpi32_ps(__m128 a, __m64 b);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "b" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", and copy the upper 2 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_Int32_To_FP32(b[31:0])
        dst[63:32] := Convert_Int32_To_FP32(b[63:32])
        dst[95:64] := a[95:64]
        dst[127:96] := a[127:96]
        	

_mm_cvt_pi2ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m64 b
:Param ETypes:
    FP32 a, 
    SI32 b

.. code-block:: C

    __m128 _mm_cvt_pi2ps(__m128 a, __m64 b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "b" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", and copy the upper 2 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_Int32_To_FP32(b[31:0])
        dst[63:32] := Convert_Int32_To_FP32(b[63:32])
        dst[95:64] := a[95:64]
        dst[127:96] := a[127:96]
        	

_mm_cvtpi16_ps
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m64 a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m128 _mm_cvtpi16_ps(__m64 a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	m := j*32
        	dst[m+31:m] := Convert_Int16_To_FP32(a[i+15:i])
        ENDFOR
        	

_mm_cvtpu16_ps
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m64 a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128 _mm_cvtpu16_ps(__m64 a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*16
        	m := j*32
        	dst[m+31:m] := Convert_Int16_To_FP32(a[i+15:i])
        ENDFOR
        	

_mm_cvtpi8_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m64 a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m128 _mm_cvtpi8_ps(__m64 a);

.. admonition:: Intel Description

    Convert the lower packed 8-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*8
        	m := j*32
        	dst[m+31:m] := Convert_Int8_To_FP32(a[i+7:i])
        ENDFOR
        	

_mm_cvtpu8_ps
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m64 a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m128 _mm_cvtpu8_ps(__m64 a);

.. admonition:: Intel Description

    Convert the lower packed unsigned 8-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*8
        	m := j*32
        	dst[m+31:m] := Convert_Int8_To_FP32(a[i+7:i])
        ENDFOR
        	

_mm_cvtpi32x2_ps
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128 _mm_cvtpi32x2_ps(__m64 a, __m64 b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", then convert the packed signed 32-bit integers in "b" to packed single-precision (32-bit) floating-point elements, and store the results in the upper 2 elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_Int32_To_FP32(a[31:0])
        dst[63:32] := Convert_Int32_To_FP32(a[63:32])
        dst[95:64] := Convert_Int32_To_FP32(b[31:0])
        dst[127:96] := Convert_Int32_To_FP32(b[63:32])
        	

_mm_cvtss_si32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    int _mm_cvtss_si32(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_Int32(a[31:0])
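
The conversion rounds according to the current MXCSR rounding mode. The sketch below assumes that mode has been left at its default (round to nearest, ties to even); ``round_ss`` is a hypothetical helper name.

.. code-block:: C

    #include <xmmintrin.h>

    /* Hypothetical helper: converts x to int using the current
       MXCSR rounding mode (assumed to be the default here). */
    int round_ss(float x)
    {
        return _mm_cvtss_si32(_mm_set_ss(x));
    }

Under that assumption, halfway cases go to the even neighbour: 2.5 becomes 2 while 3.5 becomes 4.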
        	

_mm_cvt_ss2si
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    int _mm_cvt_ss2si(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_Int32(a[31:0])
        	

_mm_cvtss_si64
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __int64 _mm_cvtss_si64(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_Int64(a[31:0])
        	

_mm_cvtss_f32
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: float
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    float _mm_cvtss_f32(__m128 a);

.. admonition:: Intel Description

    Copy the lower single-precision (32-bit) floating-point element of "a" to "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
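
A short sketch of extracting the low lane as a scalar; the helper name ``lowest_of`` is hypothetical.

.. code-block:: C

    #include <xmmintrin.h>

    /* Hypothetical helper: builds a vector with _mm_set_ps and returns
       its lowest lane, which is the last argument e0. */
    float lowest_of(float e3, float e2, float e1, float e0)
    {
        return _mm_cvtss_f32(_mm_set_ps(e3, e2, e1, e0));
    }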
        	

_mm_cvtps_pi32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m64 _mm_cvtps_pi32(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        ENDFOR
        	

_mm_cvt_ps2pi
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m64 _mm_cvt_ps2pi(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        ENDFOR
        	

_mm_cvttss_si32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    int _mm_cvttss_si32(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
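
Truncation discards the fractional part regardless of the MXCSR rounding mode. A minimal sketch, with ``trunc_ss`` as a hypothetical helper name:

.. code-block:: C

    #include <xmmintrin.h>

    /* Hypothetical helper: converts x to int, rounding toward zero. */
    int trunc_ss(float x)
    {
        return _mm_cvttss_si32(_mm_set_ss(x));
    }

Both positive and negative inputs move toward zero, so 2.9 gives 2 and -2.9 gives -2.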
        	

_mm_cvtt_ss2si
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    int _mm_cvtt_ss2si(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
        	

_mm_cvttss_si64
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __int64 _mm_cvttss_si64(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_Int64_Truncate(a[31:0])
        	

_mm_cvttps_pi32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m64 _mm_cvttps_pi32(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        ENDFOR
        	

_mm_cvtt_ps2pi
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m64 _mm_cvtt_ps2pi(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        ENDFOR
        	

_mm_cvtps_pi16
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m64 _mm_cvtps_pi16(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst". Note: this intrinsic will generate 0x7FFF, rather than 0x8000, for input values between 0x7FFF and 0x7FFFFFFF.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 16*j
        	k := 32*j
        	IF a[k+31:k] >= FP32(0x7FFF) && a[k+31:k] <= FP32(0x7FFFFFFF)
        		dst[i+15:i] := 0x7FFF
        	ELSE
        		dst[i+15:i] := Convert_FP32_To_Int16(a[k+31:k])
        	FI
        ENDFOR
        	

_mm_cvtps_pi8
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: xmmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m64 _mm_cvtps_pi8(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 8-bit integers, and store the results in lower 4 elements of "dst". Note: this intrinsic will generate 0x7F, rather than 0x80, for input values between 0x7F and 0x7FFFFFFF.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 8*j
        	k := 32*j
        	IF a[k+31:k] >= FP32(0x7F) && a[k+31:k] <= FP32(0x7FFFFFFF)
        		dst[i+7:i] := 0x7F
        	ELSE
        		dst[i+7:i] := Convert_FP32_To_Int8(a[k+31:k])
        	FI
        ENDFOR
        	

_mm_cvtepi32_pd
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m128d _mm_cvtepi32_pd(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
        ENDFOR
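
Only the two lowest 32-bit integers of the source are converted. The sketch below illustrates this with fixed inputs; ``cvtepi32_pd_lane`` and the sample values are assumptions for the example.

.. code-block:: C

    #include <emmintrin.h>

    /* Hypothetical helper: returns lane idx (0 = bits 63:0) of
       _mm_cvtepi32_pd applied to the integers {1, -2, 30, 40},
       listed lowest lane first. */
    double cvtepi32_pd_lane(int idx)
    {
        double out[2];
        __m128i a = _mm_set_epi32(40, 30, -2, 1);
        _mm_storeu_pd(out, _mm_cvtepi32_pd(a));   /* 30 and 40 are ignored */
        return out[idx];
    }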
        	

_mm_cvtsi32_sd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    int b
:Param ETypes:
    FP64 a, 
    SI32 b

.. code-block:: C

    __m128d _mm_cvtsi32_sd(__m128d a, int b);

.. admonition:: Intel Description

    Convert the signed 32-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_Int32_To_FP64(b[31:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
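
A minimal sketch of the lane behaviour, assuming a fixed input ``a`` whose low lane is 8.0 and high lane is 9.0; ``cvtsi32_sd_lane`` is a hypothetical helper name.

.. code-block:: C

    #include <emmintrin.h>

    /* Hypothetical helper: returns lane idx (0 = bits 63:0) of
       _mm_cvtsi32_sd(a, b) with a = {8.0 (low), 9.0 (high)}. */
    double cvtsi32_sd_lane(int b, int idx)
    {
        double out[2];
        __m128d a = _mm_set_pd(9.0, 8.0);
        _mm_storeu_pd(out, _mm_cvtsi32_sd(a, b));   /* high lane is kept from a */
        return out[idx];
    }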
        	

_mm_cvtsi64_sd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __int64 b
:Param ETypes:
    FP64 a, 
    SI64 b

.. code-block:: C

    __m128d _mm_cvtsi64_sd(__m128d a, __int64 b);

.. admonition:: Intel Description

    Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_Int64_To_FP64(b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_cvtsi64x_sd
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __int64 b
:Param ETypes:
    FP64 a, 
    SI64 b

.. code-block:: C

    __m128d _mm_cvtsi64x_sd(__m128d a, __int64 b);

.. admonition:: Intel Description

    Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_Int64_To_FP64(b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_cvtepi32_ps
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m128 _mm_cvtepi32_ps(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        ENDFOR
        	
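Unlike the conversion to double precision, a 32-bit integer does not always fit in the 24-bit significand of a float, so the result may be rounded. The sketch below (hypothetical helper names) assumes the default MXCSR rounding mode, round to nearest even.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_cvtepi32_ps, _mm_loadu_si128, _mm_storeu_ps */

/* Hypothetical helper: convert four 32-bit integers to four floats. */
static void four_ints_to_floats(const int in[4], float out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_ps(out, _mm_cvtepi32_ps(v));
}

/* Hypothetical self-check, assuming the default round-to-nearest-even mode. */
static void four_ints_to_floats_demo(void)
{
    const int in[4] = { 1, -2, 16777216, 16777217 };
    float out[4];
    four_ints_to_floats(in, out);
    assert(out[0] == 1.0f);
    assert(out[1] == -2.0f);
    assert(out[2] == 16777216.0f); /* 2^24 is exactly representable */
    assert(out[3] == 16777216.0f); /* 2^24 + 1 is a tie and rounds to even */
}
```
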

_mm_cvtpi32_pd
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m64 a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m128d _mm_cvtpi32_pd(__m64 a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
        ENDFOR
        	

_mm_cvtsi32_si128
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm_cvtsi32_si128(int a);

.. admonition:: Intel Description

    Copy 32-bit integer "a" to the lowest 32-bit element of "dst", and zero the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        dst[127:32] := 0
        	
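A small sketch (hypothetical helper names) of moving a scalar into a vector register: the value lands in lane 0, the other three lanes are zero, and ``_mm_cvtsi128_si32`` reads lane 0 back out.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_cvtsi32_si128, _mm_cvtsi128_si32 */

/* Hypothetical helper: place x in lane 0 and write all four lanes to out. */
static void int_to_vector_lanes(int x, int out[4])
{
    _mm_storeu_si128((__m128i *)out, _mm_cvtsi32_si128(x));
}

/* Hypothetical self-check. */
static void int_to_vector_lanes_demo(void)
{
    int lanes[4];
    int_to_vector_lanes(-5, lanes);
    assert(lanes[0] == -5);
    assert(lanes[1] == 0 && lanes[2] == 0 && lanes[3] == 0);
    /* Round trip: lane 0 comes back unchanged. */
    assert(_mm_cvtsi128_si32(_mm_cvtsi32_si128(-5)) == -5);
}
```
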

_mm_cvtsi64_si128
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_cvtsi64_si128(__int64 a);

.. admonition:: Intel Description

    Copy 64-bit integer "a" to the lower element of "dst", and zero the upper element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        dst[127:64] := 0
        	

_mm_cvtsi64x_si128
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_cvtsi64x_si128(__int64 a);

.. admonition:: Intel Description

    Copy 64-bit integer "a" to the lower element of "dst", and zero the upper element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        dst[127:64] := 0
        	

_mm_cvtsi128_si32
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _mm_cvtsi128_si32(__m128i a);

.. admonition:: Intel Description

    Copy the lower 32-bit integer in "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        	

_mm_cvtsi128_si64
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __int64 _mm_cvtsi128_si64(__m128i a);

.. admonition:: Intel Description

    Copy the lower 64-bit integer in "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        	

_mm_cvtsi128_si64x
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __int64 _mm_cvtsi128_si64x(__m128i a);

.. admonition:: Intel Description

    Copy the lower 64-bit integer in "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        	

_mm_cvtpd_ps
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128 _mm_cvtpd_ps(__m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_FP32(a[k+63:k])
        ENDFOR
        dst[127:64] := 0
        	
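The sketch below (hypothetical helper names) narrows two doubles and checks that the two upper float lanes are zero, matching the ``dst[127:64] := 0`` line of the pseudo-code. The chosen inputs are exactly representable as floats, so no rounding is involved.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_cvtpd_ps, _mm_loadu_pd, _mm_storeu_ps */

/* Hypothetical helper: narrow two doubles into the low two float lanes. */
static void two_doubles_to_floats(const double in[2], float out[4])
{
    _mm_storeu_ps(out, _mm_cvtpd_ps(_mm_loadu_pd(in)));
}

/* Hypothetical self-check. */
static void two_doubles_to_floats_demo(void)
{
    const double in[2] = { 1.5, -2.25 };
    float out[4];
    two_doubles_to_floats(in, out);
    assert(out[0] == 1.5f);
    assert(out[1] == -2.25f);
    assert(out[2] == 0.0f && out[3] == 0.0f); /* upper 64 bits are zeroed */
}
```
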

_mm_cvtps_pd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128d _mm_cvtps_pd(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 32*j
        	dst[i+63:i] := Convert_FP32_To_FP64(a[k+31:k])
        ENDFOR
        	

_mm_cvtpd_epi32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm_cvtpd_epi32(__m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k])
        ENDFOR
        	
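The non-truncating conversion rounds according to the current MXCSR rounding mode. The sketch below (hypothetical helper names) assumes the default mode, round to nearest even, and also checks that the two upper 32-bit lanes come back as zero, which is how the underlying ``CVTPD2DQ`` instruction behaves.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_cvtpd_epi32, _mm_loadu_pd, _mm_storeu_si128 */

/* Hypothetical helper: convert two doubles and expose all four result lanes. */
static void two_doubles_to_ints(const double in[2], int out[4])
{
    _mm_storeu_si128((__m128i *)out, _mm_cvtpd_epi32(_mm_loadu_pd(in)));
}

/* Hypothetical self-check, assuming the default round-to-nearest-even mode. */
static void two_doubles_to_ints_demo(void)
{
    const double in[2] = { 2.5, 3.5 };
    int out[4];
    two_doubles_to_ints(in, out);
    assert(out[0] == 2); /* tie rounds to the even neighbour */
    assert(out[1] == 4);
    assert(out[2] == 0 && out[3] == 0);
}
```
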

_mm_cvtsd_si32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    int _mm_cvtsd_si32(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_Int32(a[63:0])
        	
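A minimal sketch of the scalar conversion (hypothetical helper names). It assumes the default MXCSR state: round to nearest even, with the invalid-operation exception masked, in which case an out-of-range input yields the "integer indefinite" value ``0x80000000``.

```c
#include <assert.h>
#include <limits.h>
#include <emmintrin.h> /* SSE2: _mm_cvtsd_si32, _mm_set_sd */

/* Hypothetical helper: convert x using the current MXCSR rounding mode. */
static int double_to_int_current_rounding(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* Hypothetical self-check, assuming the default MXCSR settings. */
static void double_to_int_current_rounding_demo(void)
{
    assert(double_to_int_current_rounding(2.5) == 2);  /* ties go to even */
    assert(double_to_int_current_rounding(3.5) == 4);
    assert(double_to_int_current_rounding(-2.5) == -2);
    /* 1e10 does not fit in 32 bits: the result is 0x80000000. */
    assert(double_to_int_current_rounding(1e10) == INT_MIN);
}
```
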

_mm_cvtsd_si64
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __int64 _mm_cvtsd_si64(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_Int64(a[63:0])
        	

_mm_cvtsd_si64x
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __int64 _mm_cvtsd_si64x(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_Int64(a[63:0])
        	

_mm_cvtsd_ss
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128d b
:Param ETypes:
    FP32 a, 
    FP64 b

.. code-block:: C

    __m128 _mm_cvtsd_ss(__m128 a, __m128d b);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_FP32(b[63:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvtsd_f64
^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: double
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    double _mm_cvtsd_f64(__m128d a);

.. admonition:: Intel Description

    Copy the lower double-precision (64-bit) floating-point element of "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        	

_mm_cvtss_sd
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128 b
:Param ETypes:
    FP64 a, 
    FP32 b

.. code-block:: C

    __m128d _mm_cvtss_sd(__m128d a, __m128 b);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_FP64(b[31:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_cvttpd_epi32
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm_cvttpd_epi32(__m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k])
        ENDFOR
        	

_mm_cvttsd_si32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    int _mm_cvttsd_si32(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_Int32_Truncate(a[63:0])
        	

_mm_cvttsd_si64
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __int64 _mm_cvttsd_si64(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
        	

_mm_cvttsd_si64x
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __int64 _mm_cvttsd_si64x(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
        	

_mm_cvtps_epi32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128i _mm_cvtps_epi32(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        ENDFOR
        	

_mm_cvttps_epi32
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128i _mm_cvttps_epi32(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        ENDFOR
        	
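To contrast truncation with the rounding performed by ``_mm_cvtps_epi32``, the sketch below (hypothetical helper names) converts the same four floats both ways. The rounded results assume the default MXCSR mode, round to nearest even. Truncation always rounds toward zero regardless of that mode.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_cvtps_epi32, _mm_cvttps_epi32 */

/* Hypothetical helper: convert four floats with rounding and with truncation. */
static void floats_to_ints(const float in[4], int rounded[4], int truncated[4])
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)rounded, _mm_cvtps_epi32(v));
    _mm_storeu_si128((__m128i *)truncated, _mm_cvttps_epi32(v));
}

/* Hypothetical self-check, assuming the default round-to-nearest-even mode. */
static void floats_to_ints_demo(void)
{
    const float in[4] = { 0.5f, 1.5f, -1.5f, 2.7f };
    int r[4], t[4];
    floats_to_ints(in, r, t);
    assert(r[0] == 0 && r[1] == 2 && r[2] == -2 && r[3] == 3); /* nearest even */
    assert(t[0] == 0 && t[1] == 1 && t[2] == -1 && t[3] == 2); /* toward zero */
}
```
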

_mm_cvtpd_pi32
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m64 _mm_cvtpd_pi32(__m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k])
        ENDFOR
        	

_mm_cvttpd_pi32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: emmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m64 _mm_cvttpd_pi32(__m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k])
        ENDFOR
        	

_mm_cvtepi8_epi16
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: smmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m128i _mm_cvtepi8_epi16(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	l := j*16
        	dst[l+15:l] := SignExtend16(a[i+7:i])
        ENDFOR
        	

_mm_cvtepi8_epi32
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: smmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m128i _mm_cvtepi8_epi32(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 8*j
        	dst[i+31:i] := SignExtend32(a[k+7:k])
        ENDFOR
        	

_mm_cvtepi8_epi64
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: smmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m128i _mm_cvtepi8_epi64(__m128i a);

.. admonition:: Intel Description

    Sign extend the packed 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 8*j
        	dst[i+63:i] := SignExtend64(a[k+7:k])
        ENDFOR
        	

_mm_cvtepi16_epi32
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: smmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m128i _mm_cvtepi16_epi32(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 16*j
        	dst[i+31:i] := SignExtend32(a[k+15:k])
        ENDFOR
        	

_mm_cvtepi16_epi64
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: smmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m128i _mm_cvtepi16_epi64(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 16*j
        	dst[i+63:i] := SignExtend64(a[k+15:k])
        ENDFOR
        	

_mm_cvtepi32_epi64
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: smmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m128i _mm_cvtepi32_epi64(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 32*j
        	dst[i+63:i] := SignExtend64(a[k+31:k])
        ENDFOR
        	

_mm_cvtepu8_epi16
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: smmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m128i _mm_cvtepu8_epi16(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	l := j*16
        	dst[l+15:l] := ZeroExtend16(a[i+7:i])
        ENDFOR
        	
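The difference between this zero extension and the sign extension of ``_mm_cvtepi8_epi16`` can be seen in a scalar model of the two pseudo-code loops. The sketch below is plain C and does not call the intrinsics (which require SSE4.1); the ``model_*`` function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the _mm_cvtepi8_epi16 loop: SignExtend16 on the low 8 bytes. */
static void model_cvtepi8_epi16(const uint8_t a[16], int16_t dst[8])
{
    for (int j = 0; j < 8; ++j) {
        /* Bytes 0x80..0xFF represent -128..-1 in two's complement. */
        dst[j] = (int16_t)(a[j] < 128 ? a[j] : a[j] - 256);
    }
}

/* Scalar model of the _mm_cvtepu8_epi16 loop: ZeroExtend16 on the low 8 bytes. */
static void model_cvtepu8_epi16(const uint8_t a[16], int16_t dst[8])
{
    for (int j = 0; j < 8; ++j) {
        dst[j] = (int16_t)a[j];
    }
}

/* Hypothetical self-check: the same bytes widen differently. */
static void model_extend_demo(void)
{
    const uint8_t a[16] = { 0xFF, 0x7F, 0x80 }; /* remaining bytes are 0 */
    int16_t s[8], u[8];
    model_cvtepi8_epi16(a, s);
    model_cvtepu8_epi16(a, u);
    assert(s[0] == -1 && s[1] == 127 && s[2] == -128);
    assert(u[0] == 255 && u[1] == 127 && u[2] == 128);
}
```
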

_mm_cvtepu8_epi32
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: smmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m128i _mm_cvtepu8_epi32(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 8*j
        	dst[i+31:i] := ZeroExtend32(a[k+7:k])
        ENDFOR
        	

_mm_cvtepu8_epi64
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: smmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m128i _mm_cvtepu8_epi64(__m128i a);

.. admonition:: Intel Description

    Zero extend the packed unsigned 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 8*j
        	dst[i+63:i] := ZeroExtend64(a[k+7:k])
        ENDFOR
        	

_mm_cvtepu16_epi32
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: smmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128i _mm_cvtepu16_epi32(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 16*j
        	dst[i+31:i] := ZeroExtend32(a[k+15:k])
        ENDFOR
        	

_mm_cvtepu16_epi64
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: smmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128i _mm_cvtepu16_epi64(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 16*j
        	dst[i+63:i] := ZeroExtend64(a[k+15:k])
        ENDFOR
        	

_mm_cvtepu32_epi64
^^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Convert
:Header: smmintrin.h
:Searchable: SSE_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm_cvtepu32_epi64(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 32*j
        	dst[i+63:i] := ZeroExtend64(a[k+31:k])
        ENDFOR
        	

Miscellaneous
-------------
XMM
~~~
_mm_sad_pu8
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: xmmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _mm_sad_pu8(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum the 8 differences to produce a single unsigned 16-bit integer, store it in the low 16 bits of "dst", and zero the upper 48 bits.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i])
        ENDFOR
        dst[15:0] := tmp[7:0] + tmp[15:8] + tmp[23:16] + tmp[31:24] + tmp[39:32] + tmp[47:40] + tmp[55:48] + tmp[63:56]
        dst[63:16] := 0
        	

_mm_movemask_pi8
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: xmmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m64 a
:Param ETypes:
    UI8 a

.. code-block:: C

    int _mm_movemask_pi8(__m64 a);

.. admonition:: Intel Description

    Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[j] := a[i+7]
        ENDFOR
        dst[MAX:8] := 0
        	

_mm_movemask_ps
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: xmmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    int _mm_movemask_ps(__m128 a);

.. admonition:: Intel Description

    Set each bit of mask "dst" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF a[i+31]
        		dst[j] := 1
        	ELSE
        		dst[j] := 0
        	FI
        ENDFOR
        dst[MAX:4] := 0
        	
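A short sketch (hypothetical helper names) of collecting the four sign bits. Because only the most significant bit is inspected, negative zero contributes a set bit just like any other negative value.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: _mm_movemask_ps, _mm_loadu_ps */

/* Hypothetical helper: bit j of the result is the sign bit of in[j]. */
static int sign_mask4(const float in[4])
{
    return _mm_movemask_ps(_mm_loadu_ps(in));
}

/* Hypothetical self-check. */
static void sign_mask4_demo(void)
{
    const float in[4] = { -1.0f, 2.0f, -0.0f, 3.0f };
    /* Lanes 0 and 2 have their sign bit set: binary 0101. */
    assert(sign_mask4(in) == 5);
}
```
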

_mm_sad_epu8
^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: emmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_sad_epu8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce two unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i])
        ENDFOR
        FOR j := 0 to 1
        	i := j*64
        	dst[i+15:i] := tmp[i+7:i] + tmp[i+15:i+8] + tmp[i+23:i+16] + tmp[i+31:i+24] + \
        	               tmp[i+39:i+32] + tmp[i+47:i+40] + tmp[i+55:i+48] + tmp[i+63:i+56]
        	dst[i+63:i+16] := 0
        ENDFOR
        	
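The sketch below (hypothetical helper names) computes the two sums of absolute differences for a pair of 16-byte blocks and reads them back as two 64-bit values, one per 8-byte half.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_sad_epu8, _mm_loadu_si128, _mm_storeu_si128 */

/* Hypothetical helper: out[0] and out[1] receive the SAD of each 8-byte half. */
static void sad16(const uint8_t a[16], const uint8_t b[16], uint64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_sad_epu8(va, vb));
}

/* Hypothetical self-check. */
static void sad16_demo(void)
{
    uint8_t a[16], b[16];
    uint64_t out[2];
    for (int i = 0; i < 16; ++i) {
        a[i] = (uint8_t)i;
        b[i] = (uint8_t)(i < 8 ? 0 : 20);
    }
    sad16(a, b, out);
    assert(out[0] == 28); /* 0 + 1 + ... + 7 */
    assert(out[1] == 68); /* |8-20| + |9-20| + ... + |15-20| */
}
```
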

_mm_movepi64_pi64
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: emmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m64 _mm_movepi64_pi64(__m128i a);

.. admonition:: Intel Description

    Copy the lower 64-bit integer in "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        	

_mm_packs_epi16
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: emmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_packs_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := Saturate8(a[15:0])
        dst[15:8] := Saturate8(a[31:16])
        dst[23:16] := Saturate8(a[47:32])
        dst[31:24] := Saturate8(a[63:48])
        dst[39:32] := Saturate8(a[79:64])
        dst[47:40] := Saturate8(a[95:80])
        dst[55:48] := Saturate8(a[111:96])
        dst[63:56] := Saturate8(a[127:112])
        dst[71:64] := Saturate8(b[15:0])
        dst[79:72] := Saturate8(b[31:16])
        dst[87:80] := Saturate8(b[47:32])
        dst[95:88] := Saturate8(b[63:48])
        dst[103:96] := Saturate8(b[79:64])
        dst[111:104] := Saturate8(b[95:80])
        dst[119:112] := Saturate8(b[111:96])
        dst[127:120] := Saturate8(b[127:112])
        	
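Signed saturation clamps each 16-bit input to the range -128..127 before narrowing. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_packs_epi16 */

/* Hypothetical helper: out[0..7] come from a, out[8..15] come from b. */
static void pack_signed(const int16_t a[8], const int16_t b[8], int8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_packs_epi16(va, vb));
}

/* Hypothetical self-check. */
static void pack_signed_demo(void)
{
    const int16_t a[8] = { 0, 127, 128, 300, -128, -129, -1000, 5 };
    const int16_t b[8] = { 1, 1, 1, 1, 1, 1, 1, 1 };
    const int8_t expect_a[8] = { 0, 127, 127, 127, -128, -128, -128, 5 };
    int8_t out[16];
    pack_signed(a, b, out);
    for (int i = 0; i < 8; ++i) {
        assert(out[i] == expect_a[i]); /* values outside -128..127 are clamped */
        assert(out[8 + i] == 1);       /* upper half holds the lanes of b */
    }
}
```
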

_mm_packs_epi32
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: emmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_packs_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Saturate16(a[31:0])
        dst[31:16] := Saturate16(a[63:32])
        dst[47:32] := Saturate16(a[95:64])
        dst[63:48] := Saturate16(a[127:96])
        dst[79:64] := Saturate16(b[31:0])
        dst[95:80] := Saturate16(b[63:32])
        dst[111:96] := Saturate16(b[95:64])
        dst[127:112] := Saturate16(b[127:96])
        	

_mm_packus_epi16
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: emmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_packus_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := SaturateU8(a[15:0])
        dst[15:8] := SaturateU8(a[31:16])
        dst[23:16] := SaturateU8(a[47:32])
        dst[31:24] := SaturateU8(a[63:48])
        dst[39:32] := SaturateU8(a[79:64])
        dst[47:40] := SaturateU8(a[95:80])
        dst[55:48] := SaturateU8(a[111:96])
        dst[63:56] := SaturateU8(a[127:112])
        dst[71:64] := SaturateU8(b[15:0])
        dst[79:72] := SaturateU8(b[31:16])
        dst[87:80] := SaturateU8(b[47:32])
        dst[95:88] := SaturateU8(b[63:48])
        dst[103:96] := SaturateU8(b[79:64])
        dst[111:104] := SaturateU8(b[95:80])
        dst[119:112] := SaturateU8(b[111:96])
        dst[127:120] := SaturateU8(b[127:112])
        	
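With unsigned saturation the inputs are still interpreted as signed 16-bit values, but they are clamped to 0..255, so negative inputs become 0. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_packus_epi16 */

/* Hypothetical helper: out[0..7] come from a, out[8..15] come from b. */
static void pack_unsigned(const int16_t a[8], const int16_t b[8], uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_packus_epi16(va, vb));
}

/* Hypothetical self-check. */
static void pack_unsigned_demo(void)
{
    const int16_t a[8] = { -5, 0, 128, 255, 256, 1000, -32768, 32767 };
    const int16_t b[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
    const uint8_t expect_a[8] = { 0, 0, 128, 255, 255, 255, 0, 255 };
    uint8_t out[16];
    pack_unsigned(a, b, out);
    for (int i = 0; i < 8; ++i) {
        assert(out[i] == expect_a[i]); /* clamped to 0..255 */
        assert(out[8 + i] == 0);       /* upper half holds the lanes of b */
    }
}
```

This clamping is why the intrinsic is often used as the last step when converting 16-bit intermediate results back to 8-bit data such as pixel values.
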

_mm_movemask_epi8
^^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: emmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    int _mm_movemask_epi8(__m128i a);

.. admonition:: Intel Description

    Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[j] := a[i+7]
        ENDFOR
        dst[MAX:16] := 0
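
As an illustration of the loop above, the following scalar sketch gathers the top bit of each byte into a 16-bit mask. The function name ``movemask_epi8_model`` is hypothetical, and the byte array stands in for the sixteen 8-bit lanes of "a", lowest lane first.

```c
#include <stdint.h>

/* Scalar model: bit j of the result is the most significant bit of byte j. */
int movemask_epi8_model(const uint8_t a[16])
{
    int mask = 0;
    for (int j = 0; j < 16; j++)
        mask |= (a[j] >> 7) << j;
    return mask;
}
```

A common pattern is to feed the result of a byte-wise comparison into this operation and then test the mask against zero or scan it for the first set bit.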
        	

_mm_movemask_pd
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: emmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    int _mm_movemask_pd(__m128d a);

.. admonition:: Intel Description

    Set each bit of mask "dst" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF a[i+63]
        		dst[j] := 1
        	ELSE
        		dst[j] := 0
        	FI
        ENDFOR
        dst[MAX:2] := 0
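
The most significant bit of a double-precision element is its sign bit. The sketch below models the pseudo-code above in scalar C, assuming ``double`` is an IEEE-754 binary64 value; the name ``movemask_pd_model`` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model: bit j of the result is the sign bit of a[j]. */
int movemask_pd_model(const double a[2])
{
    int mask = 0;
    for (int j = 0; j < 2; j++) {
        uint64_t bits;
        memcpy(&bits, &a[j], sizeof bits);   /* reinterpret the double's bit pattern */
        mask |= (int)(bits >> 63) << j;
    }
    return mask;
}
```

Because only the sign bit is inspected, negative zero sets its mask bit even though it compares equal to zero.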
        	

_mm_mpsadbw_epu8
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: smmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mpsadbw_epu8(__m128i a, __m128i b,
                             const int imm8);

.. admonition:: Intel Description

    Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst".
    Eight SADs are performed using one quadruplet from "b" and eight quadruplets from "a". One quadruplet is selected from "b" starting at the offset specified in "imm8". Eight quadruplets are formed from sequential 8-bit integers selected from "a" starting at the offset specified in "imm8".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE MPSADBW(a[127:0], b[127:0], imm8[2:0]) {
        	a_offset := imm8[2]*32
        	b_offset := imm8[1:0]*32
        	FOR j := 0 to 7
        		i := j*8
        		k := a_offset+i
        		l := b_offset
        		tmp[i*2+15:i*2] := ABS(Signed(a[k+7:k] - b[l+7:l])) + ABS(Signed(a[k+15:k+8] - b[l+15:l+8])) + \
        		                   ABS(Signed(a[k+23:k+16] - b[l+23:l+16])) + ABS(Signed(a[k+31:k+24] - b[l+31:l+24]))
        	ENDFOR
        	RETURN tmp[127:0]
        }
        dst[127:0] := MPSADBW(a[127:0], b[127:0], imm8[2:0])
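
The offset arithmetic in the pseudo-code above is in bits; expressed in bytes it becomes a short nested loop. The following is a scalar sketch under that reading, with the hypothetical name ``mpsadbw_model`` and plain arrays standing in for the lanes of "a", "b" and "dst".

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar model: one quadruplet of b against eight sliding quadruplets of a. */
void mpsadbw_model(const uint8_t a[16], const uint8_t b[16], int imm8,
                   uint16_t dst[8])
{
    int a_off = ((imm8 >> 2) & 1) * 4;   /* imm8[2]: byte offset 0 or 4 into a */
    int b_off = (imm8 & 3) * 4;          /* imm8[1:0]: byte offset 0, 4, 8 or 12 into b */
    for (int j = 0; j < 8; j++) {
        unsigned sum = 0;
        for (int k = 0; k < 4; k++)
            sum += (unsigned)abs((int)a[a_off + j + k] - (int)b[b_off + k]);
        dst[j] = (uint16_t)sum;
    }
}
```

The window over "a" advances by one byte per result, so the highest byte of "a" read is index 14 (offset 4, window 7, element 3).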
        	

_mm_packus_epi32
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: smmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_packus_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[15:0] := SaturateU16(a[31:0])
        dst[31:16] := SaturateU16(a[63:32])
        dst[47:32] := SaturateU16(a[95:64])
        dst[63:48] := SaturateU16(a[127:96])
        dst[79:64] := SaturateU16(b[31:0])
        dst[95:80] := SaturateU16(b[63:32])
        dst[111:96] := SaturateU16(b[95:64])
        dst[127:112] := SaturateU16(b[127:96])
        	

_mm_minpos_epu16
^^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: smmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128i _mm_minpos_epu16(__m128i a);

.. admonition:: Intel Description

    Horizontally compute the minimum amongst the packed unsigned 16-bit integers in "a", store the minimum and index in "dst", and zero the remaining bits in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        index[2:0] := 0
        min[15:0] := a[15:0]
        FOR j := 0 to 7
        	i := j*16
        	IF a[i+15:i] < min[15:0]
        		index[2:0] := j
        		min[15:0] := a[i+15:i]
        	FI
        ENDFOR
        dst[15:0] := min[15:0]
        dst[18:16] := index[2:0]
        dst[127:19] := 0
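
To make the result layout concrete, here is a scalar sketch of the pseudo-code above. The name ``minpos_epu16_model`` is hypothetical; "dst" is modeled as eight 16-bit lanes, so the index stored in bits 18:16 lands in lane 1.

```c
#include <stdint.h>

/* Scalar model: lane 0 receives the minimum, lane 1 its index, the rest are zero. */
void minpos_epu16_model(const uint16_t a[8], uint16_t dst[8])
{
    unsigned index = 0;
    uint16_t min = a[0];
    for (unsigned j = 1; j < 8; j++) {
        if (a[j] < min) {        /* strict comparison keeps the lowest index on ties */
            min = a[j];
            index = j;
        }
    }
    dst[0] = min;
    dst[1] = (uint16_t)index;
    for (int j = 2; j < 8; j++)
        dst[j] = 0;
}
```

When the minimum occurs more than once, the strict comparison means the first occurrence is reported, matching the pseudo-code.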
        	

_mm_alignr_epi8
^^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: tmmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_alignr_epi8(__m128i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[255:0] := ((a[127:0] << 128)[255:0] OR b[127:0]) >> (imm8*8)
        dst[127:0] := tmp[127:0]
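
The concatenate-and-shift step above is simpler to follow at byte granularity. This scalar sketch uses the hypothetical name ``alignr_epi8_model``; "b" supplies the low 16 bytes of the temporary and "a" the high 16 bytes, and bytes shifted in from beyond the temporary are zero.

```c
#include <stdint.h>

/* Scalar model: dst = low 16 bytes of ((a:b) >> (imm8 * 8)). */
void alignr_epi8_model(const uint8_t a[16], const uint8_t b[16],
                       unsigned imm8, uint8_t dst[16])
{
    uint8_t tmp[32];
    for (int i = 0; i < 16; i++) {
        tmp[i]      = b[i];   /* low half of the 32-byte temporary */
        tmp[i + 16] = a[i];   /* high half */
    }
    for (unsigned i = 0; i < 16; i++) {
        unsigned src = i + imm8;
        dst[i] = (src < 32) ? tmp[src] : 0;
    }
}
```

With a shift of 4, for example, the result is bytes 4 to 15 of "b" followed by bytes 0 to 3 of "a"; a shift of 32 or more yields all zeros.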
        	

_mm_alignr_pi8
^^^^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: tmmintrin.h
:Searchable: SSE_ALL-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b, 
    int imm8
:Param ETypes:
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m64 _mm_alignr_pi8(__m64 a, __m64 b, int imm8);

.. admonition:: Intel Description

    Concatenate 8-byte blocks in "a" and "b" into a 16-byte temporary result, shift the result right by "imm8" bytes, and store the low 8 bytes in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[127:0] := ((a[63:0] << 64)[127:0] OR b[63:0]) >> (imm8*8)
        dst[63:0] := tmp[63:0]
        	

MMX
~~~
_m_psadbw
^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: xmmintrin.h
:Searchable: SSE_ALL-Miscellaneous-MMX
:Register: MMX 64 bit
:Return Type: __m64
:Param Types:
    __m64 a, 
    __m64 b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m64 _m_psadbw(__m64 a, __m64 b);

.. admonition:: Intel Description

    Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum the 8 differences to produce one unsigned 16-bit integer, store it in the low 16 bits of "dst", and zero the remaining bits of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i])
        ENDFOR
        dst[15:0] := tmp[7:0] + tmp[15:8] + tmp[23:16] + tmp[31:24] + tmp[39:32] + tmp[47:40] + tmp[55:48] + tmp[63:56]
        dst[63:16] := 0
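
The pseudo-code above reduces eight byte differences to a single 16-bit sum. A scalar sketch of that reduction follows; the name ``psadbw_model`` is hypothetical, and the return value models the low 16 bits of "dst" (the upper bits are zero).

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar model: sum of |a[j] - b[j]| over the eight byte lanes. */
uint16_t psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int j = 0; j < 8; j++)
        sum += (unsigned)abs((int)a[j] - (int)b[j]);
    return (uint16_t)sum;
}
```

The largest possible sum is 8 * 255 = 2040, so the result always fits in 16 bits.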
        	

_m_pmovmskb
^^^^^^^^^^^
:Tech: SSE_ALL
:Category: Miscellaneous
:Header: xmmintrin.h
:Searchable: SSE_ALL-Miscellaneous-MMX
:Register: MMX 64 bit
:Return Type: int
:Param Types:
    __m64 a
:Param ETypes:
    UI8 a

.. code-block:: C

    int _m_pmovmskb(__m64 a);

.. admonition:: Intel Description

    Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[j] := a[i+7]
        ENDFOR
        dst[MAX:8] := 0
        	

Other
=====
Cryptography
------------
ZMM
~~~
_mm512_aesenclast_epi128
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i RoundKey
:Param ETypes:
    M128 a, 
    M128 RoundKey

.. code-block:: C

    __m512i _mm512_aesenclast_epi128(__m512i a,
                                     __m512i RoundKey);

.. admonition:: Intel Description

    Perform the last round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*128
        	a[i+127:i] := ShiftRows(a[i+127:i])
        	a[i+127:i] := SubBytes(a[i+127:i])
        	dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_aesenc_epi128
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i RoundKey
:Param ETypes:
    M128 a, 
    M128 RoundKey

.. code-block:: C

    __m512i _mm512_aesenc_epi128(__m512i a, __m512i RoundKey);

.. admonition:: Intel Description

    Perform one round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*128
        	a[i+127:i] := ShiftRows(a[i+127:i])
        	a[i+127:i] := SubBytes(a[i+127:i])
        	a[i+127:i] := MixColumns(a[i+127:i])
        	dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_aesdeclast_epi128
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i RoundKey
:Param ETypes:
    M128 a, 
    M128 RoundKey

.. code-block:: C

    __m512i _mm512_aesdeclast_epi128(__m512i a,
                                     __m512i RoundKey);

.. admonition:: Intel Description

    Perform the last round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*128
        	a[i+127:i] := InvShiftRows(a[i+127:i])
        	a[i+127:i] := InvSubBytes(a[i+127:i])
        	dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_aesdec_epi128
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i RoundKey
:Param ETypes:
    M128 a, 
    M128 RoundKey

.. code-block:: C

    __m512i _mm512_aesdec_epi128(__m512i a, __m512i RoundKey);

.. admonition:: Intel Description

    Perform one round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*128
        	a[i+127:i] := InvShiftRows(a[i+127:i])
        	a[i+127:i] := InvSubBytes(a[i+127:i])
        	a[i+127:i] := InvMixColumns(a[i+127:i])
        	dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
        ENDFOR
        dst[MAX:512] := 0
        	

YMM
~~~
_mm256_aesenclast_epi128
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i RoundKey
:Param ETypes:
    M128 a, 
    M128 RoundKey

.. code-block:: C

    __m256i _mm256_aesenclast_epi128(__m256i a,
                                     __m256i RoundKey);

.. admonition:: Intel Description

    Perform the last round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*128
        	a[i+127:i] := ShiftRows(a[i+127:i])
        	a[i+127:i] := SubBytes(a[i+127:i])
        	dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_aesenc_epi128
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i RoundKey
:Param ETypes:
    M128 a, 
    M128 RoundKey

.. code-block:: C

    __m256i _mm256_aesenc_epi128(__m256i a, __m256i RoundKey);

.. admonition:: Intel Description

    Perform one round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*128
        	a[i+127:i] := ShiftRows(a[i+127:i])
        	a[i+127:i] := SubBytes(a[i+127:i])
        	a[i+127:i] := MixColumns(a[i+127:i])
        	dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_aesdeclast_epi128
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i RoundKey
:Param ETypes:
    M128 a, 
    M128 RoundKey

.. code-block:: C

    __m256i _mm256_aesdeclast_epi128(__m256i a,
                                     __m256i RoundKey);

.. admonition:: Intel Description

    Perform the last round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*128
        	a[i+127:i] := InvShiftRows(a[i+127:i])
        	a[i+127:i] := InvSubBytes(a[i+127:i])
        	dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_aesdec_epi128
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i RoundKey
:Param ETypes:
    M128 a, 
    M128 RoundKey

.. code-block:: C

    __m256i _mm256_aesdec_epi128(__m256i a, __m256i RoundKey);

.. admonition:: Intel Description

    Perform one round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*128
        	a[i+127:i] := InvShiftRows(a[i+127:i])
        	a[i+127:i] := InvSubBytes(a[i+127:i])
        	a[i+127:i] := InvMixColumns(a[i+127:i])
        	dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_aesenc_si128
^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: wmmintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i RoundKey
:Param ETypes:
    M128 a, 
    M128 RoundKey

.. code-block:: C

    __m128i _mm_aesenc_si128(__m128i a, __m128i RoundKey);

.. admonition:: Intel Description

    Perform one round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        a[127:0] := ShiftRows(a[127:0])
        a[127:0] := SubBytes(a[127:0])
        a[127:0] := MixColumns(a[127:0])
        dst[127:0] := a[127:0] XOR RoundKey[127:0]
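
The four steps above can be modeled in portable C without lookup tables. The following is a scalar sketch for illustration only, far slower than the instruction and not constant-time: the names ``gf_mul``, ``sbox`` and ``aesenc_model`` are hypothetical, the S-box is computed from its definition (field inverse plus affine map), and byte ``i`` of each array holds bits ``[8*i+7 : 8*i]`` of the 128-bit value.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t x, uint8_t y)
{
    uint8_t r = 0;
    while (y) {
        if (y & 1) r ^= x;
        x = (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0));
        y >>= 1;
    }
    return r;
}

/* S-box entry: multiplicative inverse (x^254) followed by the AES affine map. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 1;
    for (int i = 0; i < 254; i++)
        inv = gf_mul(inv, x);                 /* x^254; 0 stays 0 */
    uint8_t s = inv;
    for (int i = 1; i <= 4; i++)
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));   /* rotate left by i */
    return (uint8_t)(s ^ 0x63);
}

/* Scalar model of one encryption round: ShiftRows, SubBytes, MixColumns, XOR key. */
void aesenc_model(const uint8_t a[16], const uint8_t round_key[16],
                  uint8_t dst[16])
{
    uint8_t t[16];
    /* ShiftRows and SubBytes: row r of column c is read from column (c + r) % 4. */
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[4 * c + r] = sbox(a[4 * ((c + r) % 4) + r]);
    /* MixColumns on each column, then XOR with the round key. */
    for (int c = 0; c < 4; c++) {
        const uint8_t *s = &t[4 * c];
        for (int r = 0; r < 4; r++) {
            uint8_t m = (uint8_t)(gf_mul(s[r], 2) ^ gf_mul(s[(r + 1) % 4], 3)
                                  ^ s[(r + 2) % 4] ^ s[(r + 3) % 4]);
            dst[4 * c + r] = (uint8_t)(m ^ round_key[4 * c + r]);
        }
    }
}
```

ShiftRows and SubBytes commute because SubBytes acts on each byte independently, so applying them in one pass gives the same result as the two separate steps in the pseudo-code. Dropping the MixColumns step from this sketch would model the last-round variant.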
        	

_mm_aesenclast_si128
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: wmmintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i RoundKey
:Param ETypes:
    M128 a, 
    M128 RoundKey

.. code-block:: C

    __m128i _mm_aesenclast_si128(__m128i a, __m128i RoundKey);

.. admonition:: Intel Description

    Perform the last round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        a[127:0] := ShiftRows(a[127:0])
        a[127:0] := SubBytes(a[127:0])
        dst[127:0] := a[127:0] XOR RoundKey[127:0]
        	

_mm_aesdec_si128
^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: wmmintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i RoundKey
:Param ETypes:
    M128 a, 
    M128 RoundKey

.. code-block:: C

    __m128i _mm_aesdec_si128(__m128i a, __m128i RoundKey);

.. admonition:: Intel Description

    Perform one round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        a[127:0] := InvShiftRows(a[127:0])
        a[127:0] := InvSubBytes(a[127:0])
        a[127:0] := InvMixColumns(a[127:0])
        dst[127:0] := a[127:0] XOR RoundKey[127:0]
        	

_mm_aesdeclast_si128
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: wmmintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i RoundKey
:Param ETypes:
    M128 a, 
    M128 RoundKey

.. code-block:: C

    __m128i _mm_aesdeclast_si128(__m128i a, __m128i RoundKey);

.. admonition:: Intel Description

    Perform the last round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        a[127:0] := InvShiftRows(a[127:0])
        a[127:0] := InvSubBytes(a[127:0])
        dst[127:0] := a[127:0] XOR RoundKey[127:0]
        	

_mm_aesimc_si128
^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: wmmintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    M128 a

.. code-block:: C

    __m128i _mm_aesimc_si128(__m128i a);

.. admonition:: Intel Description

    Perform the InvMixColumns transformation on "a" and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst[127:0] := InvMixColumns(a[127:0])
        	

_mm_aeskeygenassist_si128
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: wmmintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    const int imm8
:Param ETypes:
    M128 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_aeskeygenassist_si128(__m128i a,
                                      const int imm8);

.. admonition:: Intel Description

    Assist in expanding the AES cipher key by computing steps towards generating a round key for the encryption cipher using data from "a" and an 8-bit round constant specified in "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        X3[31:0] := a[127:96]
        X2[31:0] := a[95:64]
        X1[31:0] := a[63:32]
        X0[31:0] := a[31:0]
        RCON[31:0] := ZeroExtend32(imm8[7:0])
        dst[31:0] := SubWord(X1)
        dst[63:32] := RotWord(SubWord(X1)) XOR RCON
        dst[95:64] := SubWord(X3)
        dst[127:96] := RotWord(SubWord(X3)) XOR RCON
        	

_mm_crc32_u8
^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: nmmintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    unsigned int crc, 
    unsigned char v
:Param ETypes:
    UI32 crc, 
    UI8 v

.. code-block:: C

    unsigned int _mm_crc32_u8(unsigned int crc, unsigned char v);

.. admonition:: Intel Description

    Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 8-bit integer "v", and stores the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        tmp1[7:0] := v[0:7] // bit reflection
        tmp2[31:0] := crc[0:31] // bit reflection
        tmp3[39:0] := tmp1[7:0] << 32 
        tmp4[39:0] := tmp2[31:0] << 8
        tmp5[39:0] := tmp3[39:0] XOR tmp4[39:0]
        tmp6[31:0] := MOD2(tmp5[39:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
        dst[31:0] := tmp6[0:31] // bit reflection
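
The bit reflections and the polynomial division above are equivalent to the usual right-shifting CRC loop with the reflected form of the polynomial, ``0x82F63B78`` (the CRC-32C, or Castagnoli, polynomial). The sketch below is a portable scalar model; the names ``crc32c_u8_model`` and ``crc32c_buffer`` are hypothetical, and the ``0xFFFFFFFF`` initial value and final inversion in the wrapper are the conventional CRC-32C framing, which this sketch assumes and the accumulate step itself does not apply.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one accumulate step: fold a single byte into the running CRC. */
uint32_t crc32c_u8_model(uint32_t crc, uint8_t v)
{
    crc ^= v;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Conventional CRC-32C of a buffer: initial value and final XOR of 0xFFFFFFFF. */
uint32_t crc32c_buffer(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc32c_u8_model(crc, data[i]);
    return crc ^ 0xFFFFFFFFu;
}
```

Under this framing the nine ASCII bytes of ``123456789`` produce the standard CRC-32C check value ``0xE3069283``. The wider variants process the same byte stream in larger steps, least significant byte first.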
        	

_mm_crc32_u16
^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: nmmintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    unsigned int crc, 
    unsigned short v
:Param ETypes:
    UI32 crc, 
    UI16 v

.. code-block:: C

    unsigned int _mm_crc32_u16(unsigned int crc, unsigned short v);

.. admonition:: Intel Description

    Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 16-bit integer "v", and stores the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        tmp1[15:0] := v[0:15] // bit reflection
        tmp2[31:0] := crc[0:31] // bit reflection
        tmp3[47:0] := tmp1[15:0] << 32
        tmp4[47:0] := tmp2[31:0] << 16
        tmp5[47:0] := tmp3[47:0] XOR tmp4[47:0]
        tmp6[31:0] := MOD2(tmp5[47:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
        dst[31:0] := tmp6[0:31] // bit reflection
        	

_mm_crc32_u32
^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: nmmintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    unsigned int crc, 
    unsigned int v
:Param ETypes:
    UI32 crc, 
    UI32 v

.. code-block:: C

    unsigned int _mm_crc32_u32(unsigned int crc, unsigned int v);

.. admonition:: Intel Description

    Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 32-bit integer "v", and stores the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        tmp1[31:0] := v[0:31] // bit reflection
        tmp2[31:0] := crc[0:31] // bit reflection
        tmp3[63:0] := tmp1[31:0] << 32
        tmp4[63:0] := tmp2[31:0] << 32
        tmp5[63:0] := tmp3[63:0] XOR tmp4[63:0]
        tmp6[31:0] := MOD2(tmp5[63:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
        dst[31:0] := tmp6[0:31] // bit reflection
        	

_mm_crc32_u64
^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: nmmintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 crc, 
    unsigned __int64 v
:Param ETypes:
    UI64 crc, 
    UI64 v

.. code-block:: C

    unsigned __int64 _mm_crc32_u64(unsigned __int64 crc, unsigned __int64 v);

.. admonition:: Intel Description

    Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 64-bit integer "v", and stores the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        tmp1[63:0] := v[0:63] // bit reflection
        tmp2[31:0] := crc[0:31] // bit reflection
        tmp3[95:0] := tmp1[63:0] << 32
        tmp4[95:0] := tmp2[31:0] << 64
        tmp5[95:0] := tmp3[95:0] XOR tmp4[95:0]
        tmp6[31:0] := MOD2(tmp5[95:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
        dst[31:0] := tmp6[0:31] // bit reflection
        	

_mm_aesdec128kl_u8
^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned char
:Param Types:
    __m128i* __odata, 
    __m128i __idata, 
    const void* __h
:Param ETypes:
    UI8 __odata, 
    UI8 __idata, 
    UI8 __h

.. code-block:: C

    unsigned char _mm_aesdec128kl_u8(__m128i* __odata, __m128i __idata, const void* __h);

.. admonition:: Intel Description

    Decrypt 10 rounds of unsigned 8-bit integers in "__idata" using the 128-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, the ZF flag is set to 1 and "__odata" is zero-initialized.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        MEM[__odata+127:__odata] := AES128Decrypt (__idata[127:0], __h[383:0])
        dst := ZF
        		

_mm_aesdec256kl_u8
^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned char
:Param Types:
    __m128i* __odata, 
    __m128i __idata, 
    const void* __h
:Param ETypes:
    UI8 __odata, 
    UI8 __idata, 
    UI8 __h

.. code-block:: C

    unsigned char _mm_aesdec256kl_u8(__m128i* __odata, __m128i __idata, const void* __h);

.. admonition:: Intel Description

    Decrypt 14 rounds of unsigned 8-bit integers in "__idata" using the 256-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, the ZF flag is set to 1 and "__odata" is zero-initialized.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        MEM[__odata+127:__odata] := AES256Decrypt (__idata[127:0], __h[511:0])
        dst := ZF
        		

_mm_aesenc128kl_u8
^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned char
:Param Types:
    __m128i* __odata, 
    __m128i __idata, 
    const void* __h
:Param ETypes:
    UI8 __odata, 
    UI8 __idata, 
    UI8 __h

.. code-block:: C

    unsigned char _mm_aesenc128kl_u8(__m128i* __odata, __m128i __idata, const void* __h);

.. admonition:: Intel Description

    Encrypt 10 rounds of unsigned 8-bit integers in "__idata" using the 128-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        MEM[__odata+127:__odata] := AES128Encrypt (__idata[127:0], __h[383:0])
        dst := ZF
        		

_mm_aesenc256kl_u8
^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned char
:Param Types:
    __m128i* __odata, 
    __m128i __idata, 
    const void* __h
:Param ETypes:
    UI8 __odata, 
    UI8 __idata, 
    UI8 __h

.. code-block:: C

    unsigned char _mm_aesenc256kl_u8(__m128i* __odata, __m128i __idata, const void* __h);

.. admonition:: Intel Description

    Encrypt 14 rounds of unsigned 8-bit integers in "__idata" using the 256-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, the ZF flag is set to 1 and "__odata" is zero-initialized.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        MEM[__odata+127:__odata] := AES256Encrypt (__idata[127:0], __h[511:0])
        dst := ZF
        		

_mm_encodekey128_u32
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    unsigned int __htype, 
    __m128i __key, 
    void* __h
:Param ETypes:
    UI32 __htype, 
    UI8 __key, 
    UI8 __h

.. code-block:: C

    unsigned int _mm_encodekey128_u32(unsigned int __htype, __m128i __key, void* __h);

.. admonition:: Intel Description

    Wrap a 128-bit AES key from "__key" into a 384-bit key handle stored in "__h" and set IWKey's NoBackup and KeySource bits in "dst". The explicit source operand "__htype" specifies the handle restrictions.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        __h[383:0] := WrapKey128(__key[127:0], __htype)
        dst[0] := IWKey.NoBackup
        dst[4:1] := IWKey.KeySource[3:0]
        		

_mm_encodekey256_u32
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    unsigned int __htype, 
    __m128i __key_lo, 
    __m128i __key_hi, 
    void* __h
:Param ETypes:
    UI32 __htype, 
    UI8 __key_lo, 
    UI8 __key_hi, 
    UI8 __h

.. code-block:: C

    unsigned int _mm_encodekey256_u32(unsigned int __htype, __m128i __key_lo, __m128i __key_hi, void* __h);

.. admonition:: Intel Description

    Wrap a 256-bit AES key from "__key_hi" and "__key_lo" into a 512-bit key handle stored in "__h" and set IWKey's NoBackup and KeySource bits in "dst". The 32-bit "__htype" specifies the handle restrictions.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        __h[511:0] := WrapKey256(__key_lo[127:0], __key_hi[127:0], __htype)
        dst[0] := IWKey.NoBackup
        dst[4:1] := IWKey.KeySource[3:0]
        		

_mm_loadiwkey
^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    unsigned int __ctl, 
    __m128i __intkey, 
    __m128i __enkey_lo, 
    __m128i __enkey_hi
:Param ETypes:
    UI32 __ctl, 
    UI8 __intkey, 
    UI8 __enkey_lo, 
    UI8 __enkey_hi

.. code-block:: C

    void _mm_loadiwkey(unsigned int __ctl, __m128i __intkey,
                       __m128i __enkey_lo, __m128i __enkey_hi);

.. admonition:: Intel Description

    Load internal wrapping key (IWKey). The 32-bit unsigned integer "__ctl" specifies IWKey's KeySource and whether backing up the key is permitted. IWKey's 256-bit encryption key is loaded from "__enkey_lo" and "__enkey_hi". IWKey's 128-bit integrity key is loaded from "__intkey".

_mm_aesdecwide128kl_u8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned char
:Param Types:
    __m128i* __odata, 
    const __m128i* __idata, 
    const void* __h
:Param ETypes:
    UI8 __odata, 
    UI8 __idata, 
    UI8 __h

.. code-block:: C

    unsigned char _mm_aesdecwide128kl_u8(__m128i* __odata, const __m128i* __idata, const void* __h);

.. admonition:: Intel Description

    Decrypt 10 rounds of 8 groups of unsigned 8-bit integers in "__idata" using the 128-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, the ZF flag is set to 1 and "__odata" is zero-initialized.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR i := 0 to 7
        	__odata[i] := AES128Decrypt (__idata[i], __h[383:0])
        ENDFOR
        dst := ZF
        		

_mm_aesdecwide256kl_u8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned char
:Param Types:
    __m128i* __odata, 
    const __m128i* __idata, 
    const void* __h
:Param ETypes:
    UI8 __odata, 
    UI8 __idata, 
    UI8 __h

.. code-block:: C

    unsigned char _mm_aesdecwide256kl_u8(__m128i* __odata, const __m128i* __idata, const void* __h);

.. admonition:: Intel Description

    Decrypt 14 rounds of 8 groups of unsigned 8-bit integers in "__idata" using the 256-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, the ZF flag is set to 1 and "__odata" is zero-initialized.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 7
        	__odata[i] := AES256Decrypt (__idata[i], __h[511:0])
        ENDFOR
        dst := ZF
        		

_mm_aesencwide128kl_u8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned char
:Param Types:
    __m128i* __odata, 
    const __m128i* __idata, 
    const void* __h
:Param ETypes:
    UI8 __odata, 
    UI8 __idata, 
    UI8 __h

.. code-block:: C

    unsigned char _mm_aesencwide128kl_u8(__m128i* __odata, const __m128i* __idata, const void* __h);

.. admonition:: Intel Description

    Encrypt 10 rounds of 8 groups of unsigned 8-bit integers in "__idata" using the 128-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, set the ZF flag to 1 and zero-initialize "__odata".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 7
        	__odata[i] := AES128Encrypt (__idata[i], __h[383:0])
        ENDFOR
        dst := ZF
        		

_mm_aesencwide256kl_u8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: unsigned char
:Param Types:
    __m128i* __odata, 
    const __m128i* __idata, 
    const void* __h
:Param ETypes:
    UI8 __odata, 
    UI8 __idata, 
    UI8 __h

.. code-block:: C

    unsigned char _mm_aesencwide256kl_u8(__m128i* __odata, const __m128i* __idata, const void* __h);

.. admonition:: Intel Description

    Encrypt 14 rounds of 8 groups of unsigned 8-bit integers in "__idata" using the 256-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, set the ZF flag to 1 and zero-initialize "__odata".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 7
        	__odata[i] := AES256Encrypt (__idata[i], __h[511:0])
        ENDFOR
        dst := ZF
        		

_mm_sha1msg1_epu32
^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_sha1msg1_epu32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Perform an intermediate calculation for the next four SHA1 message values (unsigned 32-bit integers) using previous message values from "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        W0 := a[127:96]
        W1 := a[95:64]
        W2 := a[63:32]
        W3 := a[31:0]
        W4 := b[127:96]
        W5 := b[95:64]
        dst[127:96] := W2 XOR W0
        dst[95:64] := W3 XOR W1
        dst[63:32] := W4 XOR W2
        dst[31:0] := W5 XOR W3
        	
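The pseudo-code above can be mirrored lane by lane in portable C. The sketch below is only an illustrative scalar model, not the intrinsic itself: ``vec128_model`` and ``sha1msg1_model`` are hypothetical names, and it assumes ``v[0]`` holds bits 31:0 and ``v[3]`` holds bits 127:96.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128i: v[0] = bits 31:0, v[3] = bits 127:96. */
    typedef struct { uint32_t v[4]; } vec128_model;

    /* Scalar model of the pseudo-code: XOR message words that are two apart. */
    vec128_model sha1msg1_model(vec128_model a, vec128_model b)
    {
        uint32_t w0 = a.v[3], w1 = a.v[2], w2 = a.v[1], w3 = a.v[0];
        uint32_t w4 = b.v[3], w5 = b.v[2];
        vec128_model dst;
        dst.v[3] = w2 ^ w0;
        dst.v[2] = w3 ^ w1;
        dst.v[1] = w4 ^ w2;
        dst.v[0] = w5 ^ w3;
        return dst;
    }

For example, with W0..W5 = 0x10, 0x20, 0x40, 0x80, 0x01, 0x02 the model yields 0x50, 0xA0, 0x41, 0x82 from the highest lane to the lowest.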

_mm_sha1msg2_epu32
^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_sha1msg2_epu32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Perform the final calculation for the next four SHA1 message values (unsigned 32-bit integers) using the intermediate result in "a" and the previous message values in "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        W13 := b[95:64]
        W14 := b[63:32]
        W15 := b[31:0]
        W16 := (a[127:96] XOR W13) <<< 1
        W17 := (a[95:64] XOR W14) <<< 1
        W18 := (a[63:32] XOR W15) <<< 1
        W19 := (a[31:0] XOR W16) <<< 1
        dst[127:96] := W16
        dst[95:64] := W17
        dst[63:32] := W18
        dst[31:0] := W19
        	
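A point that is easy to miss in the pseudo-code above is that W19 depends on the freshly computed W16. A minimal scalar sketch, assuming the same lane layout as the pseudo-code (``v[3]`` is bits 127:96); all names here are hypothetical and this is not the intrinsic itself:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128i: v[0] = bits 31:0, v[3] = bits 127:96. */
    typedef struct { uint32_t v[4]; } vec128_model;

    /* Rotate left by one bit, the "<<< 1" of the pseudo-code. */
    uint32_t rotl1_model(uint32_t x)
    {
        return (x << 1) | (x >> 31);
    }

    vec128_model sha1msg2_model(vec128_model a, vec128_model b)
    {
        uint32_t w13 = b.v[2], w14 = b.v[1], w15 = b.v[0];
        uint32_t w16 = rotl1_model(a.v[3] ^ w13);
        uint32_t w17 = rotl1_model(a.v[2] ^ w14);
        uint32_t w18 = rotl1_model(a.v[1] ^ w15);
        uint32_t w19 = rotl1_model(a.v[0] ^ w16);  /* uses the new W16 */
        vec128_model dst = {{ w19, w18, w17, w16 }};
        return dst;
    }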

_mm_sha1nexte_epu32
^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_sha1nexte_epu32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Calculate SHA1 state variable E after four rounds of operation from the current SHA1 state variable "a", add that value to the scheduled values (unsigned 32-bit integers) in "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := (a[127:96] <<< 30)
        dst[127:96] := b[127:96] + tmp
        dst[95:64] := b[95:64]
        dst[63:32] := b[63:32]
        dst[31:0] := b[31:0]
        	

_mm_sha1rnds4_epu32
^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    const int func
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM func

.. code-block:: C

    __m128i _mm_sha1rnds4_epu32(__m128i a, __m128i b,
                                const int func);

.. admonition:: Intel Description

    Perform four rounds of SHA1 operation using an initial SHA1 state (A,B,C,D) from "a" and some pre-computed sum of the next 4 round message values (unsigned 32-bit integers), and state variable E from "b", and store the updated SHA1 state (A,B,C,D) in "dst". "func" contains the logic functions and round constants.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF (func[1:0] == 0)
        	f := f0()
        	K := K0
        ELSE IF (func[1:0] == 1)
        	f := f1()
        	K := K1
        ELSE IF (func[1:0] == 2)
        	f := f2()
        	K := K2
        ELSE IF (func[1:0] == 3)
        	f := f3()
        	K := K3
        FI
        A := a[127:96]
        B := a[95:64]
        C := a[63:32]
        D := a[31:0]
        W[0] := b[127:96]
        W[1] := b[95:64]
        W[2] := b[63:32]
        W[3] := b[31:0]
        A[1] := f(B, C, D) + (A <<< 5) + W[0] + K
        B[1] := A
        C[1] := B <<< 30
        D[1] := C
        E[1] := D
        FOR i := 1 to 3
        	A[i+1] := f(B[i], C[i], D[i]) + (A[i] <<< 5) + W[i] + E[i] + K
        	B[i+1] := A[i]
        	C[i+1] := B[i] <<< 30
        	D[i+1] := C[i]
        	E[i+1] := D[i]
        ENDFOR
        dst[127:96] := A[4]
        dst[95:64] := B[4]
        dst[63:32] := C[4]
        dst[31:0] := D[4]
        	
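The pseudo-code above leaves ``f0``..``f3`` and ``K0``..``K3`` undefined. The sketch below assumes they are the standard SHA-1 logic functions (Ch, Parity, Maj, Parity) and round constants from FIPS 180-4; that mapping is an assumption of this example, and every name in it is hypothetical. It is a scalar model for study, not a replacement for the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128i: v[0] = bits 31:0, v[3] = bits 127:96. */
    typedef struct { uint32_t v[4]; } vec128_model;

    /* Rotate left; only called with n = 5 and n = 30 here. */
    uint32_t rotl32_model(uint32_t x, unsigned n)
    {
        return (x << n) | (x >> (32u - n));
    }

    /* Assumed FIPS 180-4 logic functions: Ch, Parity, Maj, Parity. */
    uint32_t sha1_f_model(int func, uint32_t b, uint32_t c, uint32_t d)
    {
        switch (func & 3) {
        case 0:  return (b & c) ^ (~b & d);
        case 2:  return (b & c) ^ (b & d) ^ (c & d);
        default: return b ^ c ^ d;
        }
    }

    vec128_model sha1rnds4_model(vec128_model a, vec128_model b, int func)
    {
        /* Assumed FIPS 180-4 round constants K0..K3. */
        static const uint32_t K[4] = {
            0x5A827999u, 0x6ED9EBA1u, 0x8F1BBCDCu, 0xCA62C1D6u
        };
        uint32_t A = a.v[3], B = a.v[2], C = a.v[1], D = a.v[0];
        uint32_t E = 0;  /* the first round has no E term in the pseudo-code */
        for (int i = 0; i < 4; i++) {
            uint32_t w = b.v[3 - i];  /* W[0] is the highest lane */
            uint32_t t = sha1_f_model(func, B, C, D) + rotl32_model(A, 5)
                       + w + E + K[func & 3];
            E = D; D = C; C = rotl32_model(B, 30); B = A; A = t;
        }
        vec128_model dst = {{ D, C, B, A }};
        return dst;
    }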

_mm_sha256msg1_epu32
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_sha256msg1_epu32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Perform an intermediate calculation for the next four SHA256 message values (unsigned 32-bit integers) using previous message values from "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        W4 := b[31:0]
        W3 := a[127:96]
        W2 := a[95:64]
        W1 := a[63:32]
        W0 := a[31:0]
        dst[127:96] := W3 + sigma0(W4)
        dst[95:64] := W2 + sigma0(W3)
        dst[63:32] := W1 + sigma0(W2)
        dst[31:0] := W0 + sigma0(W1)
        	
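The pseudo-code above uses ``sigma0`` without defining it. The sketch below assumes it is the SHA-256 small sigma-0 function from FIPS 180-4 (rotate right by 7, rotate right by 18, shift right by 3, XORed together). The names are hypothetical and the code is a scalar model, not the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128i: v[0] = bits 31:0, v[3] = bits 127:96. */
    typedef struct { uint32_t v[4]; } vec128_model;

    /* Rotate right; n must be in 1..31. */
    uint32_t rotr32_model(uint32_t x, unsigned n)
    {
        return (x >> n) | (x << (32u - n));
    }

    /* Assumed FIPS 180-4 definition of the small sigma-0 function. */
    uint32_t sigma0_model(uint32_t x)
    {
        return rotr32_model(x, 7) ^ rotr32_model(x, 18) ^ (x >> 3);
    }

    vec128_model sha256msg1_model(vec128_model a, vec128_model b)
    {
        vec128_model dst;
        dst.v[3] = a.v[3] + sigma0_model(b.v[0]);  /* W3 + sigma0(W4) */
        dst.v[2] = a.v[2] + sigma0_model(a.v[3]);  /* W2 + sigma0(W3) */
        dst.v[1] = a.v[1] + sigma0_model(a.v[2]);  /* W1 + sigma0(W2) */
        dst.v[0] = a.v[0] + sigma0_model(a.v[1]);  /* W0 + sigma0(W1) */
        return dst;
    }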

_mm_sha256msg2_epu32
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_sha256msg2_epu32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Perform the final calculation for the next four SHA256 message values (unsigned 32-bit integers) using previous message values from "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        W14 := b[95:64]
        W15 := b[127:96]
        W16 := a[31:0] + sigma1(W14)
        W17 := a[63:32] + sigma1(W15)
        W18 := a[95:64] + sigma1(W16)
        W19 := a[127:96] + sigma1(W17)
        dst[127:96] := W19
        dst[95:64] := W18
        dst[63:32] := W17
        dst[31:0] := W16
        	
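In the pseudo-code above, W18 and W19 depend on the W16 and W17 computed just before them, so the four lanes cannot be evaluated independently. A scalar sketch of that dependency follows. It assumes ``sigma1`` is the SHA-256 small sigma-1 function from FIPS 180-4 (rotate right by 17, rotate right by 19, shift right by 10); all names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128i: v[0] = bits 31:0, v[3] = bits 127:96. */
    typedef struct { uint32_t v[4]; } vec128_model;

    /* Rotate right; n must be in 1..31. */
    uint32_t rotr32_model(uint32_t x, unsigned n)
    {
        return (x >> n) | (x << (32u - n));
    }

    /* Assumed FIPS 180-4 definition of the small sigma-1 function. */
    uint32_t sigma1_model(uint32_t x)
    {
        return rotr32_model(x, 17) ^ rotr32_model(x, 19) ^ (x >> 10);
    }

    vec128_model sha256msg2_model(vec128_model a, vec128_model b)
    {
        uint32_t w14 = b.v[2], w15 = b.v[3];
        uint32_t w16 = a.v[0] + sigma1_model(w14);
        uint32_t w17 = a.v[1] + sigma1_model(w15);
        uint32_t w18 = a.v[2] + sigma1_model(w16);  /* depends on the new W16 */
        uint32_t w19 = a.v[3] + sigma1_model(w17);  /* depends on the new W17 */
        vec128_model dst = {{ w16, w17, w18, w19 }};
        return dst;
    }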

_mm_sha256rnds2_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Cryptography
:Header: immintrin.h
:Searchable: Other-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    __m128i k
:Param ETypes:
    UI32 a, 
    UI32 b, 
    UI32 k

.. code-block:: C

    __m128i _mm_sha256rnds2_epu32(__m128i a, __m128i b,
                                  __m128i k);

.. admonition:: Intel Description

    Perform 2 rounds of SHA256 operation using an initial SHA256 state (C,D,G,H) from "a", an initial SHA256 state (A,B,E,F) from "b", and a pre-computed sum of the next 2 round message values (unsigned 32-bit integers) and the corresponding round constants from "k", and store the updated SHA256 state (A,B,E,F) in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        A[0] := b[127:96]
        B[0] := b[95:64]
        C[0] := a[127:96]
        D[0] := a[95:64]
        E[0] := b[63:32]
        F[0] := b[31:0]
        G[0] := a[63:32]
        H[0] := a[31:0]
        W_K[0] := k[31:0]
        W_K[1] := k[63:32]
        FOR i := 0 to 1
        	A[i+1] := Ch(E[i], F[i], G[i]) + sum1(E[i]) + W_K[i] + H[i] + Maj(A[i], B[i], C[i]) + sum0(A[i])
        	B[i+1] := A[i]
        	C[i+1] := B[i]
        	D[i+1] := C[i]
        	E[i+1] := Ch(E[i], F[i], G[i]) + sum1(E[i]) + W_K[i] + H[i] + D[i]
        	F[i+1] := E[i]
        	G[i+1] := F[i]
        	H[i+1] := G[i]
        ENDFOR
        dst[127:96] := A[2]
        dst[95:64] := B[2]
        dst[63:32] := E[2]
        dst[31:0] := F[2]
        	
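The helpers ``Ch``, ``Maj``, ``sum0`` and ``sum1`` are not defined in the pseudo-code above. The sketch below assumes they are the SHA-256 functions of FIPS 180-4; that is an assumption of this example, as are all the names used. It models the two rounds in scalar C for study and is not the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128i: v[0] = bits 31:0, v[3] = bits 127:96. */
    typedef struct { uint32_t v[4]; } vec128_model;

    /* Rotate right; n must be in 1..31. */
    uint32_t rotr32_model(uint32_t x, unsigned n)
    {
        return (x >> n) | (x << (32u - n));
    }

    /* Assumed FIPS 180-4 definitions. */
    uint32_t ch_model(uint32_t e, uint32_t f, uint32_t g)  { return (e & f) ^ (~e & g); }
    uint32_t maj_model(uint32_t a, uint32_t b, uint32_t c) { return (a & b) ^ (a & c) ^ (b & c); }
    uint32_t sum0_model(uint32_t x) { return rotr32_model(x, 2) ^ rotr32_model(x, 13) ^ rotr32_model(x, 22); }
    uint32_t sum1_model(uint32_t x) { return rotr32_model(x, 6) ^ rotr32_model(x, 11) ^ rotr32_model(x, 25); }

    vec128_model sha256rnds2_model(vec128_model a, vec128_model b, vec128_model k)
    {
        uint32_t A = b.v[3], B = b.v[2], E = b.v[1], F = b.v[0];
        uint32_t C = a.v[3], D = a.v[2], G = a.v[1], H = a.v[0];
        for (int i = 0; i < 2; i++) {
            /* k.v[0] and k.v[1] hold the pre-added message word and constant. */
            uint32_t t1 = ch_model(E, F, G) + sum1_model(E) + k.v[i] + H;
            uint32_t t2 = maj_model(A, B, C) + sum0_model(A);
            H = G; G = F; F = E; E = t1 + D;  /* D is still the old value here */
            D = C; C = B; B = A; A = t1 + t2;
        }
        vec128_model dst = {{ F, E, B, A }};
        return dst;
    }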

Shift
-----
Other
~~~~~
_lrotl
^^^^^^
:Tech: Other
:Category: Shift
:Header: immintrin.h
:Searchable: Other-Shift-Other
:Return Type: unsigned long
:Param Types:
    unsigned long a, 
    int shift
:Param ETypes:
    UI32 a, 
    IMM shift

.. code-block:: C

    unsigned long _lrotl(unsigned long a, int shift);

.. admonition:: Intel Description

    Shift the bits of unsigned long integer "a" left by the number of bits specified in "shift", rotating the most-significant bit to the least-significant bit location, and store the unsigned result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        // size := 32 or 64
        dst := a
        count := shift AND (size - 1)
        DO WHILE (count > 0)
        	tmp[0] := dst[size - 1]
        	dst := (dst << 1) OR tmp[0]
        	count := count - 1
        OD
        	
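The bit-at-a-time loop above collapses into two shifts. The following is a portable sketch under the assumption that "size" is simply the width of ``unsigned long`` on the target; ``lrotl_model`` is a hypothetical name, not a library function.

.. code-block:: C

    #include <limits.h>

    /* Rotate left by (shift mod size), where size is the width of unsigned long. */
    unsigned long lrotl_model(unsigned long a, int shift)
    {
        const unsigned size = (unsigned)(sizeof a * CHAR_BIT);  /* 32 or 64 */
        unsigned count = (unsigned)shift & (size - 1u);
        if (count == 0)
            return a;  /* shifting by the full width would be undefined in C */
        return (a << count) | (a >> (size - count));
    }

The early return matters: without it, a count of zero would require a right shift by the full width of the type.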

_lrotr
^^^^^^
:Tech: Other
:Category: Shift
:Header: immintrin.h
:Searchable: Other-Shift-Other
:Return Type: unsigned long
:Param Types:
    unsigned long a, 
    int shift
:Param ETypes:
    UI32 a, 
    IMM shift

.. code-block:: C

    unsigned long _lrotr(unsigned long a, int shift);

.. admonition:: Intel Description

    Shift the bits of unsigned long integer "a" right by the number of bits specified in "shift", rotating the least-significant bit to the most-significant bit location, and store the unsigned result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        // size := 32 or 64
        dst := a
        count := shift AND (size - 1)
        DO WHILE (count > 0)
        	tmp[size - 1] := dst[0]
        	dst := (dst >> 1) OR tmp[size - 1]
        	count := count - 1
        OD
        	

_rotl
^^^^^
:Tech: Other
:Category: Shift
:Header: immintrin.h
:Searchable: Other-Shift-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a, 
    int shift
:Param ETypes:
    UI32 a, 
    IMM shift

.. code-block:: C

    unsigned int _rotl(unsigned int a, int shift);

.. admonition:: Intel Description

    Shift the bits of unsigned 32-bit integer "a" left by the number of bits specified in "shift", rotating the most-significant bit to the least-significant bit location, and store the unsigned result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := a
        count := shift AND 31
        DO WHILE (count > 0)
        	tmp[0] := dst[31]
        	dst := (dst << 1) OR tmp[0]
        	count := count - 1
        OD
        	
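A closed-form equivalent of the loop above, as a portable sketch (``rotl32_model`` is a hypothetical name). Masking the complementary shift with 31 keeps both shift counts in range even when the rotate count is zero.

.. code-block:: C

    #include <stdint.h>

    /* Rotate a 32-bit value left by (shift mod 32). */
    uint32_t rotl32_model(uint32_t a, int shift)
    {
        unsigned count = (unsigned)shift & 31u;
        return (a << count) | (a >> ((32u - count) & 31u));
    }

For example, rotating 0x12345678 left by 8 moves the top byte to the bottom and gives 0x34567812.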

_rotr
^^^^^
:Tech: Other
:Category: Shift
:Header: immintrin.h
:Searchable: Other-Shift-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a, 
    int shift
:Param ETypes:
    UI32 a, 
    IMM shift

.. code-block:: C

    unsigned int _rotr(unsigned int a, int shift);

.. admonition:: Intel Description

    Shift the bits of unsigned 32-bit integer "a" right by the number of bits specified in "shift", rotating the least-significant bit to the most-significant bit location, and store the unsigned result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := a
        count := shift AND 31
        DO WHILE (count > 0)
        	tmp[31] := dst[0]
        	dst := (dst >> 1) OR tmp
        	count := count - 1
        OD
        	

_rotwl
^^^^^^
:Tech: Other
:Category: Shift
:Header: immintrin.h
:Searchable: Other-Shift-Other
:Return Type: unsigned short
:Param Types:
    unsigned short a, 
    int shift
:Param ETypes:
    UI16 a, 
    IMM shift

.. code-block:: C

    unsigned short _rotwl(unsigned short a, int shift);

.. admonition:: Intel Description

    Shift the bits of unsigned 16-bit integer "a" left by the number of bits specified in "shift", rotating the most-significant bit to the least-significant bit location, and store the unsigned result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := a
        count := shift AND 15
        DO WHILE (count > 0)
        	tmp[0] := dst[15]
        	dst := (dst << 1) OR tmp[0]
        	count := count - 1
        OD
        	

_rotwr
^^^^^^
:Tech: Other
:Category: Shift
:Header: immintrin.h
:Searchable: Other-Shift-Other
:Return Type: unsigned short
:Param Types:
    unsigned short a, 
    int shift
:Param ETypes:
    UI16 a, 
    IMM shift

.. code-block:: C

    unsigned short _rotwr(unsigned short a, int shift);

.. admonition:: Intel Description

    Shift the bits of unsigned 16-bit integer "a" right by the number of bits specified in "shift", rotating the least-significant bit to the most-significant bit location, and store the unsigned result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := a
        count := shift AND 15
        DO WHILE (count > 0)
        	tmp[15] := dst[0]
        	dst := (dst >> 1) OR tmp
        	count := count - 1
        OD
        	
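For 16-bit rotates in C, integer promotion needs some care: the operand is widened before shifting, so the result has to be truncated back to 16 bits. A minimal portable sketch of the loop above, with the hypothetical name ``rotr16_model``:

.. code-block:: C

    #include <stdint.h>

    /* Rotate a 16-bit value right by (shift mod 16). */
    uint16_t rotr16_model(uint16_t a, int shift)
    {
        unsigned count = (unsigned)shift & 15u;
        unsigned wide = a;  /* work in unsigned int, then truncate */
        return (uint16_t)((wide >> count) | (wide << ((16u - count) & 15u)));
    }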

_rotl64
^^^^^^^
:Tech: Other
:Category: Shift
:Header: immintrin.h
:Searchable: Other-Shift-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a, 
    int shift
:Param ETypes:
    UI64 a, 
    IMM shift

.. code-block:: C

    unsigned __int64 _rotl64(unsigned __int64 a, int shift);

.. admonition:: Intel Description

    Shift the bits of unsigned 64-bit integer "a" left by the number of bits specified in "shift", rotating the most-significant bit to the least-significant bit location, and store the unsigned result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := a
        count := shift AND 63
        DO WHILE (count > 0)
        	tmp[0] := dst[63]
        	dst := (dst << 1) OR tmp[0]
        	count := count - 1
        OD
        	

_rotr64
^^^^^^^
:Tech: Other
:Category: Shift
:Header: immintrin.h
:Searchable: Other-Shift-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a, 
    int shift
:Param ETypes:
    UI64 a, 
    IMM shift

.. code-block:: C

    unsigned __int64 _rotr64(unsigned __int64 a, int shift);

.. admonition:: Intel Description

    Shift the bits of unsigned 64-bit integer "a" right by the number of bits specified in "shift", rotating the least-significant bit to the most-significant bit location, and store the unsigned result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := a
        count := shift AND 63
        DO WHILE (count > 0)
        	tmp[63] := dst[0]
        	dst := (dst >> 1) OR tmp[63]
        	count := count - 1
        OD
        	

Bit Manipulation
----------------
XMM
~~~
_mm_tzcnt_32
^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _mm_tzcnt_32(unsigned int a);

.. admonition:: Intel Description

    Count the number of trailing zero bits in unsigned 32-bit integer "a", and return that count in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := 0
        dst := 0
        DO WHILE ((tmp < 32) AND a[tmp] == 0)
        	tmp := tmp + 1
        	dst := dst + 1
        OD
        	
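A direct C translation of the loop above may help clarify the all-zero case, where the count is the full width of 32. This is a portable sketch with the hypothetical name ``tzcnt32_model``, not the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Count trailing zero bits; returns 32 when a is zero. */
    int tzcnt32_model(uint32_t a)
    {
        int count = 0;
        /* The bound check comes first, so the shift count never reaches 32. */
        while (count < 32 && ((a >> count) & 1u) == 0)
            count++;
        return count;
    }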

_mm_tzcnt_64
^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    unsigned __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __int64 _mm_tzcnt_64(unsigned __int64 a);

.. admonition:: Intel Description

    Count the number of trailing zero bits in unsigned 64-bit integer "a", and return that count in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := 0
        dst := 0
        DO WHILE ((tmp < 64) AND a[tmp] == 0)
        	tmp := tmp + 1
        	dst := dst + 1
        OD
        	

_mm_popcnt_u32
^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _mm_popcnt_u32(unsigned int a);

.. admonition:: Intel Description

    Count the number of bits set to 1 in unsigned 32-bit integer "a", and return that count in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := 0
        FOR i := 0 to 31
        	IF a[i]
        		dst := dst + 1
        	FI
        ENDFOR
        	
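The loop above translates almost word for word into C. A minimal portable sketch follows; ``popcnt32_model`` is a hypothetical name used only for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Count the bits set to 1 by testing each of the 32 positions. */
    int popcnt32_model(uint32_t a)
    {
        int count = 0;
        for (int i = 0; i < 32; i++)
            count += (int)((a >> i) & 1u);
        return count;
    }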

_mm_popcnt_u64
^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    unsigned __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __int64 _mm_popcnt_u64(unsigned __int64 a);

.. admonition:: Intel Description

    Count the number of bits set to 1 in unsigned 64-bit integer "a", and return that count in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := 0
        FOR i := 0 to 63
        	IF a[i]
        		dst := dst + 1
        	FI
        ENDFOR
        	

Other
~~~~~
_bextr_u32
^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a, 
    unsigned int start, 
    unsigned int len
:Param ETypes:
    UI32 a, 
    UI32 start, 
    UI32 len

.. code-block:: C

    unsigned int _bextr_u32(unsigned int a, unsigned int start, unsigned int len);

.. admonition:: Intel Description

    Extract contiguous bits from unsigned 32-bit integer "a", and store the result in "dst". Extract the number of bits specified by "len", starting at the bit specified by "start".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a
        dst[31:0] := ZeroExtend32(tmp[(start[7:0] + len[7:0] - 1):start[7:0]])
        	
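The 512-bit temporary in the pseudo-code above is a way of saying that bits beyond the source width read as zero. A portable sketch of the same behaviour with explicit range checks, using the hypothetical name ``bextr32_model``:

.. code-block:: C

    #include <stdint.h>

    /* Extract len bits of a starting at bit start; only the low 8 bits
       of start and len are used, as in the pseudo-code. */
    uint32_t bextr32_model(uint32_t a, uint32_t start, uint32_t len)
    {
        start &= 0xFFu;
        len &= 0xFFu;
        if (start >= 32u || len == 0u)
            return 0u;          /* nothing inside the source is selected */
        uint32_t shifted = a >> start;
        if (len >= 32u)
            return shifted;     /* field runs past the top: zero-extended */
        return shifted & ((1u << len) - 1u);
    }

For example, extracting 8 bits starting at bit 8 of 0xABCD1234 gives 0x12.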

_bextr2_u32
^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a, 
    unsigned int control
:Param ETypes:
    UI32 a, 
    UI32 control

.. code-block:: C

    unsigned int _bextr2_u32(unsigned int a, unsigned int control);

.. admonition:: Intel Description

    Extract contiguous bits from unsigned 32-bit integer "a", and store the result in "dst". Extract the number of bits specified by bits 15:8 of "control", starting at the bit specified by bits 7:0 of "control".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        start := control[7:0]
        len := control[15:8]
        tmp[511:0] := a
        dst[31:0] := ZeroExtend32(tmp[(start[7:0] + len[7:0] - 1):start[7:0]])
        	

_bextr_u64
^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a, 
    unsigned int start, 
    unsigned int len
:Param ETypes:
    UI64 a, 
    UI32 start, 
    UI32 len

.. code-block:: C

    unsigned __int64 _bextr_u64(unsigned __int64 a, unsigned int start, unsigned int len);

.. admonition:: Intel Description

    Extract contiguous bits from unsigned 64-bit integer "a", and store the result in "dst". Extract the number of bits specified by "len", starting at the bit specified by "start".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a
        dst[63:0] := ZeroExtend64(tmp[(start[7:0] + len[7:0] - 1):start[7:0]])
        	

_bextr2_u64
^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a, 
    unsigned __int64 control
:Param ETypes:
    UI64 a, 
    UI64 control

.. code-block:: C

    unsigned __int64 _bextr2_u64(unsigned __int64 a, unsigned __int64 control);

.. admonition:: Intel Description

    Extract contiguous bits from unsigned 64-bit integer "a", and store the result in "dst". Extract the number of bits specified by bits 15:8 of "control", starting at the bit specified by bits 7:0 of "control".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        start := control[7:0]
        len := control[15:8]
        tmp[511:0] := a
        dst[63:0] := ZeroExtend64(tmp[(start[7:0] + len[7:0] - 1):start[7:0]])
        	

_blsi_u32
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    unsigned int _blsi_u32(unsigned int a);

.. admonition:: Intel Description

    Extract the lowest set bit from unsigned 32-bit integer "a" and set the corresponding bit in "dst". All other bits in "dst" are zeroed, and all bits are zeroed if no bits are set in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := (-a) AND a
        	
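The one-line formula above works because negating an unsigned value flips every bit above its lowest set bit. A portable sketch follows, with the hypothetical name ``blsi32_model``; it writes the negation as ``0u - a`` so that it stays in unsigned arithmetic.

.. code-block:: C

    #include <stdint.h>

    /* Isolate the lowest set bit of a; returns 0 when a is 0. */
    uint32_t blsi32_model(uint32_t a)
    {
        return (0u - a) & a;
    }

For example, 0x18 (binary 11000) yields 0x08.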

_blsi_u64
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    unsigned __int64 _blsi_u64(unsigned __int64 a);

.. admonition:: Intel Description

    Extract the lowest set bit from unsigned 64-bit integer "a" and set the corresponding bit in "dst". All other bits in "dst" are zeroed, and all bits are zeroed if no bits are set in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := (-a) AND a
        	

_blsmsk_u32
^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    unsigned int _blsmsk_u32(unsigned int a);

.. admonition:: Intel Description

    Set all the lower bits of "dst" up to and including the lowest set bit in unsigned 32-bit integer "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := (a - 1) XOR a
        	
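A portable sketch of the formula above, using the hypothetical name ``blsmsk32_model``. Note the zero input: subtracting one wraps around to all ones, so the mask covers the whole word.

.. code-block:: C

    #include <stdint.h>

    /* Mask of all bits up to and including the lowest set bit of a. */
    uint32_t blsmsk32_model(uint32_t a)
    {
        return (a - 1u) ^ a;
    }

For example, 0x18 (binary 11000) yields 0x0F.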

_blsmsk_u64
^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    unsigned __int64 _blsmsk_u64(unsigned __int64 a);

.. admonition:: Intel Description

    Set all the lower bits of "dst" up to and including the lowest set bit in unsigned 64-bit integer "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := (a - 1) XOR a
        	

_blsr_u32
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    unsigned int _blsr_u32(unsigned int a);

.. admonition:: Intel Description

    Copy all bits from unsigned 32-bit integer "a" to "dst", and reset (set to 0) the bit in "dst" that corresponds to the lowest set bit in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := (a - 1) AND a
        	
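Clearing the lowest set bit is a common building block for walking over the set bits of a word. The sketch below models the formula above in portable C and, as one possible use, counts set bits with it. Both function names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Clear the lowest set bit of a; returns 0 when a is 0. */
    uint32_t blsr32_model(uint32_t a)
    {
        return (a - 1u) & a;
    }

    /* Example use: one loop iteration per set bit. */
    int count_bits_with_blsr(uint32_t a)
    {
        int n = 0;
        while (a != 0u) {
            a = blsr32_model(a);
            n++;
        }
        return n;
    }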

_blsr_u64
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    unsigned __int64 _blsr_u64(unsigned __int64 a);

.. admonition:: Intel Description

    Copy all bits from unsigned 64-bit integer "a" to "dst", and reset (set to 0) the bit in "dst" that corresponds to the lowest set bit in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := (a - 1) AND a
        	

_andn_u32
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a, 
    unsigned int b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    unsigned int _andn_u32(unsigned int a, unsigned int b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 32-bit integer "a" and then AND with "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := ((NOT a[31:0]) AND b[31:0])
        	

_andn_u64
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a, 
    unsigned __int64 b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    unsigned __int64 _andn_u64(unsigned __int64 a, unsigned __int64 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 64-bit integer "a" and then AND with "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := ((NOT a[63:0]) AND b[63:0])
        	

_tzcnt_u16
^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned short
:Param Types:
    unsigned short a
:Param ETypes:
    UI16 a

.. code-block:: C

    unsigned short _tzcnt_u16(unsigned short a);

.. admonition:: Intel Description

    Count the number of trailing zero bits in unsigned 16-bit integer "a", and return that count in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := 0
        dst := 0
        DO WHILE ((tmp < 16) AND a[tmp] == 0)
        	tmp := tmp + 1
        	dst := dst + 1
        OD
        	

_tzcnt_u32
^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    unsigned int _tzcnt_u32(unsigned int a);

.. admonition:: Intel Description

    Count the number of trailing zero bits in unsigned 32-bit integer "a", and return that count in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := 0
        dst := 0
        DO WHILE ((tmp < 32) AND a[tmp] == 0)
        	tmp := tmp + 1
        	dst := dst + 1
        OD
        	

_tzcnt_u64
^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    unsigned __int64 _tzcnt_u64(unsigned __int64 a);

.. admonition:: Intel Description

    Count the number of trailing zero bits in unsigned 64-bit integer "a", and return that count in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := 0
        dst := 0
        DO WHILE ((tmp < 64) AND a[tmp] == 0)
        	tmp := tmp + 1
        	dst := dst + 1
        OD
        	
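As an alternative to scanning bit by bit, the trailing-zero count can be derived from the lowest set bit. The portable sketch below is a swapped-in technique, not Intel's pseudo-code: it isolates the lowest set bit with ``(-a) AND a``, subtracts one to get a run of ones below it, and counts those ones. The name ``tzcnt64_model`` is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Count trailing zero bits; returns 64 when a is zero. */
    unsigned tzcnt64_model(uint64_t a)
    {
        uint64_t lowest = a & ((uint64_t)0 - a);  /* lowest set bit, or 0 */
        uint64_t below = lowest - 1u;             /* ones strictly below it */
        unsigned count = 0;
        while (below != 0u) {
            below &= below - 1u;                  /* clear one bit per step */
            count++;
        }
        return count;
    }

When "a" is zero the subtraction wraps to all ones, so the result is 64, matching the loop bound in the pseudo-code.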

_bzhi_u32
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a, 
    unsigned int index
:Param ETypes:
    UI32 a, 
    UI32 index

.. code-block:: C

    unsigned int _bzhi_u32(unsigned int a, unsigned int index);

.. admonition:: Intel Description

    Copy all bits from unsigned 32-bit integer "a" to "dst", and reset (set to 0) the high bits in "dst" starting at "index".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        n := index[7:0]
        dst := a
        IF (n < 32)
        	dst[31:n] := 0
        FI
        	
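A portable sketch of the pseudo-code above, using the hypothetical name ``bzhi32_model``. Only the low 8 bits of "index" take part, and an index of 32 or more leaves the value unchanged.

.. code-block:: C

    #include <stdint.h>

    /* Keep the low n bits of a, where n is the low byte of index. */
    uint32_t bzhi32_model(uint32_t a, uint32_t index)
    {
        uint32_t n = index & 0xFFu;
        if (n >= 32u)
            return a;                      /* nothing to clear */
        return a & ((1u << n) - 1u);       /* n == 0 gives an empty mask */
    }

For example, an index of 8 applied to 0xFFFFFFFF leaves 0xFF.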

_bzhi_u64
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a, 
    unsigned int index
:Param ETypes:
    UI64 a, 
    UI32 index

.. code-block:: C

    unsigned __int64 _bzhi_u64(unsigned __int64 a, unsigned int index);

.. admonition:: Intel Description

    Copy all bits from unsigned 64-bit integer "a" to "dst", and reset (set to 0) the high bits in "dst" starting at "index".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        n := index[7:0]
        dst := a
        IF (n < 64)
        	dst[63:n] := 0
        FI
        	

_pdep_u32
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a, 
    unsigned int mask
:Param ETypes:
    UI32 a, 
    UI32 mask

.. code-block:: C

    unsigned int _pdep_u32(unsigned int a, unsigned int mask);

.. admonition:: Intel Description

    Deposit contiguous low bits from unsigned 32-bit integer "a" to "dst" at the corresponding bit locations specified by "mask"; all other bits in "dst" are set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        dst := 0
        m := 0
        k := 0
        DO WHILE m < 32
        	IF mask[m] == 1
        		dst[m] := tmp[k]
        		k := k + 1
        	FI
        	m := m + 1
        OD
        	
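The deposit loop above can be written in portable C as follows. This is an illustrative model with the hypothetical name ``pdep32_model``: it walks the mask from bit 0 upward and consumes one source bit for every set mask bit.

.. code-block:: C

    #include <stdint.h>

    /* Scatter the low bits of a to the positions selected by mask. */
    uint32_t pdep32_model(uint32_t a, uint32_t mask)
    {
        uint32_t dst = 0;
        unsigned k = 0;  /* index of the next source bit to deposit */
        for (unsigned m = 0; m < 32; m++) {
            if ((mask >> m) & 1u) {
                dst |= ((a >> k) & 1u) << m;
                k++;
            }
        }
        return dst;
    }

For example, depositing 0x5 (binary 101) under the mask 0x1A (binary 11010) gives 0x12 (binary 10010).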

_pdep_u64
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a, 
    unsigned __int64 mask
:Param ETypes:
    UI64 a, 
    UI64 mask

.. code-block:: C

    unsigned __int64 _pdep_u64(unsigned __int64 a, unsigned __int64 mask);

.. admonition:: Intel Description

    Deposit contiguous low bits from unsigned 64-bit integer "a" to "dst" at the corresponding bit locations specified by "mask"; all other bits in "dst" are set to zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp := a
        dst := 0
        m := 0
        k := 0
        DO WHILE m < 64
        	IF mask[m] == 1
        		dst[m] := tmp[k]
        		k := k + 1
        	FI
        	m := m + 1
        OD
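
The deposit loop above translates almost line for line into C. This is a minimal sketch, assuming a hypothetical helper name ``pdep_u64_ref``; it is a bit-by-bit model of the pseudo-code and makes no claim about how the instruction is implemented.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical reference model of _pdep_u64 (not an Intel API). */
    static uint64_t pdep_u64_ref(uint64_t a, uint64_t mask)
    {
        uint64_t dst = 0;
        unsigned k = 0;                        /* next low bit of a to deposit */
        for (unsigned m = 0; m < 64; ++m) {
            if ((mask >> m) & 1u) {
                dst |= ((a >> k) & 1u) << m;   /* dst[m] := tmp[k] */
                ++k;
            }
        }
        return dst;
    }

For example, with a mask of 0xF0 the four low bits of "a" land in bits 4 through 7 of the result.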
        	

_pext_u32
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a, 
    unsigned int mask
:Param ETypes:
    UI32 a, 
    UI32 mask

.. code-block:: C

    unsigned int _pext_u32(unsigned int a, unsigned int mask);

.. admonition:: Intel Description

    Extract bits from unsigned 32-bit integer "a" at the corresponding bit locations specified by "mask" to contiguous low bits in "dst"; the remaining upper bits in "dst" are set to zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp := a
        dst := 0
        m := 0
        k := 0
        DO WHILE m < 32
        	IF mask[m] == 1
        		dst[k] := tmp[m]
        		k := k + 1
        	FI
        	m := m + 1
        OD
        	

_pext_u64
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a, 
    unsigned __int64 mask
:Param ETypes:
    UI64 a, 
    UI64 mask

.. code-block:: C

    unsigned __int64 _pext_u64(unsigned __int64 a, unsigned __int64 mask);

.. admonition:: Intel Description

    Extract bits from unsigned 64-bit integer "a" at the corresponding bit locations specified by "mask" to contiguous low bits in "dst"; the remaining upper bits in "dst" are set to zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp := a
        dst := 0
        m := 0
        k := 0
        DO WHILE m < 64
        	IF mask[m] == 1
        		dst[k] := tmp[m]
        		k := k + 1
        	FI
        	m := m + 1
        OD
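
The extract loop is the inverse gather of the deposit operation. The following sketch models it in portable C; ``pext_u64_ref`` is a hypothetical name chosen for illustration, and the code follows the pseudo-code rather than the hardware.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical reference model of _pext_u64 (not an Intel API). */
    static uint64_t pext_u64_ref(uint64_t a, uint64_t mask)
    {
        uint64_t dst = 0;
        unsigned k = 0;                        /* next low bit of dst to fill */
        for (unsigned m = 0; m < 64; ++m) {
            if ((mask >> m) & 1u) {
                dst |= ((a >> m) & 1u) << k;   /* dst[k] := tmp[m] */
                ++k;
            }
        }
        return dst;
    }

With a mask of 0xF0F0, the second and fourth nibbles of "a" are packed into the low byte of the result.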
        	

_lzcnt_u32
^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    unsigned int _lzcnt_u32(unsigned int a);

.. admonition:: Intel Description

    Count the number of leading zero bits in unsigned 32-bit integer "a", and return that count in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp := 31
        dst := 0
        DO WHILE (tmp >= 0 AND a[tmp] == 0)
        	tmp := tmp - 1
        	dst := dst + 1
        OD
        	

_lzcnt_u64
^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    unsigned __int64 _lzcnt_u64(unsigned __int64 a);

.. admonition:: Intel Description

    Count the number of leading zero bits in unsigned 64-bit integer "a", and return that count in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp := 63
        dst := 0
        DO WHILE (tmp >= 0 AND a[tmp] == 0)
        	tmp := tmp - 1
        	dst := dst + 1
        OD
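
As an illustration, the leading-zero loops for both widths can be written in portable C as follows. The helper names ``lzcnt_u32_ref`` and ``lzcnt_u64_ref`` are hypothetical; the sketch mirrors the pseudo-code and returns the operand width when the input is zero.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical reference model of _lzcnt_u32 (not an Intel API). */
    static unsigned lzcnt_u32_ref(uint32_t a)
    {
        unsigned dst = 0;
        int tmp = 31;
        while (tmp >= 0 && ((a >> tmp) & 1u) == 0) {
            --tmp;
            ++dst;
        }
        return dst;                            /* 32 when a == 0 */
    }

    /* Hypothetical reference model of _lzcnt_u64 (not an Intel API). */
    static unsigned lzcnt_u64_ref(uint64_t a)
    {
        unsigned dst = 0;
        int tmp = 63;
        while (tmp >= 0 && ((a >> tmp) & 1u) == 0) {
            --tmp;
            ++dst;
        }
        return dst;                            /* 64 when a == 0 */
    }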
        	

_bit_scan_forward
^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: int
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _bit_scan_forward(int a);

.. admonition:: Intel Description

    Set "dst" to the index of the lowest set bit in 32-bit integer "a". If no bits are set in "a" then "dst" is undefined.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp := 0
        IF a == 0
        	// dst is undefined
        ELSE
        	DO WHILE ((tmp < 32) AND a[tmp] == 0)
        		tmp := tmp + 1
        	OD
        FI
        dst := tmp
        	

_bit_scan_reverse
^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: int
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _bit_scan_reverse(int a);

.. admonition:: Intel Description

    Set "dst" to the index of the highest set bit in 32-bit integer "a". If no bits are set in "a" then "dst" is undefined.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp := 31
        IF a == 0
        	// dst is undefined
        ELSE
        	DO WHILE ((tmp > 0) AND a[tmp] == 0)
        		tmp := tmp - 1
        	OD
        FI
        dst := tmp
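
The two scan loops for ``_bit_scan_forward`` and ``_bit_scan_reverse`` can be sketched in portable C as shown below. The names ``bit_scan_forward_ref`` and ``bit_scan_reverse_ref`` are hypothetical, and the sketch assumes the caller passes a nonzero value, since the result for zero is undefined.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of _bit_scan_forward; the caller must pass a != 0. */
    static int bit_scan_forward_ref(uint32_t a)
    {
        int tmp = 0;
        while (tmp < 32 && ((a >> tmp) & 1u) == 0)
            ++tmp;
        return tmp;                            /* index of the lowest set bit */
    }

    /* Hypothetical model of _bit_scan_reverse; the caller must pass a != 0. */
    static int bit_scan_reverse_ref(uint32_t a)
    {
        int tmp = 31;
        while (tmp > 0 && ((a >> tmp) & 1u) == 0)
            --tmp;
        return tmp;                            /* index of the highest set bit */
    }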
        	

_BitScanForward
^^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned char
:Param Types:
    unsigned __int32* index, 
    unsigned __int32 a
:Param ETypes:
    UI32 index, 
    UI32 a

.. code-block:: C

    unsigned char _BitScanForward(unsigned __int32* index, unsigned __int32 a);

.. admonition:: Intel Description

    Set "index" to the index of the lowest set bit in 32-bit integer "a". If no bits are set in "a", then "index" is undefined and "dst" is set to 0, otherwise "dst" is set to 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp := 0
        IF a == 0
        	// MEM[index+31:index] is undefined
        	dst := 0
        ELSE
        	DO WHILE ((tmp < 32) AND a[tmp] == 0)
        		tmp := tmp + 1
        	OD
        	MEM[index+31:index] := tmp
        	dst := 1
        FI
        	

_BitScanReverse
^^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned char
:Param Types:
    unsigned __int32* index, 
    unsigned __int32 a
:Param ETypes:
    UI32 index, 
    UI32 a

.. code-block:: C

    unsigned char _BitScanReverse(unsigned __int32* index, unsigned __int32 a);

.. admonition:: Intel Description

    Set "index" to the index of the highest set bit in 32-bit integer "a". If no bits are set in "a", then "index" is undefined and "dst" is set to 0, otherwise "dst" is set to 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp := 31
        IF a == 0
        	// MEM[index+31:index] is undefined
        	dst := 0
        ELSE
        	DO WHILE ((tmp > 0) AND a[tmp] == 0)
        		tmp := tmp - 1
        	OD
        	MEM[index+31:index] := tmp
        	dst := 1
        FI
        	

_BitScanForward64
^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned char
:Param Types:
    unsigned __int32* index, 
    unsigned __int64 a
:Param ETypes:
    UI32 index, 
    UI64 a

.. code-block:: C

    unsigned char _BitScanForward64(unsigned __int32* index, unsigned __int64 a);

.. admonition:: Intel Description

    Set "index" to the index of the lowest set bit in 64-bit integer "a". If no bits are set in "a", then "index" is undefined and "dst" is set to 0, otherwise "dst" is set to 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp := 0
        IF a == 0
        	// MEM[index+31:index] is undefined
        	dst := 0
        ELSE
        	DO WHILE ((tmp < 64) AND a[tmp] == 0)
        		tmp := tmp + 1
        	OD
        	MEM[index+31:index] := tmp
        	dst := 1
        FI
        	

_BitScanReverse64
^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned char
:Param Types:
    unsigned __int32* index, 
    unsigned __int64 a
:Param ETypes:
    UI32 index, 
    UI64 a

.. code-block:: C

    unsigned char _BitScanReverse64(unsigned __int32* index, unsigned __int64 a);

.. admonition:: Intel Description

    Set "index" to the index of the highest set bit in 64-bit integer "a". If no bits are set in "a", then "index" is undefined and "dst" is set to 0, otherwise "dst" is set to 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp := 63
        IF a == 0
        	// MEM[index+31:index] is undefined
        	dst := 0
        ELSE
        	DO WHILE ((tmp > 0) AND a[tmp] == 0)
        		tmp := tmp - 1
        	OD
        	MEM[index+31:index] := tmp
        	dst := 1
        FI
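
The ``_BitScan*`` family pairs an output index with a found/not-found return value. The sketch below models the 64-bit variants in portable C, following the description: the return value is nonzero whenever any bit of "a" is set. The names ``bsf64_ref`` and ``bsr64_ref`` are hypothetical, and leaving ``*index`` untouched for a zero input is an assumption of this sketch (the description only says it is undefined).

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of _BitScanForward64 (not an Intel API). */
    static unsigned char bsf64_ref(uint32_t *index, uint64_t a)
    {
        if (a == 0)
            return 0;                          /* *index is left untouched here */
        uint32_t tmp = 0;
        while (((a >> tmp) & 1u) == 0)
            ++tmp;
        *index = tmp;                          /* lowest set bit */
        return 1;
    }

    /* Hypothetical model of _BitScanReverse64 (not an Intel API). */
    static unsigned char bsr64_ref(uint32_t *index, uint64_t a)
    {
        if (a == 0)
            return 0;                          /* *index is left untouched here */
        uint32_t tmp = 63;
        while (((a >> tmp) & 1u) == 0)
            --tmp;
        *index = tmp;                          /* highest set bit */
        return 1;
    }

A caller would typically test the return value before reading the index, because the index carries no meaning when no bit is set.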
        	

_bittest
^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned char
:Param Types:
    __int32* a, 
    __int32 b
:Param ETypes:
    UI32 a, 
    IMM b

.. code-block:: C

    unsigned char _bittest(__int32* a, __int32 b);

.. admonition:: Intel Description

    Return the bit at index "b" of 32-bit integer "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        addr := a + ZeroExtend64(b)
        dst[0] := MEM[addr]
        	

_bittestandcomplement
^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned char
:Param Types:
    __int32* a, 
    __int32 b
:Param ETypes:
    UI32 a, 
    IMM b

.. code-block:: C

    unsigned char _bittestandcomplement(__int32* a, __int32 b);

.. admonition:: Intel Description

    Return the bit at index "b" of 32-bit integer "a", and set that bit to its complement.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        addr := a + ZeroExtend64(b)
        dst[0] := MEM[addr]
        MEM[addr] := ~dst[0]
        	

_bittestandreset
^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned char
:Param Types:
    __int32* a, 
    __int32 b
:Param ETypes:
    UI32 a, 
    IMM b

.. code-block:: C

    unsigned char _bittestandreset(__int32* a, __int32 b);

.. admonition:: Intel Description

    Return the bit at index "b" of 32-bit integer "a", and set that bit to zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        addr := a + ZeroExtend64(b)
        dst[0] := MEM[addr]
        MEM[addr] := 0
        	

_bittestandset
^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned char
:Param Types:
    __int32* a, 
    __int32 b
:Param ETypes:
    UI32 a, 
    IMM b

.. code-block:: C

    unsigned char _bittestandset(__int32* a, __int32 b);

.. admonition:: Intel Description

    Return the bit at index "b" of 32-bit integer "a", and set that bit to one.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        addr := a + ZeroExtend64(b)
        dst[0] := MEM[addr]
        MEM[addr] := 1
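
The four 32-bit bit-test operations can be illustrated with a small portable model. This is a sketch under stated assumptions: the helper names ending in ``_ref`` are hypothetical, the helpers take ``uint32_t`` instead of ``__int32`` to keep the shifts well defined, and the bit index "b" is assumed to lie in the range 0 to 31.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical models of the _bittest family; each assumes 0 <= b < 32. */
    static unsigned char bittest_ref(const uint32_t *a, unsigned b)
    {
        return (unsigned char)((*a >> b) & 1u);    /* return the selected bit */
    }

    static unsigned char bittestandcomplement_ref(uint32_t *a, unsigned b)
    {
        unsigned char old = bittest_ref(a, b);
        *a ^= UINT32_C(1) << b;                    /* flip the bit */
        return old;
    }

    static unsigned char bittestandreset_ref(uint32_t *a, unsigned b)
    {
        unsigned char old = bittest_ref(a, b);
        *a &= ~(UINT32_C(1) << b);                 /* clear the bit */
        return old;
    }

    static unsigned char bittestandset_ref(uint32_t *a, unsigned b)
    {
        unsigned char old = bittest_ref(a, b);
        *a |= UINT32_C(1) << b;                    /* set the bit */
        return old;
    }

Each modifying helper returns the value the bit had before the update, matching the "return the bit, then change it" wording of the descriptions.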
        	

_bittest64
^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned char
:Param Types:
    __int64* a, 
    __int64 b
:Param ETypes:
    UI64 a, 
    IMM b

.. code-block:: C

    unsigned char _bittest64(__int64* a, __int64 b);

.. admonition:: Intel Description

    Return the bit at index "b" of 64-bit integer "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        addr := a + b
        dst[0] := MEM[addr]
        	

_bittestandcomplement64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned char
:Param Types:
    __int64* a, 
    __int64 b
:Param ETypes:
    UI64 a, 
    IMM b

.. code-block:: C

    unsigned char _bittestandcomplement64(__int64* a, __int64 b);

.. admonition:: Intel Description

    Return the bit at index "b" of 64-bit integer "a", and set that bit to its complement.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        addr := a + b
        dst[0] := MEM[addr]
        MEM[addr] := ~dst[0]
        	

_bittestandreset64
^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned char
:Param Types:
    __int64* a, 
    __int64 b
:Param ETypes:
    UI64 a, 
    IMM b

.. code-block:: C

    unsigned char _bittestandreset64(__int64* a, __int64 b);

.. admonition:: Intel Description

    Return the bit at index "b" of 64-bit integer "a", and set that bit to zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        addr := a + b
        dst[0] := MEM[addr]
        MEM[addr] := 0
        	

_bittestandset64
^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: unsigned char
:Param Types:
    __int64* a, 
    __int64 b
:Param ETypes:
    UI64 a, 
    IMM b

.. code-block:: C

    unsigned char _bittestandset64(__int64* a, __int64 b);

.. admonition:: Intel Description

    Return the bit at index "b" of 64-bit integer "a", and set that bit to one.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        addr := a + b
        dst[0] := MEM[addr]
        MEM[addr] := 1
        	

_bswap
^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: int
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _bswap(int a);

.. admonition:: Intel Description

    Reverse the byte order of 32-bit integer "a", and store the result in "dst". This intrinsic is provided for conversion between little and big endian values.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[7:0] := a[31:24]
        dst[15:8] := a[23:16]
        dst[23:16] := a[15:8]
        dst[31:24] := a[7:0]
        	

_bswap64
^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: __int64
:Param Types:
    __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __int64 _bswap64(__int64 a);

.. admonition:: Intel Description

    Reverse the byte order of 64-bit integer "a", and store the result in "dst". This intrinsic is provided for conversion between little and big endian values.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[7:0] := a[63:56]
        dst[15:8] := a[55:48]
        dst[23:16] := a[47:40]
        dst[31:24] := a[39:32]
        dst[39:32] := a[31:24]
        dst[47:40] := a[23:16]
        dst[55:48] := a[15:8]
        dst[63:56] := a[7:0]
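
The byte reversal for ``_bswap`` and ``_bswap64`` can be expressed with shifts and masks in portable C. The names ``bswap32_ref`` and ``bswap64_ref`` below are hypothetical and the code is only a sketch of the pseudo-code, using unsigned types to avoid sign-extension issues.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical reference model of _bswap (not an Intel API). */
    static uint32_t bswap32_ref(uint32_t a)
    {
        return (a >> 24)                       /* dst[7:0]   := a[31:24] */
             | ((a >> 8) & 0x0000FF00u)        /* dst[15:8]  := a[23:16] */
             | ((a << 8) & 0x00FF0000u)        /* dst[23:16] := a[15:8]  */
             | (a << 24);                      /* dst[31:24] := a[7:0]   */
    }

    /* Hypothetical reference model of _bswap64, built from two 32-bit swaps. */
    static uint64_t bswap64_ref(uint64_t a)
    {
        uint64_t lo = bswap32_ref((uint32_t)a);
        uint64_t hi = bswap32_ref((uint32_t)(a >> 32));
        return (lo << 32) | hi;                /* swapped halves trade places */
    }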
        	

_popcnt32
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: int
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _popcnt32(int a);

.. admonition:: Intel Description

    Count the number of bits set to 1 in 32-bit integer "a", and return that count in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst := 0
        FOR i := 0 to 31
        	IF a[i]
        		dst := dst + 1
        	FI
        ENDFOR
        	

_popcnt64
^^^^^^^^^
:Tech: Other
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: Other-Bit Manipulation-Other
:Return Type: int
:Param Types:
    __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    int _popcnt64(__int64 a);

.. admonition:: Intel Description

    Count the number of bits set to 1 in 64-bit integer "a", and return that count in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst := 0
        FOR i := 0 to 63
        	IF a[i]
        		dst := dst + 1
        	FI
        ENDFOR
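
The counting loop above maps directly onto C. The following sketch uses a hypothetical helper name, ``popcnt64_ref``, and counts one bit per iteration exactly as the pseudo-code does; it is not meant to reflect how the instruction works internally.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical reference model of _popcnt64 (not an Intel API). */
    static int popcnt64_ref(uint64_t a)
    {
        int dst = 0;
        for (int i = 0; i < 64; ++i) {
            if ((a >> i) & 1u)                 /* IF a[i] */
                ++dst;
        }
        return dst;
    }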
        	

Cast
----
Other
~~~~~
_castf32_u32
^^^^^^^^^^^^
:Tech: Other
:Category: Cast
:Header: immintrin.h
:Searchable: Other-Cast-Other
:Return Type: unsigned __int32
:Param Types:
    float a
:Param ETypes:
    FP32 a

.. code-block:: C

    unsigned __int32 _castf32_u32(float a);

.. admonition:: Intel Description

    Cast from type float to type unsigned __int32 without conversion. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_castf64_u64
^^^^^^^^^^^^
:Tech: Other
:Category: Cast
:Header: immintrin.h
:Searchable: Other-Cast-Other
:Return Type: unsigned __int64
:Param Types:
    double a
:Param ETypes:
    FP64 a

.. code-block:: C

    unsigned __int64 _castf64_u64(double a);

.. admonition:: Intel Description

    Cast from type double to type unsigned __int64 without conversion. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_castu32_f32
^^^^^^^^^^^^
:Tech: Other
:Category: Cast
:Header: immintrin.h
:Searchable: Other-Cast-Other
:Return Type: float
:Param Types:
    unsigned __int32 a
:Param ETypes:
    UI32 a

.. code-block:: C

    float _castu32_f32(unsigned __int32 a);

.. admonition:: Intel Description

    Cast from type unsigned __int32 to type float without conversion. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_castu64_f64
^^^^^^^^^^^^
:Tech: Other
:Category: Cast
:Header: immintrin.h
:Searchable: Other-Cast-Other
:Return Type: double
:Param Types:
    unsigned __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    double _castu64_f64(unsigned __int64 a);

.. admonition:: Intel Description

    Cast from type unsigned __int64 to type double without conversion. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
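
A cast "without conversion" reinterprets the bit pattern instead of converting the numeric value. In portable C the same effect is commonly obtained with ``memcpy``. The helpers below are hypothetical stand-ins for the four cast intrinsics, and the values mentioned in the usage note assume IEEE 754 single and double precision.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical bit-pattern casts; memcpy avoids strict-aliasing problems. */
    static uint32_t castf32_u32_ref(float a)
    {
        uint32_t u;
        memcpy(&u, &a, sizeof u);
        return u;
    }

    static float castu32_f32_ref(uint32_t a)
    {
        float f;
        memcpy(&f, &a, sizeof f);
        return f;
    }

    static uint64_t castf64_u64_ref(double a)
    {
        uint64_t u;
        memcpy(&u, &a, sizeof u);
        return u;
    }

    static double castu64_f64_ref(uint64_t a)
    {
        double d;
        memcpy(&d, &a, sizeof d);
        return d;
    }

Under the IEEE 754 assumption, casting 1.0f yields the pattern 0x3F800000, whereas a value conversion such as ``(uint32_t)1.0f`` would yield 1.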

General Support
---------------
XMM
~~~
_mm_clflushopt
^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-XMM
:Register: XMM 128 bit
:Return Type: void

.. code-block:: C

    void _mm_clflushopt(void const * p);

.. admonition:: Intel Description

    Invalidate and flush the cache line that contains "p" from all levels of the cache hierarchy.

_mm_clwb
^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-XMM
:Register: XMM 128 bit
:Return Type: void

.. code-block:: C

    void _mm_clwb(void const * p);

.. admonition:: Intel Description

    Write back to memory the cache line that contains "p" from any level of the cache hierarchy in the cache coherence domain.

_mm_monitor
^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: pmmintrin.h
:Searchable: Other-General Support-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void const* p, 
    unsigned extensions, 
    unsigned hints
:Param ETypes:
     p, 
    UI32 extensions, 
    UI32 hints

.. code-block:: C

    void _mm_monitor(void const* p, unsigned extensions,
                     unsigned hints);

.. admonition:: Intel Description

    Arm address monitoring hardware using the address specified in "p". A store to an address within the specified address range triggers the monitoring hardware. Specify optional extensions in "extensions", and optional hints in "hints".

_mm_mwait
^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: pmmintrin.h
:Searchable: Other-General Support-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    unsigned extensions, 
    unsigned hints
:Param ETypes:
    UI32 extensions, 
    UI32 hints

.. code-block:: C

    void _mm_mwait(unsigned extensions, unsigned hints);

.. admonition:: Intel Description

    Hint to the processor that it can enter an implementation-dependent-optimized state while waiting for an event or store operation to the address range specified by MONITOR.

_mm_prefetch
^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    char const* p, 
    int i
:Param ETypes:
    UI8 p, 
    IMM i

.. code-block:: C

    void _mm_prefetch(char const* p, int i);

.. admonition:: Intel Description

    Fetch the line of data from memory that contains address "p" to a location in the cache hierarchy specified by the locality hint "i", which can be one of:

    * _MM_HINT_ET0 (value 7): move data using the ET0 hint. The PREFETCHW instruction will be generated.
    * _MM_HINT_T0 (value 3): move data using the T0 hint. The PREFETCHT0 instruction will be generated.
    * _MM_HINT_T1 (value 2): move data using the T1 hint. The PREFETCHT1 instruction will be generated.
    * _MM_HINT_T2 (value 1): move data using the T2 hint. The PREFETCHT2 instruction will be generated.
    * _MM_HINT_NTA (value 0): move data using the non-temporal access (NTA) hint. The PREFETCHNTA instruction will be generated.
    

Other
~~~~~
_readfsbase_u32
^^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: unsigned int

.. code-block:: C

    unsigned int _readfsbase_u32(void);

.. admonition:: Intel Description

    Read the FS segment base register and store the 32-bit result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst[31:0] := FS_Segment_Base_Register
        dst[63:32] := 0
        	

_readfsbase_u64
^^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: unsigned __int64

.. code-block:: C

    unsigned __int64 _readfsbase_u64(void);

.. admonition:: Intel Description

    Read the FS segment base register and store the 64-bit result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst[63:0] := FS_Segment_Base_Register
        	

_readgsbase_u32
^^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: unsigned int

.. code-block:: C

    unsigned int _readgsbase_u32(void);

.. admonition:: Intel Description

    Read the GS segment base register and store the 32-bit result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst[31:0] := GS_Segment_Base_Register
        dst[63:32] := 0
        	

_readgsbase_u64
^^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: unsigned __int64

.. code-block:: C

    unsigned __int64 _readgsbase_u64(void);

.. admonition:: Intel Description

    Read the GS segment base register and store the 64-bit result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst[63:0] := GS_Segment_Base_Register
        	

_writefsbase_u32
^^^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: void
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    void _writefsbase_u32(unsigned int a);

.. admonition:: Intel Description

    Write the unsigned 32-bit integer "a" to the FS segment base register.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FS_Segment_Base_Register[31:0] := a[31:0]
        FS_Segment_Base_Register[63:32] := 0
        	

_writefsbase_u64
^^^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: void
:Param Types:
    unsigned __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    void _writefsbase_u64(unsigned __int64 a);

.. admonition:: Intel Description

    Write the unsigned 64-bit integer "a" to the FS segment base register.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FS_Segment_Base_Register[63:0] := a[63:0]
        	

_writegsbase_u32
^^^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: void
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    void _writegsbase_u32(unsigned int a);

.. admonition:: Intel Description

    Write the unsigned 32-bit integer "a" to the GS segment base register.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        GS_Segment_Base_Register[31:0] := a[31:0]
        GS_Segment_Base_Register[63:32] := 0
        	

_writegsbase_u64
^^^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: void
:Param Types:
    unsigned __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    void _writegsbase_u64(unsigned __int64 a);

.. admonition:: Intel Description

    Write the unsigned 64-bit integer "a" to the GS segment base register.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        GS_Segment_Base_Register[63:0] := a[63:0]
        	

_hreset
^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: void
:Param Types:
    int __eax
:Param ETypes:
    SI32 __eax

.. code-block:: C

    void _hreset(int __eax);

.. admonition:: Intel Description

    Provides a hint to the processor to selectively reset the prediction history of the current logical processor specified by a signed 32-bit integer "__eax".

_allow_cpu_features
^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: void
:Param Types:
    unsigned __int64 a
:Param ETypes:
    IMM a

.. code-block:: C

    void _allow_cpu_features(unsigned __int64 a);

.. admonition:: Intel Description

    Treat the processor-specific feature(s) specified in "a" as available. Multiple features may be OR'd together. See the valid feature flags below:

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        _FEATURE_GENERIC_IA32
        _FEATURE_FPU
        _FEATURE_CMOV
        _FEATURE_MMX
        _FEATURE_FXSAVE
        _FEATURE_SSE
        _FEATURE_SSE2
        _FEATURE_SSE3
        _FEATURE_SSSE3
        _FEATURE_SSE4_1
        _FEATURE_SSE4_2
        _FEATURE_MOVBE
        _FEATURE_POPCNT
        _FEATURE_PCLMULQDQ
        _FEATURE_AES
        _FEATURE_F16C
        _FEATURE_AVX
        _FEATURE_RDRND
        _FEATURE_FMA
        _FEATURE_BMI
        _FEATURE_LZCNT
        _FEATURE_HLE
        _FEATURE_RTM
        _FEATURE_AVX2
        _FEATURE_KNCNI
        _FEATURE_AVX512F
        _FEATURE_ADX
        _FEATURE_RDSEED
        _FEATURE_AVX512ER
        _FEATURE_AVX512PF
        _FEATURE_AVX512CD
        _FEATURE_SHA
        _FEATURE_MPX
        _FEATURE_AVX512BW
        _FEATURE_AVX512VL
        _FEATURE_AVX512VBMI
        _FEATURE_AVX512_4FMAPS
        _FEATURE_AVX512_4VNNIW
        _FEATURE_AVX512_VPOPCNTDQ
        _FEATURE_AVX512_BITALG
        _FEATURE_AVX512_VBMI2
        _FEATURE_GFNI
        _FEATURE_VAES
        _FEATURE_VPCLMULQDQ
        _FEATURE_AVX512_VNNI
        _FEATURE_CLWB
        _FEATURE_RDPID
        _FEATURE_IBT
        _FEATURE_SHSTK
        _FEATURE_SGX
        _FEATURE_WBNOINVD
        _FEATURE_PCONFIG
        _FEATURE_AXV512_4VNNIB
        _FEATURE_AXV512_4FMAPH
        _FEATURE_AXV512_BITALG2
        _FEATURE_AXV512_VP2INTERSECT
        	

_may_i_use_cpu_feature
^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: int
:Param Types:
    unsigned __int64 a
:Param ETypes:
    IMM a

.. code-block:: C

    int _may_i_use_cpu_feature(unsigned __int64 a);

.. admonition:: Intel Description

    Dynamically query the processor to determine if the processor-specific feature(s) specified in "a" are available, and return 1 (true) if the whole set of features is available or 0 (false) otherwise. Multiple features may be OR'd together. This function is limited to bitmask values in the first 'page' of the libirc cpu-id information. This intrinsic does not check the processor vendor. See the valid feature flags below:

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        _FEATURE_GENERIC_IA32
        _FEATURE_FPU
        _FEATURE_CMOV
        _FEATURE_MMX
        _FEATURE_FXSAVE
        _FEATURE_SSE
        _FEATURE_SSE2
        _FEATURE_SSE3
        _FEATURE_SSSE3
        _FEATURE_SSE4_1
        _FEATURE_SSE4_2
        _FEATURE_MOVBE
        _FEATURE_POPCNT
        _FEATURE_PCLMULQDQ
        _FEATURE_AES
        _FEATURE_F16C
        _FEATURE_AVX
        _FEATURE_RDRND
        _FEATURE_FMA
        _FEATURE_BMI
        _FEATURE_LZCNT
        _FEATURE_HLE
        _FEATURE_RTM
        _FEATURE_AVX2
        _FEATURE_KNCNI
        _FEATURE_AVX512F
        _FEATURE_ADX
        _FEATURE_RDSEED
        _FEATURE_AVX512ER
        _FEATURE_AVX512PF
        _FEATURE_AVX512CD
        _FEATURE_SHA
        _FEATURE_MPX
        _FEATURE_AVX512BW
        _FEATURE_AVX512VL
        _FEATURE_AVX512VBMI
        _FEATURE_AVX512_4FMAPS
        _FEATURE_AVX512_4VNNIW
        _FEATURE_AVX512_VPOPCNTDQ
        _FEATURE_AVX512_BITALG
        _FEATURE_AVX512_VBMI2
        _FEATURE_GFNI
        _FEATURE_VAES
        _FEATURE_VPCLMULQDQ
        _FEATURE_AVX512_VNNI
        _FEATURE_CLWB
        _FEATURE_RDPID
        _FEATURE_IBT
        _FEATURE_SHSTK
        _FEATURE_SGX
        _FEATURE_WBNOINVD
        _FEATURE_PCONFIG
        _FEATURE_AXV512_4VNNIB
        _FEATURE_AXV512_4FMAPH
        _FEATURE_AXV512_BITALG2
        _FEATURE_AXV512_VP2INTERSECT
        _FEATURE_AXV512_FP16
        	

_may_i_use_cpu_feature_ext
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: int
:Param Types:
    unsigned __int64 a, 
    unsigned page
:Param ETypes:
    IMM a, 
    IMM page

.. code-block:: C

    int _may_i_use_cpu_feature_ext(unsigned __int64 a,
                                   unsigned page);

.. admonition:: Intel Description

    Dynamically query the processor to determine if the processor-specific feature(s) specified in "a" are available, and return 1 (true) if the whole set of features is available or 0 (false) otherwise. Multiple features may be OR'd together. This works like _may_i_use_cpu_feature, except that it also accepts a "page" index that permits checking features on the 2nd page of the libirc information. With "page" set to 0, it behaves identically to _may_i_use_cpu_feature. This intrinsic does not check the processor vendor. The valid feature flags on the 2nd page (selected by passing 1 in "page") are listed below:

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        _FEATURE_CLDEMOTE
        _FEATURE_MOVDIRI
        _FEATURE_MOVDIR64B
        _FEATURE_WAITPKG
        _FEATURE_AVX512_Bf16
        _FEATURE_ENQCMD
        _FEATURE_AVX_VNNI
        _FEATURE_AMX_TILE
        _FEATURE_AMX_INT8
        _FEATURE_AMX_BF16
        _FEATURE_KL
        _FEATURE_WIDE_KL
        _FEATURE_HRESET
        _FEATURE_UINTR
        _FEATURE_PREFETCHI
        _FEATURE_AVXVNNIINT8
        _FEATURE_CMPCCXADD
        _FEATURE_AVXIFMA
        _FEATURE_AVXNECONVERT
        _FEATURE_RAOINT
        _FEATURE_AMX_FP16
        _FEATURE_AMX_COMPLEX
        _FEATURE_SHA512
        _FEATURE_SM3
        _FEATURE_SM4
        _FEATURE_AVXVNNIINT16
        _FEATURE_USERMSR
        _FEATURE_AVX10_1_256
        _FEATURE_AVX10_1_512
        _FEATURE_APXF
        _FEATURE_MSRLIST
        _FEATURE_WRMSRNS
        _FEATURE_PBNDKB
        	

_may_i_use_cpu_feature_str
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: int

.. code-block:: C

    int _may_i_use_cpu_feature_str(string literal feature, ...);

.. admonition:: Intel Description

    Dynamically query the processor to determine whether the processor-specific feature(s) specified as a series of compile-time string literals in "feature, ..." are available, and return 1 (true) if the whole set of features is available or 0 (false) otherwise. The feature names are converted to a bitmask, which is validated with the same infrastructure as _may_i_use_cpu_feature_ext. The behavior is otherwise the same as the previous variants. This intrinsic does not check the processor vendor. The supported string literals correspond one-to-one to the feature flags listed in the "Operation" sections of _may_i_use_cpu_feature and _may_i_use_cpu_feature_ext. Example string literals are "avx2", "bmi", "avx512fp16", and "amx-int8".


_rdpmc
^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: __int64
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    __int64 _rdpmc(int a);

.. admonition:: Intel Description

    Read the Performance Monitor Counter (PMC) specified by "a", and store up to 64 bits in "dst". The width of performance counters is implementation-specific.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst[63:0] := ReadPMC(a)
        	

_rdpid_u32
^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: unsigned int

.. code-block:: C

    unsigned int _rdpid_u32(void );

.. admonition:: Intel Description

    Copy the IA32_TSC_AUX MSR (signature value) into "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst[31:0] := IA32_TSC_AUX[31:0]
        	

__rdtscp
^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned int * mem_addr
:Param ETypes:
    UI32 mem_addr

.. code-block:: C

    unsigned __int64 __rdtscp(unsigned int * mem_addr);

.. admonition:: Intel Description

    Copy the current 64-bit value of the processor's time-stamp counter into "dst", and store the IA32_TSC_AUX MSR (signature value) into memory at "mem_addr".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst[63:0] := TimeStampCounter
        MEM[mem_addr+31:mem_addr] := IA32_TSC_AUX[31:0]
        	

_xabort
^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: void
:Param Types:
    const unsigned int imm8
:Param ETypes:
    IMM imm8

.. code-block:: C

    void _xabort(const unsigned int imm8);

.. admonition:: Intel Description

    Force an RTM abort. The EAX register is updated to reflect that an XABORT instruction caused the abort, and the "imm8" parameter is provided in bits [31:24] of EAX. Following an RTM abort, the logical processor resumes execution at the fallback address computed through the outermost XBEGIN instruction.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF RTM_ACTIVE == 0
        	// nop
        ELSE
        	// restore architectural register state
        	// discard memory updates performed in transaction
        	// update EAX with status and imm8 value
        	eax[31:24] := imm8[7:0]
        	RTM_NEST_COUNT := 0
        	RTM_ACTIVE := 0
        	IF _64_BIT_MODE
        		RIP := fallbackRIP
        	ELSE
        		EIP := fallbackEIP
        	FI
        FI
        	

_xbegin
^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: unsigned int

.. code-block:: C

    unsigned int _xbegin(void );

.. admonition:: Intel Description

    Specify the start of an RTM code region. If the logical processor was not already in transactional execution, this call causes it to transition into transactional execution. On an RTM abort, the logical processor discards all architectural register and memory updates performed during the RTM execution, restores architectural state, and resumes execution at the fallback address computed from the outermost XBEGIN instruction. The return status is ~0 (0xFFFFFFFF) when execution continues inside the transaction; every other value is an abort status.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF RTM_NEST_COUNT < MAX_RTM_NEST_COUNT
        	RTM_NEST_COUNT := RTM_NEST_COUNT + 1
        	IF RTM_NEST_COUNT == 1
        		IF _64_BIT_MODE
        			fallbackRIP := RIP
        		ELSE IF _32_BIT_MODE
        			fallbackEIP := EIP
        		FI
        		
        		RTM_ACTIVE := 1
        		// enter RTM execution, record register state, start tracking memory state
        	FI
        ELSE
        	// RTM abort (see _xabort)
        FI
        	
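The status value returned by _xbegin can be decoded in plain C. The following is a minimal sketch rather than Intel's implementation: the helper names rtm_started, rtm_explicit_abort and rtm_abort_code are hypothetical, and the layout assumed is that all 32 bits are set when the transaction started, bit 0 is set when an XABORT caused the abort, and the "imm8" operand of _xabort is reported in bits [31:24].

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helpers that decode the 32-bit status returned by _xbegin. */

    /* All bits set (~0u) means the transaction started; any other value is an abort. */
    static int rtm_started(uint32_t status) {
        return status == 0xFFFFFFFFu;
    }

    /* Assumed layout: bit 0 is set when the abort was caused by an explicit XABORT. */
    static int rtm_explicit_abort(uint32_t status) {
        return !rtm_started(status) && (status & 1u) != 0;
    }

    /* The imm8 operand of XABORT is reported in bits [31:24] of the status. */
    static uint8_t rtm_abort_code(uint32_t status) {
        return (uint8_t)(status >> 24);
    }

The abort code is only meaningful when rtm_explicit_abort reports an explicit abort; for other abort causes the upper byte carries no user-supplied value.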

_xend
^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: void

.. code-block:: C

    void _xend(void );

.. admonition:: Intel Description

    Specify the end of an RTM code region. If this corresponds to the outermost scope, the logical processor attempts to commit the logical processor state atomically. If the commit fails, the logical processor performs an RTM abort.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF RTM_ACTIVE == 1
        	RTM_NEST_COUNT := RTM_NEST_COUNT - 1
        	IF RTM_NEST_COUNT == 0
        		// try to commit transaction
        		IF FAIL_TO_COMMIT_TRANSACTION
        			// RTM abort (see _xabort)
        		ELSE
        			RTM_ACTIVE := 0
        		FI
        	FI
        FI
        	

_xtest
^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: unsigned char

.. code-block:: C

    unsigned char _xtest(void );

.. admonition:: Intel Description

    Query the transactional execution status, return 1 if inside a transactionally executing RTM or HLE region, and return 0 otherwise.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF (RTM_ACTIVE == 1 OR HLE_ACTIVE == 1)
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_serialize
^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: void

.. code-block:: C

    void _serialize(void );

.. admonition:: Intel Description

    Serialize instruction execution, ensuring all modifications to flags, registers, and memory by previous instructions are completed before the next instruction is fetched.

_rdtsc
^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: __int64

.. code-block:: C

    __int64 _rdtsc(void );

.. admonition:: Intel Description

    Copy the current 64-bit value of the processor's time-stamp counter into "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst[63:0] := TimeStampCounter
        	

_clui
^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: void

.. code-block:: C

    void _clui(void );

.. admonition:: Intel Description

    Clear the user interrupt flag (UIF).

_senduipi
^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: void
:Param Types:
    unsigned __int64 __a
:Param ETypes:
    UI64 __a

.. code-block:: C

    void _senduipi(unsigned __int64 __a);

.. admonition:: Intel Description

    Send user interprocessor interrupts specified in unsigned 64-bit integer "__a".

_stui
^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: void

.. code-block:: C

    void _stui(void );

.. admonition:: Intel Description

    Sets the user interrupt flag (UIF).

_testui
^^^^^^^
:Tech: Other
:Category: General Support
:Header: immintrin.h
:Searchable: Other-General Support-Other
:Return Type: unsigned char

.. code-block:: C

    unsigned char _testui(void );

.. admonition:: Intel Description

    Store the current user interrupt flag (UIF) in unsigned 8-bit integer "dst".

_urdmsr
^^^^^^^
:Tech: Other
:Category: General Support
:Header: x86gprintrin.h
:Searchable: Other-General Support-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 __A
:Param ETypes:
    UI64 __A

.. code-block:: C

    unsigned __int64 _urdmsr(unsigned __int64 __A);

.. admonition:: Intel Description

    Reads the contents of a 64-bit MSR specified in "__A" into "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        DEST := MSR[__A]
        	

_uwrmsr
^^^^^^^
:Tech: Other
:Category: General Support
:Header: x86gprintrin.h
:Searchable: Other-General Support-Other
:Return Type: void
:Param Types:
    unsigned __int64 __A, 
    unsigned __int64 __B
:Param ETypes:
    UI64 __A, 
    UI64 __B

.. code-block:: C

    void _uwrmsr(unsigned __int64 __A, unsigned __int64 __B);

.. admonition:: Intel Description

    Writes the contents of "__B" into the 64-bit MSR specified in "__A".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        MSR[__A] := __B
        	

MMX
~~~
_m_prefetchit0
^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: x86gprintrin.h
:Searchable: Other-General Support-MMX
:Register: MMX 64 bit
:Return Type: void
:Param Types:
    const void* __P
:Param ETypes:
    UI8 __P

.. code-block:: C

    void _m_prefetchit0(const void* __P);

.. admonition:: Intel Description

    Loads an instruction sequence containing the specified memory address into all levels of the cache hierarchy.

_m_prefetchit1
^^^^^^^^^^^^^^
:Tech: Other
:Category: General Support
:Header: x86gprintrin.h
:Searchable: Other-General Support-MMX
:Register: MMX 64 bit
:Return Type: void
:Param Types:
    const void* __P
:Param ETypes:
    UI8 __P

.. code-block:: C

    void _m_prefetchit1(const void* __P);

.. admonition:: Intel Description

    Loads an instruction sequence containing the specified memory address into all but the first-level cache.

Unknown
-------
Other
~~~~~
_enqcmd
^^^^^^^
:Tech: Other
:Category: Unknown
:Header: immintrin.h
:Searchable: Other-Unknown-Other
:Return Type: int
:Param Types:
    void* __dst, 
    const void* __src
:Param ETypes:
     __dst, 
     __src

.. code-block:: C

    int _enqcmd(void* __dst, const void* __src);

.. admonition:: Intel Description

    Reads the 64-byte command pointed to by "__src", formats 64-byte enqueue store data, and performs a 64-byte enqueue store to the memory pointed to by "__dst". This intrinsic may only be used in User mode.

_enqcmds
^^^^^^^^
:Tech: Other
:Category: Unknown
:Header: immintrin.h
:Searchable: Other-Unknown-Other
:Return Type: int
:Param Types:
    void* __dst, 
    const void* __src
:Param ETypes:
     __dst, 
     __src

.. code-block:: C

    int _enqcmds(void* __dst, const void* __src);

.. admonition:: Intel Description

    Reads the 64-byte command pointed to by "__src", formats 64-byte enqueue store data, and performs a 64-byte enqueue store to the memory pointed to by "__dst". This intrinsic may only be used in Privileged mode.

Random
------
Other
~~~~~
_rdrand16_step
^^^^^^^^^^^^^^
:Tech: Other
:Category: Random
:Header: immintrin.h
:Searchable: Other-Random-Other
:Return Type: int
:Param Types:
    unsigned short* val
:Param ETypes:
    UI16 val

.. code-block:: C

    int _rdrand16_step(unsigned short* val);

.. admonition:: Intel Description

    Read a hardware generated 16-bit random value and store the result in "val". Return 1 if a random value was generated, and 0 otherwise.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF HW_RND_GEN.ready == 1
        	val[15:0] := HW_RND_GEN.data
        	dst := 1
        ELSE
        	val[15:0] := 0
        	dst := 0
        FI
        	

_rdrand32_step
^^^^^^^^^^^^^^
:Tech: Other
:Category: Random
:Header: immintrin.h
:Searchable: Other-Random-Other
:Return Type: int
:Param Types:
    unsigned int* val
:Param ETypes:
    UI32 val

.. code-block:: C

    int _rdrand32_step(unsigned int* val);

.. admonition:: Intel Description

    Read a hardware generated 32-bit random value and store the result in "val". Return 1 if a random value was generated, and 0 otherwise.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF HW_RND_GEN.ready == 1
        	val[31:0] := HW_RND_GEN.data
        	dst := 1
        ELSE
        	val[31:0] := 0
        	dst := 0
        FI
        	
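Because the step intrinsics return 0 when no random value is available, callers usually retry a bounded number of times. The sketch below illustrates that pattern under stated assumptions: rand32_step_fn, rand32_retry and fake_step are hypothetical names, and fake_step is a stand-in generator used only so the example runs without RDRAND hardware.

.. code-block:: C

    /* Function type with the same shape as _rdrand32_step and _rdseed32_step. */
    typedef int (*rand32_step_fn)(unsigned int *val);

    /* Hypothetical helper: call step until it reports success (returns 1)
       or max_tries attempts have been made. Returns 1 on success, 0 otherwise. */
    static int rand32_retry(rand32_step_fn step, unsigned int *out, int max_tries) {
        for (int i = 0; i < max_tries; ++i) {
            if (step(out) == 1) {
                return 1;
            }
        }
        return 0;
    }

    /* Stand-in generator for demonstration only: fails twice, then succeeds. */
    static int fake_calls = 0;

    static int fake_step(unsigned int *val) {
        if (++fake_calls < 3) {
            *val = 0;           /* mirrors the pseudo-code: val is zeroed on failure */
            return 0;
        }
        *val = 0x12345678u;
        return 1;
    }

On real hardware the step argument would be a thin wrapper function that calls _rdrand32_step; the retry limit is a caller-chosen policy and is not specified by the intrinsic.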

_rdrand64_step
^^^^^^^^^^^^^^
:Tech: Other
:Category: Random
:Header: immintrin.h
:Searchable: Other-Random-Other
:Return Type: int
:Param Types:
    unsigned __int64* val
:Param ETypes:
    UI64 val

.. code-block:: C

    int _rdrand64_step(unsigned __int64* val);

.. admonition:: Intel Description

    Read a hardware generated 64-bit random value and store the result in "val". Return 1 if a random value was generated, and 0 otherwise.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF HW_RND_GEN.ready == 1
        	val[63:0] := HW_RND_GEN.data
        	dst := 1
        ELSE
        	val[63:0] := 0
        	dst := 0
        FI
        	

_rdseed16_step
^^^^^^^^^^^^^^
:Tech: Other
:Category: Random
:Header: immintrin.h
:Searchable: Other-Random-Other
:Return Type: int
:Param Types:
    unsigned short * val
:Param ETypes:
    UI16 val

.. code-block:: C

    int _rdseed16_step(unsigned short * val);

.. admonition:: Intel Description

    Read a 16-bit NIST SP800-90B and SP800-90C compliant random value and store in "val". Return 1 if a random value was generated, and 0 otherwise.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF HW_NRND_GEN.ready == 1
        	val[15:0] := HW_NRND_GEN.data
        	dst := 1
        ELSE
        	val[15:0] := 0
        	dst := 0
        FI
        	

_rdseed32_step
^^^^^^^^^^^^^^
:Tech: Other
:Category: Random
:Header: immintrin.h
:Searchable: Other-Random-Other
:Return Type: int
:Param Types:
    unsigned int * val
:Param ETypes:
    UI32 val

.. code-block:: C

    int _rdseed32_step(unsigned int * val);

.. admonition:: Intel Description

    Read a 32-bit NIST SP800-90B and SP800-90C compliant random value and store in "val". Return 1 if a random value was generated, and 0 otherwise.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF HW_NRND_GEN.ready == 1
        	val[31:0] := HW_NRND_GEN.data
        	dst := 1
        ELSE
        	val[31:0] := 0
        	dst := 0
        FI
        	

_rdseed64_step
^^^^^^^^^^^^^^
:Tech: Other
:Category: Random
:Header: immintrin.h
:Searchable: Other-Random-Other
:Return Type: int
:Param Types:
    unsigned __int64 * val
:Param ETypes:
    UI64 val

.. code-block:: C

    int _rdseed64_step(unsigned __int64 * val);

.. admonition:: Intel Description

    Read a 64-bit NIST SP800-90B and SP800-90C compliant random value and store in "val". Return 1 if a random value was generated, and 0 otherwise.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF HW_NRND_GEN.ready == 1
        	val[63:0] := HW_NRND_GEN.data
        	dst := 1
        ELSE
        	val[63:0] := 0
        	dst := 0
        FI
        	

Store
-----
Other
~~~~~
_storebe_i16
^^^^^^^^^^^^
:Tech: Other
:Category: Store
:Header: immintrin.h
:Searchable: Other-Store-Other
:Return Type: void
:Param Types:
    void * ptr, 
    short data
:Param ETypes:
    UI16 ptr, 
    UI16 data

.. code-block:: C

    void _storebe_i16(void * ptr, short data);

.. admonition:: Intel Description

    Perform a byte swap operation on the 16 bits in "data", and store the result to memory at "ptr".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*8
        	MEM[ptr+i+7:ptr+i] := data[15-i:8-i]
        ENDFOR
        	

_storebe_i32
^^^^^^^^^^^^
:Tech: Other
:Category: Store
:Header: immintrin.h
:Searchable: Other-Store-Other
:Return Type: void
:Param Types:
    void * ptr, 
    int data
:Param ETypes:
    UI32 ptr, 
    UI32 data

.. code-block:: C

    void _storebe_i32(void * ptr, int data);

.. admonition:: Intel Description

    Perform a byte swap operation on the 32 bits in "data", and store the result to memory at "ptr".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*8
        	MEM[ptr+i+7:ptr+i] := data[31-i:24-i]
        ENDFOR
        	

_storebe_i64
^^^^^^^^^^^^
:Tech: Other
:Category: Store
:Header: immintrin.h
:Searchable: Other-Store-Other
:Return Type: void
:Param Types:
    void * ptr, 
    __int64 data
:Param ETypes:
    UI64 ptr, 
    UI64 data

.. code-block:: C

    void _storebe_i64(void * ptr, __int64 data);

.. admonition:: Intel Description

    Perform a byte swap operation on the 64 bits in "data", and store the result to memory at "ptr".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	MEM[ptr+i+7:ptr+i] := data[63-i:56-i]
        ENDFOR
        	
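The byte order produced by the _storebe intrinsics can be modelled in portable C. This is a sketch for illustration, not the intrinsic itself: store_be32 and store_be64 are hypothetical helper names, and they follow the pseudo-code above by writing the most significant byte of "data" to the lowest address.

.. code-block:: C

    #include <stdint.h>

    /* Portable model of a 32-bit big-endian store. */
    static void store_be32(void *ptr, uint32_t data) {
        unsigned char *p = (unsigned char *)ptr;
        for (int j = 0; j < 4; ++j) {
            /* memory byte j receives bits [31-8j : 24-8j] of data */
            p[j] = (unsigned char)(data >> (24 - 8 * j));
        }
    }

    /* Portable model of a 64-bit big-endian store. */
    static void store_be64(void *ptr, uint64_t data) {
        unsigned char *p = (unsigned char *)ptr;
        for (int j = 0; j < 8; ++j) {
            /* memory byte j receives bits [63-8j : 56-8j] of data */
            p[j] = (unsigned char)(data >> (56 - 8 * j));
        }
    }

Unlike the intrinsics, these helpers give the same memory layout on any host, whatever its native byte order.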

_movdir64b
^^^^^^^^^^
:Tech: Other
:Category: Store
:Header: immintrin.h
:Searchable: Other-Store-Other
:Return Type: void
:Param Types:
    void* dst, 
    const void* src
:Param ETypes:
    M512 dst, 
    M512 src

.. code-block:: C

    void _movdir64b(void* dst, const void* src);

.. admonition:: Intel Description

    Move 64-byte (512-bit) value using direct store from source memory address "src" to destination memory address "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        MEM[dst+511:dst] := MEM[src+511:src]
        	

_directstoreu_u64
^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Store
:Header: immintrin.h
:Searchable: Other-Store-Other
:Return Type: void
:Param Types:
    void* dst, 
    unsigned __int64 val
:Param ETypes:
    UI64 dst, 
    UI64 val

.. code-block:: C

    void _directstoreu_u64(void* dst, unsigned __int64 val);

.. admonition:: Intel Description

    Store 64-bit integer from "val" into memory using direct store.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        MEM[dst+63:dst] := val[63:0]
        	

_directstoreu_u32
^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Store
:Header: immintrin.h
:Searchable: Other-Store-Other
:Return Type: void
:Param Types:
    void* dst, 
    unsigned int val
:Param ETypes:
    UI32 dst, 
    UI32 val

.. code-block:: C

    void _directstoreu_u32(void* dst, unsigned int val);

.. admonition:: Intel Description

    Store 32-bit integer from "val" into memory using direct store.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        MEM[dst+31:dst] := val[31:0]
        	

Load
----
Other
~~~~~
_loadbe_i16
^^^^^^^^^^^
:Tech: Other
:Category: Load
:Header: immintrin.h
:Searchable: Other-Load-Other
:Return Type: short
:Param Types:
    void const * ptr
:Param ETypes:
    UI16 ptr

.. code-block:: C

    short _loadbe_i16(void const * ptr);

.. admonition:: Intel Description

    Load 16 bits from memory, perform a byte swap operation, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*8
        	dst[i+7:i] := MEM[ptr+15-i:ptr+8-i]
        ENDFOR
        	

_loadbe_i32
^^^^^^^^^^^
:Tech: Other
:Category: Load
:Header: immintrin.h
:Searchable: Other-Load-Other
:Return Type: int
:Param Types:
    void const * ptr
:Param ETypes:
    UI32 ptr

.. code-block:: C

    int _loadbe_i32(void const * ptr);

.. admonition:: Intel Description

    Load 32 bits from memory, perform a byte swap operation, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*8
        	dst[i+7:i] := MEM[ptr+31-i:ptr+24-i]
        ENDFOR
        	

_loadbe_i64
^^^^^^^^^^^
:Tech: Other
:Category: Load
:Header: immintrin.h
:Searchable: Other-Load-Other
:Return Type: __int64
:Param Types:
    void const * ptr
:Param ETypes:
    UI64 ptr

.. code-block:: C

    __int64 _loadbe_i64(void const * ptr);

.. admonition:: Intel Description

    Load 64 bits from memory, perform a byte swap operation, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	dst[i+7:i] := MEM[ptr+63-i:ptr+56-i]
        ENDFOR
        	
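The _loadbe intrinsics can likewise be modelled in portable C. The helpers below are a sketch with hypothetical names (load_be16, load_be32, load_be64); each treats the byte at the lowest address as the most significant byte of the result, which is what the byte swap in the pseudo-code above achieves on a little-endian host.

.. code-block:: C

    #include <stdint.h>

    /* Portable model of a 16-bit big-endian load. */
    static uint16_t load_be16(const void *ptr) {
        const unsigned char *p = (const unsigned char *)ptr;
        return (uint16_t)((p[0] << 8) | p[1]);
    }

    /* Portable model of a 32-bit big-endian load. */
    static uint32_t load_be32(const void *ptr) {
        const unsigned char *p = (const unsigned char *)ptr;
        uint32_t v = 0;
        for (int j = 0; j < 4; ++j) {
            v = (v << 8) | p[j];   /* earlier bytes end up in higher bit positions */
        }
        return v;
    }

    /* Portable model of a 64-bit big-endian load. */
    static uint64_t load_be64(const void *ptr) {
        const unsigned char *p = (const unsigned char *)ptr;
        uint64_t v = 0;
        for (int j = 0; j < 8; ++j) {
            v = (v << 8) | p[j];
        }
        return v;
    }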

Arithmetic
----------
ZMM
~~~
_mm512_maskz_gf2p8mul_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_gf2p8mul_epi8(__mmask64 k, __m512i a,
                                       __m512i b);

.. admonition:: Intel Description

    Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE gf2p8mul_byte(src1byte, src2byte) {
        	tword := 0
        	FOR i := 0 to 7
        		IF src2byte.bit[i]
        			tword := tword XOR (src1byte << i)
        		FI
        	ENDFOR
        	FOR i := 14 downto 8
        		p := 0x11B << (i-8)
        		IF tword.bit[i]
        			tword := tword XOR p
        		FI
        	ENDFOR
        	RETURN tword.byte[0]
        }
        FOR j := 0 TO 63
        	IF k[j]
        		dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
        	ELSE
        		dst.byte[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
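The per-byte helper in the pseudo-code above translates almost line for line into scalar C. The function below is a reference sketch for checking results, not a replacement for the intrinsic; it reuses the name gf2p8mul_byte from the pseudo-code and assumes nothing beyond the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) given in the description.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of one byte lane: multiply a and b in GF(2^8) modulo 0x11B. */
    static uint8_t gf2p8mul_byte(uint8_t a, uint8_t b) {
        uint16_t t = 0;
        /* Carry-less multiplication: the product has at most 15 bits. */
        for (int i = 0; i < 8; ++i) {
            if ((b >> i) & 1) {
                t ^= (uint16_t)(a << i);
            }
        }
        /* Reduce bits 14 down to 8 with the shifted polynomial 0x11B. */
        for (int i = 14; i >= 8; --i) {
            if ((t >> i) & 1) {
                t ^= (uint16_t)(0x11B << (i - 8));
            }
        }
        return (uint8_t)t;
    }

The same field is used by AES, so the worked product {57} * {83} = {c1} from the AES specification serves as a convenient test vector.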

_mm512_mask_gf2p8mul_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_gf2p8mul_epi8(__m512i src, __mmask64 k,
                                      __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE gf2p8mul_byte(src1byte, src2byte) {
        	tword := 0
        	FOR i := 0 to 7
        		IF src2byte.bit[i]
        			tword := tword XOR (src1byte << i)
        		FI
        	ENDFOR
        	FOR i := 14 downto 8
        		p := 0x11B << (i-8)
        		IF tword.bit[i]
        			tword := tword XOR p
        		FI
        	ENDFOR
        	RETURN tword.byte[0]
        }
        FOR j := 0 TO 63
        	IF k[j]
        		dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
        	ELSE
        		dst.byte[j] := src.byte[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
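The zeromask and writemask forms differ only in what a byte lane receives when its bit in "k" is clear. The following sketch shows that selection on plain arrays; apply_zeromask and apply_writemask are hypothetical helper names, "result" stands for the already computed per-lane values, and "n" is assumed to be at most 64 so that every lane has a bit in the mask.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Zeromask: lanes whose mask bit is clear are set to 0. */
    static void apply_zeromask(uint8_t *dst, const uint8_t *result,
                               uint64_t k, size_t n) {
        for (size_t j = 0; j < n; ++j) {
            dst[j] = ((k >> j) & 1) ? result[j] : 0;
        }
    }

    /* Writemask: lanes whose mask bit is clear are copied from src. */
    static void apply_writemask(uint8_t *dst, const uint8_t *src,
                                const uint8_t *result, uint64_t k, size_t n) {
        for (size_t j = 0; j < n; ++j) {
            dst[j] = ((k >> j) & 1) ? result[j] : src[j];
        }
    }

With a mask of 0x5, lanes 0 and 2 take the computed value in both forms, while lanes 1 and 3 become 0 under the zeromask and keep the "src" value under the writemask.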

_mm512_gf2p8mul_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_gf2p8mul_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst". The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE gf2p8mul_byte(src1byte, src2byte) {
        	tword := 0
        	FOR i := 0 to 7
        		IF src2byte.bit[i]
        			tword := tword XOR (src1byte << i)
        		FI
        	ENDFOR
        	FOR i := 14 downto 8
        		p := 0x11B << (i-8)
        		IF tword.bit[i]
        			tword := tword XOR p
        		FI
        	ENDFOR
        	RETURN tword.byte[0]
        }
        FOR j := 0 TO 63
        	dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_gf2p8affine_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i x, 
    __m512i A, 
    int b
:Param ETypes:
    MASK k, 
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m512i _mm512_maskz_gf2p8affine_epi64_epi8(__mmask64 k,
                                                __m512i x,
                                                __m512i A,
                                                int b);

.. admonition:: Intel Description

    Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 7
        	FOR i := 0 to 7
        		IF k[j*8+i]
        			dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
        		ELSE
        			dst.qword[j].byte[i] := 0
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_gf2p8affine_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i x, 
    __m512i A, 
    int b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m512i _mm512_mask_gf2p8affine_epi64_epi8(
        __m512i src, __mmask64 k, __m512i x, __m512i A, int b);

.. admonition:: Intel Description

    Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 7
        	FOR i := 0 to 7
        		IF k[j*8+i]
        			dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
        		ELSE
        			dst.qword[j].byte[i] := src.qword[j].byte[i]
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_gf2p8affine_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i x, 
    __m512i A, 
    int b
:Param ETypes:
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m512i _mm512_gf2p8affine_epi64_epi8(__m512i x, __m512i A,
                                          int b);

.. admonition:: Intel Description

    Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 7
        	FOR i := 0 to 7
        		dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
        	ENDFOR
        ENDFOR
        dst[MAX:512] := 0
        	
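A scalar model makes the row ordering of the bit matrix easier to see. The sketch below mirrors the affine_byte helper from the pseudo-code above; parity8 is a hypothetical helper name, and the 64-bit value "A" is interpreted so that its byte 7 - i supplies the matrix row for output bit i.

.. code-block:: C

    #include <stdint.h>

    /* XOR of all eight bits of x. */
    static int parity8(uint8_t x) {
        x ^= x >> 4;
        x ^= x >> 2;
        x ^= x >> 1;
        return x & 1;
    }

    /* Scalar model of one byte lane of the affine transformation A * x + b. */
    static uint8_t affine_byte(uint64_t A, uint8_t x, uint8_t b) {
        uint8_t r = 0;
        for (int i = 0; i < 8; ++i) {
            uint8_t row = (uint8_t)(A >> (8 * (7 - i)));   /* A.byte[7-i] */
            int bit = parity8((uint8_t)(row & x)) ^ ((b >> i) & 1);
            r |= (uint8_t)(bit << i);
        }
        return r;
    }

Under this layout the identity matrix is 0x0102040810204080, so affine_byte(0x0102040810204080, x, 0) returns x unchanged and a non-zero "b" simply XORs into the result.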

_mm512_maskz_gf2p8affineinv_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i x, 
    __m512i A, 
    int b
:Param ETypes:
    MASK k, 
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m512i _mm512_maskz_gf2p8affineinv_epi64_epi8(__mmask64 k,
                                                   __m512i x,
                                                   __m512i A,
                                                   int b);

.. admonition:: Intel Description

    Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 7
        	FOR i := 0 to 7
        		IF k[j*8+i]
        			dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
        		ELSE
        			dst.qword[j].byte[i] := 0
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_gf2p8affineinv_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i x, 
    __m512i A, 
    int b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m512i _mm512_mask_gf2p8affineinv_epi64_epi8(
        __m512i src, __mmask64 k, __m512i x, __m512i A, int b);

.. admonition:: Intel Description

    Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 7
        	FOR i := 0 to 7
        		IF k[j*8+i]
        			dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
        		ELSE
        			dst.qword[j].byte[i] := src.qword[j].byte[i]
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:512] := 0
        	
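The two masking forms differ only in what an unselected byte receives. The sketch below models that step for 64 bytes, assuming the per-byte results have already been computed; ``apply_mask64`` and its ``zero_masking`` flag are hypothetical names used for illustration only.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model of the masking step. zero_masking != 0
       models the maskz form, zero_masking == 0 the mask form. */
    static void apply_mask64(uint64_t k, int zero_masking,
                             const uint8_t src[64],
                             const uint8_t computed[64],
                             uint8_t dst[64])
    {
        for (size_t j = 0; j < 8; j++) {        /* qword index */
            for (size_t i = 0; i < 8; i++) {    /* byte within qword */
                size_t n = j * 8 + i;           /* mask bit k[j*8+i] */
                if ((k >> n) & 1u)
                    dst[n] = computed[n];
                else
                    dst[n] = zero_masking ? 0 : src[n];
            }
        }
    }

Mask bit ``j*8+i`` is simply the flat byte position, so an unselected byte in the writemask form takes the byte at the same position in "src".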

_mm512_gf2p8affineinv_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i x, 
    __m512i A, 
    int b
:Param ETypes:
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m512i _mm512_gf2p8affineinv_epi64_epi8(__m512i x,
                                             __m512i A, int b);

.. admonition:: Intel Description

    Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 7
        	FOR i := 0 to 7
        		dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
        	ENDFOR
        ENDFOR
        dst[MAX:512] := 0
        	
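One familiar instance of this transformation is the AES S-box, which is a field inversion followed by a fixed affine map. The scalar sketch below models one byte under stated assumptions: ``gf_mul``, ``gf_inv`` and ``affine_inverse_byte_ref`` are hypothetical helpers, the inverse of 0 is taken to be 0 (the AES convention), and the inverse is found by brute force purely for clarity.

.. code-block:: C

    #include <stdint.h>

    /* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
    static uint8_t gf_mul(uint8_t a, uint8_t b)
    {
        uint8_t r = 0;
        while (b) {
            if (b & 1u)
                r ^= a;
            /* Multiply a by x, reducing when bit 7 overflows. */
            a = (uint8_t)((a << 1) ^ ((a & 0x80u) ? 0x1Bu : 0u));
            b >>= 1;
        }
        return r;
    }

    /* Brute-force inverse; inverse(0) is assumed to be 0. */
    static uint8_t gf_inv(uint8_t x)
    {
        for (unsigned y = 1; y < 256; y++)
            if (gf_mul(x, (uint8_t)y) == 1)
                return (uint8_t)y;
        return 0;
    }

    /* Hypothetical scalar model of affine_inverse_byte(). */
    static uint8_t affine_inverse_byte_ref(uint64_t A, uint8_t x, uint8_t b)
    {
        uint8_t inv = gf_inv(x), out = 0;
        for (int i = 0; i < 8; i++) {
            uint8_t t = (uint8_t)(A >> (8 * (7 - i))) & inv;
            t ^= t >> 4; t ^= t >> 2; t ^= t >> 1;  /* parity in bit 0 */
            out |= (uint8_t)(((t ^ (b >> i)) & 1u) << i);
        }
        return out;
    }

With ``A = 0xF1E3C78F1F3E7CF8`` and ``b = 0x63`` the model maps ``0x53`` to ``0xED``, the S-box example worked through in FIPS 197.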

YMM
~~~
_mm256_maskz_gf2p8mul_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_gf2p8mul_epi8(__mmask32 k, __m256i a,
                                       __m256i b);

.. admonition:: Intel Description

    Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE gf2p8mul_byte(src1byte, src2byte) {
        	tword := 0
        	FOR i := 0 to 7
        		IF src2byte.bit[i]
        			tword := tword XOR (src1byte << i)
        		FI
        	ENDFOR
        	FOR i := 14 downto 8
        		p := 0x11B << (i-8)
        		IF tword.bit[i]
        			tword := tword XOR p
        		FI
        	ENDFOR
        	RETURN tword.byte[0]
        }
        FOR j := 0 TO 31
        	IF k[j]
        		dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
        	ELSE
        		dst.byte[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_gf2p8mul_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_gf2p8mul_epi8(__m256i src, __mmask32 k,
                                      __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE gf2p8mul_byte(src1byte, src2byte) {
        	tword := 0
        	FOR i := 0 to 7
        		IF src2byte.bit[i]
        			tword := tword XOR (src1byte << i)
        		FI
        	ENDFOR
        	FOR i := 14 downto 8
        		p := 0x11B << (i-8)
        		IF tword.bit[i]
        			tword := tword XOR p
        		FI
        	ENDFOR
        	RETURN tword.byte[0]
        }
        FOR j := 0 TO 31
        	IF k[j]
        		dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
        	ELSE
        		dst.byte[j] := src.byte[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_gf2p8mul_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_gf2p8mul_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst". The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE gf2p8mul_byte(src1byte, src2byte) {
        	tword := 0
        	FOR i := 0 to 7
        		IF src2byte.bit[i]
        			tword := tword XOR (src1byte << i)
        		FI
        	ENDFOR
        	FOR i := 14 downto 8
        		p := 0x11B << (i-8)
        		IF tword.bit[i]
        			tword := tword XOR p
        		FI
        	ENDFOR
        	RETURN tword.byte[0]
        }
        FOR j := 0 TO 31
        	dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
        ENDFOR
        dst[MAX:256] := 0
        	
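The ``gf2p8mul_byte`` routine above translates almost line for line into C: a carry-less multiply into a 15-bit product, followed by reduction with ``0x11B``. The function name ``gf2p8mul_byte_ref`` is a hypothetical one chosen for this sketch, which models a single byte lane only.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of gf2p8mul_byte(). */
    static uint8_t gf2p8mul_byte_ref(uint8_t a, uint8_t b)
    {
        uint16_t t = 0;

        /* Carry-less multiply: XOR shifted copies of a. */
        for (int i = 0; i < 8; i++)
            if ((b >> i) & 1u)
                t ^= (uint16_t)(a << i);

        /* Reduce bits 14..8 with x^8 + x^4 + x^3 + x + 1 (0x11B). */
        for (int i = 14; i >= 8; i--)
            if ((t >> i) & 1u)
                t ^= (uint16_t)(0x11Bu << (i - 8));

        return (uint8_t)t;
    }

For example, ``gf2p8mul_byte_ref(0x57, 0x83)`` yields ``0xC1``, the multiplication example given in FIPS 197.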

_mm256_maskz_gf2p8affine_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i x, 
    __m256i A, 
    int b
:Param ETypes:
    MASK k, 
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m256i _mm256_maskz_gf2p8affine_epi64_epi8(__mmask32 k,
                                                __m256i x,
                                                __m256i A,
                                                int b);

.. admonition:: Intel Description

    Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 3
        	FOR i := 0 to 7
        		IF k[j*8+i]
        			dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
        		ELSE
        			dst.qword[j].byte[i] := 0
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_gf2p8affine_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i x, 
    __m256i A, 
    int b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m256i _mm256_mask_gf2p8affine_epi64_epi8(
        __m256i src, __mmask32 k, __m256i x, __m256i A, int b);

.. admonition:: Intel Description

    Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 3
        	FOR i := 0 to 7
        		IF k[j*8+i]
        			dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
        		ELSE
        			dst.qword[j].byte[i] := src.qword[j].byte[i]
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_gf2p8affine_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i x, 
    __m256i A, 
    int b
:Param ETypes:
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m256i _mm256_gf2p8affine_epi64_epi8(__m256i x, __m256i A,
                                          int b);

.. admonition:: Intel Description

    Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 3
        	FOR i := 0 to 7
        		dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
        	ENDFOR
        ENDFOR
        dst[MAX:256] := 0
        	
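Because output bit ``i`` is computed from ``A.byte[7-i]``, the matrix rows are stored in reverse byte order within each qword, which is easy to get wrong when writing constants by hand. The sketch below packs eight row masks into that layout; ``pack_matrix`` is a hypothetical helper, not an Intel API.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: rows[i] is the mask of input bits that are
       XORed together to form output bit i. Row i lands in byte 7-i. */
    static uint64_t pack_matrix(const uint8_t rows[8])
    {
        uint64_t A = 0;
        for (int i = 0; i < 8; i++)
            A |= (uint64_t)rows[i] << (8 * (7 - i));
        return A;
    }

Passing rows ``{0x01, 0x02, ..., 0x80}`` gives the identity matrix ``0x0102040810204080``, while rows ``{0x80, 0x40, ..., 0x01}`` give ``0x8040201008040201``, which reverses the bit order of each byte.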

_mm256_maskz_gf2p8affineinv_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i x, 
    __m256i A, 
    int b
:Param ETypes:
    MASK k, 
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m256i _mm256_maskz_gf2p8affineinv_epi64_epi8(__mmask32 k,
                                                   __m256i x,
                                                   __m256i A,
                                                   int b);

.. admonition:: Intel Description

    Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 3
        	FOR i := 0 to 7
        		IF k[j*8+i]
        			dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
        		ELSE
        			dst.qword[j].byte[i] := 0
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_gf2p8affineinv_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i x, 
    __m256i A, 
    int b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m256i _mm256_mask_gf2p8affineinv_epi64_epi8(
        __m256i src, __mmask32 k, __m256i x, __m256i A, int b);

.. admonition:: Intel Description

    Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 3
        	FOR i := 0 to 7
        		IF k[j*8+i]
        			dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
        		ELSE
        			dst.qword[j].byte[i] := src.qword[j].byte[i]
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_gf2p8affineinv_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i x, 
    __m256i A, 
    int b
:Param ETypes:
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m256i _mm256_gf2p8affineinv_epi64_epi8(__m256i x,
                                             __m256i A, int b);

.. admonition:: Intel Description

    Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 3
        	FOR i := 0 to 7
        		dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
        	ENDFOR
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_maskz_gf2p8mul_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_gf2p8mul_epi8(__mmask16 k, __m128i a,
                                    __m128i b);

.. admonition:: Intel Description

    Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE gf2p8mul_byte(src1byte, src2byte) {
        	tword := 0
        	FOR i := 0 to 7
        		IF src2byte.bit[i]
        			tword := tword XOR (src1byte << i)
        		FI
        	ENDFOR
        	FOR i := 14 downto 8
        		p := 0x11B << (i-8)
        		IF tword.bit[i]
        			tword := tword XOR p
        		FI
        	ENDFOR
        	RETURN tword.byte[0]
        }
        FOR j := 0 TO 15
        	IF k[j]
        		dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
        	ELSE
        		dst.byte[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_gf2p8mul_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_gf2p8mul_epi8(__m128i src, __mmask16 k,
                                   __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE gf2p8mul_byte(src1byte, src2byte) {
        	tword := 0
        	FOR i := 0 to 7
        		IF src2byte.bit[i]
        			tword := tword XOR (src1byte << i)
        		FI
        	ENDFOR
        	FOR i := 14 downto 8
        		p := 0x11B << (i-8)
        		IF tword.bit[i]
        			tword := tword XOR p
        		FI
        	ENDFOR
        	RETURN tword.byte[0]
        }
        FOR j := 0 TO 15
        	IF k[j]
        		dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
        	ELSE
        		dst.byte[j] := src.byte[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_gf2p8mul_epi8
^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_gf2p8mul_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst". The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE gf2p8mul_byte(src1byte, src2byte) {
        	tword := 0
        	FOR i := 0 to 7
        		IF src2byte.bit[i]
        			tword := tword XOR (src1byte << i)
        		FI
        	ENDFOR
        	FOR i := 14 downto 8
        		p := 0x11B << (i-8)
        		IF tword.bit[i]
        			tword := tword XOR p
        		FI
        	ENDFOR
        	RETURN tword.byte[0]
        }
        FOR j := 0 TO 15
        	dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
        ENDFOR
        dst[MAX:128] := 0
        	
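As an illustration of what the field multiply can be used for, the nonzero elements of GF(2^8) form a group of order 255, so raising a byte to the power 254 yields its multiplicative inverse. The scalar sketch below assumes a per-byte multiply equivalent to ``gf2p8mul_byte``; ``gf_mul`` and ``gf_inv_pow254`` are hypothetical names.

.. code-block:: C

    #include <stdint.h>

    /* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
    static uint8_t gf_mul(uint8_t a, uint8_t b)
    {
        uint8_t r = 0;
        while (b) {
            if (b & 1u)
                r ^= a;
            a = (uint8_t)((a << 1) ^ ((a & 0x80u) ? 0x1Bu : 0u));
            b >>= 1;
        }
        return r;
    }

    /* a^254 = a^2 * a^4 * ... * a^128; maps 0 to 0. */
    static uint8_t gf_inv_pow254(uint8_t a)
    {
        uint8_t r = 1;
        for (int i = 0; i < 7; i++) {
            a = gf_mul(a, a);   /* a^2, a^4, ..., a^128 */
            r = gf_mul(r, a);   /* accumulate the product */
        }
        return r;
    }

In a vectorized version each ``gf_mul`` call would correspond to one lane of the packed multiply, so an inversion costs thirteen multiplies per lane: seven squarings and six accumulations, since the first accumulation only multiplies by 1.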

_mm_maskz_gf2p8affine_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i x, 
    __m128i A, 
    int b
:Param ETypes:
    MASK k, 
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m128i _mm_maskz_gf2p8affine_epi64_epi8(__mmask16 k,
                                             __m128i x,
                                             __m128i A, int b);

.. admonition:: Intel Description

    Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 1
        	FOR i := 0 to 7
        		IF k[j*8+i]
        			dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
        		ELSE
        			dst.qword[j].byte[i] := 0
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_gf2p8affine_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i x, 
    __m128i A, 
    int b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m128i _mm_mask_gf2p8affine_epi64_epi8(__m128i src,
                                            __mmask16 k,
                                            __m128i x,
                                            __m128i A, int b);

.. admonition:: Intel Description

    Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 1
        	FOR i := 0 to 7
        		IF k[j*8+i]
        			dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
        		ELSE
        			dst.qword[j].byte[i] := src.qword[j].byte[i]
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_gf2p8affine_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i x, 
    __m128i A, 
    int b
:Param ETypes:
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m128i _mm_gf2p8affine_epi64_epi8(__m128i x, __m128i A,
                                       int b);

.. admonition:: Intel Description

    Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 1
        	FOR i := 0 to 7
        		dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
        	ENDFOR
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_gf2p8affineinv_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i x, 
    __m128i A, 
    int b
:Param ETypes:
    MASK k, 
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m128i _mm_maskz_gf2p8affineinv_epi64_epi8(__mmask16 k,
                                                __m128i x,
                                                __m128i A,
                                                int b);

.. admonition:: Intel Description

    Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 1
        	FOR i := 0 to 7
        		IF k[j*8+i]
        			dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
        		ELSE
        			dst.qword[j].byte[i] := 0
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_gf2p8affineinv_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i x, 
    __m128i A, 
    int b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m128i _mm_mask_gf2p8affineinv_epi64_epi8(
        __m128i src, __mmask16 k, __m128i x, __m128i A, int b);

.. admonition:: Intel Description

    Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 1
        	FOR i := 0 to 7
        		IF k[j*8+i]
        			dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
        		ELSE
        			dst.qword[j].byte[i] := src.qword[j].byte[i]
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_gf2p8affineinv_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i x, 
    __m128i A, 
    int b
:Param ETypes:
    UI64 x, 
    UI64 A, 
    IMM b

.. code-block:: C

    __m128i _mm_gf2p8affineinv_epi64_epi8(__m128i x, __m128i A,
                                          int b);

.. admonition:: Intel Description

    Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        DEFINE parity(x) {
        	t := 0
        	FOR i := 0 to 7
        		t := t XOR x.bit[i]
        	ENDFOR
        	RETURN t
        }
        DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
        	FOR i := 0 to 7
        		retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
        	ENDFOR
        	RETURN retbyte
        }
        FOR j := 0 TO 1
        	FOR i := 0 to 7
        		dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
        	ENDFOR
        ENDFOR
        dst[MAX:128] := 0
        	
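The per-byte step of the pseudo-code above can be modelled in portable C. This is a minimal sketch, not the intrinsic itself: the helper names ``gf_mul``, ``gf_inv`` and ``ref_affine_inverse_byte`` are hypothetical, and the inverse is obtained by brute-force exponentiation (a^254), which is only one possible way to compute it. Zero is mapped to zero.

.. code-block:: C

    #include <stdint.h>

    /* Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B). */
    uint8_t gf_mul(uint8_t a, uint8_t b)
    {
        uint8_t p = 0;
        for (int i = 0; i < 8; i++) {
            if (b & 1)
                p ^= a;
            uint8_t carry = a & 0x80;
            a = (uint8_t)(a << 1);
            if (carry)
                a ^= 0x1B;              /* reduce by the low byte of 0x11B */
            b >>= 1;
        }
        return p;
    }

    /* Multiplicative inverse computed as a^254; gf_inv(0) yields 0. */
    uint8_t gf_inv(uint8_t a)
    {
        uint8_t r = 1;
        for (int i = 0; i < 254; i++)
            r = gf_mul(r, a);
        return r;
    }

    /* XOR of the eight bits of x. */
    static unsigned parity8(uint8_t x)
    {
        unsigned t = 0;
        for (int i = 0; i < 8; i++)
            t ^= (x >> i) & 1u;
        return t;
    }

    /* Bit i of the result is parity(A.byte[7-i] AND inverse(x)) XOR b.bit[i]. */
    uint8_t ref_affine_inverse_byte(uint64_t A, uint8_t x, uint8_t b)
    {
        uint8_t inv = gf_inv(x);
        uint8_t ret = 0;
        for (int i = 0; i < 8; i++) {
            uint8_t row = (uint8_t)(A >> (8 * (7 - i)));
            unsigned bit = parity8(row & inv) ^ ((b >> i) & 1u);
            ret |= (uint8_t)(bit << i);
        }
        return ret;
    }

With the matrix ``0x0102040810204080`` (the identity under this row ordering) and ``b = 0`` the sketch returns the plain field inverse; with ``0xF1E3C78F1F3E7CF8`` and ``b = 0x63`` it reproduces AES S-box entries such as ``0x53 -> 0xED``.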

Other
~~~~~
_addcarryx_u32
^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: unsigned char
:Param Types:
    unsigned char c_in, 
    unsigned int a, 
    unsigned int b, 
    unsigned int * out
:Param ETypes:
    UI8 c_in, 
    UI32 a, 
    UI32 b, 
    UI32 out

.. code-block:: C

    unsigned char _addcarryx_u32(unsigned char c_in, unsigned int a, unsigned int b, unsigned int * out);

.. admonition:: Intel Description

    Add unsigned 32-bit integers "a" and "b" with unsigned 8-bit carry-in "c_in" (carry or overflow flag), and store the unsigned 32-bit result in "out", and the carry-out in "dst" (carry or overflow flag).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[32:0] := a[31:0] + b[31:0] + (c_in > 0 ? 1 : 0)
        MEM[out+31:out] := tmp[31:0]
        dst[0] := tmp[32]
        dst[7:1] := 0
        	
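The 33-bit temporary in the pseudo-code above maps directly onto a 64-bit integer in portable C. The following is a sketch of the arithmetic only; ``ref_addcarry_u32`` is a hypothetical name and the code does not touch the processor flags.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical reference model of one add-with-carry step. */
    unsigned char ref_addcarry_u32(unsigned char c_in, uint32_t a, uint32_t b,
                                   uint32_t *out)
    {
        assert(out != NULL);
        /* tmp[32:0] := a + b + (c_in > 0 ? 1 : 0), held in 64 bits */
        uint64_t tmp = (uint64_t)a + (uint64_t)b + (c_in > 0 ? 1u : 0u);
        *out = (uint32_t)tmp;                      /* tmp[31:0] */
        return (unsigned char)((tmp >> 32) & 1u);  /* dst[0] := tmp[32] */
    }

Any non-zero "c_in" counts as a carry of one, matching the ``c_in > 0`` test in the pseudo-code.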

_addcarryx_u64
^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: unsigned char
:Param Types:
    unsigned char c_in, 
    unsigned __int64 a, 
    unsigned __int64 b, 
    unsigned __int64 * out
:Param ETypes:
    UI8 c_in, 
    UI64 a, 
    UI64 b, 
    UI64 out

.. code-block:: C

    unsigned char _addcarryx_u64(unsigned char c_in, unsigned __int64 a, unsigned __int64 b, unsigned __int64 * out);

.. admonition:: Intel Description

    Add unsigned 64-bit integers "a" and "b" with unsigned 8-bit carry-in "c_in" (carry or overflow flag), and store the unsigned 64-bit result in "out", and the carry-out in "dst" (carry or overflow flag).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[64:0] := a[63:0] + b[63:0] + (c_in > 0 ? 1 : 0)
        MEM[out+63:out] := tmp[63:0]
        dst[0] := tmp[64]
        dst[7:1] := 0
        	

_mulx_u32
^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: unsigned int
:Param Types:
    unsigned int a, 
    unsigned int b, 
    unsigned int* hi
:Param ETypes:
    UI32 a, 
    UI32 b, 
    UI32 hi

.. code-block:: C

    unsigned int _mulx_u32(unsigned int a, unsigned int b, unsigned int* hi);

.. admonition:: Intel Description

    Multiply unsigned 32-bit integers "a" and "b", store the low 32 bits of the result in "dst", and store the high 32 bits in "hi". This does not read or write arithmetic flags.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (a * b)[31:0]
        MEM[hi+31:hi] := (a * b)[63:32]
        	
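In portable C the same split can be sketched with a 64-bit product. ``ref_mulx_u32`` is a hypothetical name used only for illustration.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical reference model: returns the low half, stores the high half. */
    uint32_t ref_mulx_u32(uint32_t a, uint32_t b, uint32_t *hi)
    {
        assert(hi != NULL);
        uint64_t product = (uint64_t)a * (uint64_t)b;
        *hi = (uint32_t)(product >> 32);   /* (a * b)[63:32] */
        return (uint32_t)product;          /* (a * b)[31:0]  */
    }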

_mulx_u64
^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned __int64 a, 
    unsigned __int64 b, 
    unsigned __int64* hi
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 hi

.. code-block:: C

    unsigned __int64 _mulx_u64(unsigned __int64 a, unsigned __int64 b, unsigned __int64* hi);

.. admonition:: Intel Description

    Multiply unsigned 64-bit integers "a" and "b", store the low 64 bits of the result in "dst", and store the high 64 bits in "hi". This does not read or write arithmetic flags.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a * b)[63:0]
        MEM[hi+63:hi] := (a * b)[127:64]
        	
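Standard C has no 128-bit integer type, so a portable model of the 64 x 64 -> 128-bit product has to combine four 32-bit partial products. The sketch below shows one such decomposition; ``ref_mulx_u64`` is a hypothetical name, and compilers that offer a 128-bit extension type could do this more directly.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical reference model built from 32-bit partial products. */
    uint64_t ref_mulx_u64(uint64_t a, uint64_t b, uint64_t *hi)
    {
        assert(hi != NULL);
        uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
        uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

        uint64_t p0 = a_lo * b_lo;   /* contributes to bits [63:0]   */
        uint64_t p1 = a_lo * b_hi;   /* contributes to bits [95:32]  */
        uint64_t p2 = a_hi * b_lo;   /* contributes to bits [95:32]  */
        uint64_t p3 = a_hi * b_hi;   /* contributes to bits [127:64] */

        /* Sum of the three pieces that land in bits [63:32]; cannot overflow. */
        uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

        *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
        return (mid << 32) | (p0 & 0xFFFFFFFFu);
    }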

_cmpccxadd_epi32
^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: int
:Param Types:
    void* __A, 
    int __B, 
    int __C, 
    const int __D
:Param ETypes:
    SI32 __A, 
    SI32 __B, 
    SI32 __C, 
    SI32 __D

.. code-block:: C

    int _cmpccxadd_epi32(void* __A, int __B, int __C,
                         const int __D);

.. admonition:: Intel Description

    Compare the value in memory at "__A" with the value of "__B". If the condition specified by "__D" is met, add the third operand "__C" to the value at "__A" and write the sum back to "__A"; otherwise the value at "__A" is unchanged. The return value is the original value at "__A".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (__D[3:0]) OF
        0: OP := _CMPCCX_O
        1: OP := _CMPCCX_NO
        2: OP := _CMPCCX_B
        3: OP := _CMPCCX_NB
        4: OP := _CMPCCX_Z
        5: OP := _CMPCCX_NZ
        6: OP := _CMPCCX_BE
        7: OP := _CMPCCX_NBE
        8: OP := _CMPCCX_S
        9: OP := _CMPCCX_NS
        10: OP := _CMPCCX_P
        11: OP := _CMPCCX_NP
        12: OP := _CMPCCX_L
        13: OP := _CMPCCX_NL
        14: OP := _CMPCCX_LE
        15: OP := _CMPCCX_NLE
        ESAC
        tmp1 := LOAD_LOCK(__A)
        tmp2 := tmp1 + __C
        IF (tmp1[31:0] OP __B[31:0])
        	STORE_UNLOCK(__A, tmp2)
        ELSE
        	STORE_UNLOCK(__A, tmp1)
        FI
        dst[31:0] := tmp1[31:0]
        	
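The compare-and-conditionally-add logic above can be sketched in portable C. Two assumptions are made here and should be read as such: the condition is evaluated on the x86 flags that a comparison ``tmp1 - __B`` would produce, and the 16 encodings pair up as a base condition plus its negation (even and odd values). The helper names are hypothetical, and the sketch is not atomic, unlike the locked load and store in the pseudo-code.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Evaluate condition code cc (0..15) on the flags of the comparison a - b. */
    static int cmpccx_condition(uint32_t a, uint32_t b, unsigned cc)
    {
        uint32_t r = a - b;
        unsigned cf = a < b;                          /* unsigned borrow   */
        unsigned zf = (r == 0);
        unsigned sf = r >> 31;
        unsigned of = ((a ^ b) & (a ^ r)) >> 31;      /* signed overflow   */
        unsigned low = r & 0xFFu, pf = 1;             /* PF: even parity   */
        while (low) { pf ^= low & 1u; low >>= 1; }

        unsigned base;
        switch ((cc >> 1) & 7u) {
        case 0:  base = of; break;                    /* O  / NO  */
        case 1:  base = cf; break;                    /* B  / NB  */
        case 2:  base = zf; break;                    /* Z  / NZ  */
        case 3:  base = cf | zf; break;               /* BE / NBE */
        case 4:  base = sf; break;                    /* S  / NS  */
        case 5:  base = pf; break;                    /* P  / NP  */
        case 6:  base = sf ^ of; break;               /* L  / NL  */
        default: base = zf | (sf ^ of); break;        /* LE / NLE */
        }
        return (int)(base ^ (cc & 1u));               /* odd codes negate */
    }

    /* Non-atomic model: returns the original value, updates *mem on success. */
    uint32_t ref_cmpccxadd_u32(uint32_t *mem, uint32_t b, uint32_t c, unsigned cc)
    {
        assert(mem != NULL);
        uint32_t old = *mem;                          /* tmp1 */
        if (cmpccx_condition(old, b, cc))
            *mem = old + c;                           /* tmp2 */
        return old;
    }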

_cmpccxadd_epi64
^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: __int64
:Param Types:
    void* __A, 
    __int64 __B, 
    __int64 __C, 
    const int __D
:Param ETypes:
    SI64 __A, 
    SI64 __B, 
    SI64 __C, 
    SI32 __D

.. code-block:: C

    __int64 _cmpccxadd_epi64(void* __A, __int64 __B,
                             __int64 __C, const int __D);

.. admonition:: Intel Description

    Compare the value in memory at "__A" with the value of "__B". If the condition specified by "__D" is met, add the third operand "__C" to the value at "__A" and write the sum back to "__A"; otherwise the value at "__A" is unchanged. The return value is the original value at "__A".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (__D[3:0]) OF
        0: OP := _CMPCCX_O
        1: OP := _CMPCCX_NO
        2: OP := _CMPCCX_B
        3: OP := _CMPCCX_NB
        4: OP := _CMPCCX_Z
        5: OP := _CMPCCX_NZ
        6: OP := _CMPCCX_BE
        7: OP := _CMPCCX_NBE
        8: OP := _CMPCCX_S
        9: OP := _CMPCCX_NS
        10: OP := _CMPCCX_P
        11: OP := _CMPCCX_NP
        12: OP := _CMPCCX_L
        13: OP := _CMPCCX_NL
        14: OP := _CMPCCX_LE
        15: OP := _CMPCCX_NLE
        ESAC
        tmp1 := LOAD_LOCK(__A)
        tmp2 := tmp1 + __C
        IF (tmp1[63:0] OP __B[63:0])
        	STORE_UNLOCK(__A, tmp2)
        ELSE
        	STORE_UNLOCK(__A, tmp1)
        FI
        dst[63:0] := tmp1[63:0]
        	

_addcarry_u32
^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: unsigned char
:Param Types:
    unsigned char c_in, 
    unsigned int a, 
    unsigned int b, 
    unsigned int * out
:Param ETypes:
    UI8 c_in, 
    UI32 a, 
    UI32 b, 
    UI32 out

.. code-block:: C

    unsigned char _addcarry_u32(unsigned char c_in, unsigned int a, unsigned int b, unsigned int * out);

.. admonition:: Intel Description

    Add unsigned 32-bit integers "a" and "b" with unsigned 8-bit carry-in "c_in" (carry flag), and store the unsigned 32-bit result in "out", and the carry-out in "dst" (carry or overflow flag).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[32:0] := a[31:0] + b[31:0] + (c_in > 0 ? 1 : 0)
        MEM[out+31:out] := tmp[31:0]
        dst[0] := tmp[32]
        dst[7:1] := 0
        	

_addcarry_u64
^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: unsigned char
:Param Types:
    unsigned char c_in, 
    unsigned __int64 a, 
    unsigned __int64 b, 
    unsigned __int64 * out
:Param ETypes:
    UI8 c_in, 
    UI64 a, 
    UI64 b, 
    UI64 out

.. code-block:: C

    unsigned char _addcarry_u64(unsigned char c_in, unsigned __int64 a, unsigned __int64 b, unsigned __int64 * out);

.. admonition:: Intel Description

    Add unsigned 64-bit integers "a" and "b" with unsigned 8-bit carry-in "c_in" (carry flag), and store the unsigned 64-bit result in "out", and the carry-out in "dst" (carry or overflow flag).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[64:0] := a[63:0] + b[63:0] + (c_in > 0 ? 1 : 0)
        MEM[out+63:out] := tmp[63:0]
        dst[0] := tmp[64]
        dst[7:1] := 0
        	
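A common use of a carry-in/carry-out add is multi-word addition, where the carry from one 64-bit limb feeds the next. The sketch below models this in portable C; ``ref_addcarry_u64`` and ``ref_add_limbs`` are hypothetical names, and the limbs are assumed to be stored least-significant first.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model of one 64-bit add-with-carry step. */
    unsigned char ref_addcarry_u64(unsigned char c_in, uint64_t a, uint64_t b,
                                   uint64_t *out)
    {
        assert(out != NULL);
        uint64_t partial = a + b;                    /* wraps modulo 2^64    */
        unsigned char carry = partial < a;           /* carry out of a + b   */
        uint64_t sum = partial + (c_in > 0 ? 1u : 0u);
        carry |= (unsigned char)(sum < partial);     /* carry out of + c_in  */
        *out = sum;
        return carry;
    }

    /* Add two n-limb numbers; returns the carry out of the top limb. */
    unsigned char ref_add_limbs(uint64_t *dst, const uint64_t *a,
                                const uint64_t *b, size_t n)
    {
        unsigned char carry = 0;
        for (size_t i = 0; i < n; i++)
            carry = ref_addcarry_u64(carry, a[i], b[i], &dst[i]);
        return carry;
    }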

_subborrow_u32
^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: unsigned char
:Param Types:
    unsigned char c_in, 
    unsigned int a, 
    unsigned int b, 
    unsigned int * out
:Param ETypes:
    UI8 c_in, 
    UI32 a, 
    UI32 b, 
    UI32 out

.. code-block:: C

    unsigned char _subborrow_u32(unsigned char c_in, unsigned int a, unsigned int b, unsigned int * out);

.. admonition:: Intel Description

    Add unsigned 8-bit borrow "c_in" (carry flag) to unsigned 32-bit integer "b", and subtract the result from unsigned 32-bit integer "a". Store the unsigned 32-bit result in "out", and the carry-out in "dst" (carry or overflow flag).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[32:0] := a[31:0] - (b[31:0] + (c_in > 0 ? 1 : 0))
        MEM[out+31:out] := tmp[31:0]
        dst[0] := tmp[32]
        dst[7:1] := 0
        	
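The borrow computation above can be modelled in portable C by doing the subtraction in 64 bits and reading bit 32 of the wrapped result. This is a sketch of the arithmetic only; ``ref_subborrow_u32`` is a hypothetical name.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical reference model of one subtract-with-borrow step. */
    unsigned char ref_subborrow_u32(unsigned char c_in, uint32_t a, uint32_t b,
                                    uint32_t *out)
    {
        assert(out != NULL);
        /* tmp[32:0] := a - (b + (c_in > 0 ? 1 : 0)); unsigned wrap-around
           leaves bit 32 set exactly when a borrow is needed. */
        uint64_t tmp = (uint64_t)a - ((uint64_t)b + (c_in > 0 ? 1u : 0u));
        *out = (uint32_t)tmp;                      /* tmp[31:0] */
        return (unsigned char)((tmp >> 32) & 1u);  /* dst[0] := tmp[32] */
    }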

_subborrow_u64
^^^^^^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: immintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: unsigned char
:Param Types:
    unsigned char c_in, 
    unsigned __int64 a, 
    unsigned __int64 b, 
    unsigned __int64 * out
:Param ETypes:
    UI8 c_in, 
    UI64 a, 
    UI64 b, 
    UI64 out

.. code-block:: C

    unsigned char _subborrow_u64(unsigned char c_in, unsigned __int64 a, unsigned __int64 b, unsigned __int64 * out);

.. admonition:: Intel Description

    Add unsigned 8-bit borrow "c_in" (carry flag) to unsigned 64-bit integer "b", and subtract the result from unsigned 64-bit integer "a". Store the unsigned 64-bit result in "out", and the carry-out in "dst" (carry or overflow flag).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[64:0] := a[63:0] - (b[63:0] + (c_in > 0 ? 1 : 0))
        MEM[out+63:out] := tmp[63:0]
        dst[0] := tmp[64]
        dst[7:1] := 0
        	

_aadd_i32
^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: x86gprintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: void
:Param Types:
    int* __A, 
    int __B
:Param ETypes:
    SI32 __A, 
    SI32 __B

.. code-block:: C

    void _aadd_i32(int* __A, int __B);

.. admonition:: Intel Description

    Atomically add a 32-bit value at memory operand "__A" and a 32-bit "__B", and store the result to the same memory location.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[__A+31:__A] := MEM[__A+31:__A] + __B[31:0]
        

_aadd_i64
^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: x86gprintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: void
:Param Types:
    __int64* __A, 
    __int64 __B
:Param ETypes:
    SI64 __A, 
    SI64 __B

.. code-block:: C

    void _aadd_i64(__int64* __A, __int64 __B);

.. admonition:: Intel Description

    Atomically add a 64-bit value at memory operand "__A" and a 64-bit "__B", and store the result to the same memory location.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[__A+63:__A] := MEM[__A+63:__A] + __B[63:0]
        

_aand_i32
^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: x86gprintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: void
:Param Types:
    int* __A, 
    int __B
:Param ETypes:
    SI32 __A, 
    SI32 __B

.. code-block:: C

    void _aand_i32(int* __A, int __B);

.. admonition:: Intel Description

    Atomically AND the 32-bit value at memory operand "__A" with the 32-bit "__B", and store the result to the same memory location.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[__A+31:__A] := MEM[__A+31:__A] AND __B[31:0]
        

_aand_i64
^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: x86gprintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: void
:Param Types:
    __int64* __A, 
    __int64 __B
:Param ETypes:
    SI64 __A, 
    SI64 __B

.. code-block:: C

    void _aand_i64(__int64* __A, __int64 __B);

.. admonition:: Intel Description

    Atomically AND the 64-bit value at memory operand "__A" with the 64-bit "__B", and store the result to the same memory location.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[__A+63:__A] := MEM[__A+63:__A] AND __B[63:0]
        

_aor_i32
^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: x86gprintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: void
:Param Types:
    int* __A, 
    int __B
:Param ETypes:
    SI32 __A, 
    SI32 __B

.. code-block:: C

    void _aor_i32(int* __A, int __B);

.. admonition:: Intel Description

    Atomically OR the 32-bit value at memory operand "__A" with the 32-bit "__B", and store the result to the same memory location.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[__A+31:__A] := MEM[__A+31:__A] OR __B[31:0]
        

_aor_i64
^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: x86gprintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: void
:Param Types:
    __int64* __A, 
    __int64 __B
:Param ETypes:
    SI64 __A, 
    SI64 __B

.. code-block:: C

    void _aor_i64(__int64* __A, __int64 __B);

.. admonition:: Intel Description

    Atomically OR the 64-bit value at memory operand "__A" with the 64-bit "__B", and store the result to the same memory location.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[__A+63:__A] := MEM[__A+63:__A] OR __B[63:0]
        

_axor_i32
^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: x86gprintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: void
:Param Types:
    int* __A, 
    int __B
:Param ETypes:
    SI32 __A, 
    SI32 __B

.. code-block:: C

    void _axor_i32(int* __A, int __B);

.. admonition:: Intel Description

    Atomically XOR the 32-bit value at memory operand "__A" with the 32-bit "__B", and store the result to the same memory location.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[__A+31:__A] := MEM[__A+31:__A] XOR __B[31:0]
        

_axor_i64
^^^^^^^^^
:Tech: Other
:Category: Arithmetic
:Header: x86gprintrin.h
:Searchable: Other-Arithmetic-Other
:Return Type: void
:Param Types:
    __int64* __A, 
    __int64 __B
:Param ETypes:
    SI64 __A, 
    SI64 __B

.. code-block:: C

    void _axor_i64(__int64* __A, __int64 __B);

.. admonition:: Intel Description

    Atomically XOR the 64-bit value at memory operand "__A" with the 64-bit "__B", and store the result to the same memory location.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[__A+63:__A] := MEM[__A+63:__A] XOR __B[63:0]
        
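The ``_aadd``, ``_aand``, ``_aor`` and ``_axor`` entries above all describe an atomic read-modify-write whose previous value is not returned. A rough portable analogue, assuming a C11 compiler with ``<stdatomic.h>``, is a fetch-and-op call with the result discarded. The ``ref_*`` names are hypothetical, the sketch takes ``_Atomic``-qualified pointers where the intrinsics take plain ones, and the memory-ordering behaviour of the underlying instructions is not modelled (the default sequentially consistent ordering is used).

.. code-block:: C

    #include <stdatomic.h>
    #include <stdint.h>

    /* Each helper updates *p atomically and discards the old value. */
    void ref_aadd_i32(_Atomic int32_t *p, int32_t v) { (void)atomic_fetch_add(p, v); }
    void ref_aand_i32(_Atomic int32_t *p, int32_t v) { (void)atomic_fetch_and(p, v); }
    void ref_aor_i32(_Atomic int32_t *p, int32_t v)  { (void)atomic_fetch_or(p, v); }
    void ref_axor_i32(_Atomic int32_t *p, int32_t v) { (void)atomic_fetch_xor(p, v); }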

Convert
-------
Other
~~~~~
_cvtsh_ss
^^^^^^^^^
:Tech: Other
:Category: Convert
:Header: emmintrin.h
:Searchable: Other-Convert-Other
:Return Type: float
:Param Types:
    unsigned short a
:Param ETypes:
    UI16 a

.. code-block:: C

    float _cvtsh_ss(unsigned short a);

.. admonition:: Intel Description

    Convert the half-precision (16-bit) floating-point value "a" to a single-precision (32-bit) floating-point value, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP16_To_FP32(a[15:0])
        	
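``Convert_FP16_To_FP32`` is not spelled out above. The following is a minimal portable sketch of such a widening conversion, assuming the IEEE 754 binary16 and binary32 layouts. ``ref_cvtsh_ss`` is a hypothetical name, and NaN payloads are passed through without the quieting that hardware may apply.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical reference model: widen a binary16 bit pattern to float. */
    float ref_cvtsh_ss(uint16_t h)
    {
        uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t man  = h & 0x3FFu;
        uint32_t bits;

        if (exp == 0x1Fu) {                      /* infinity or NaN */
            bits = sign | 0x7F800000u | (man << 13);
        } else if (exp != 0) {                   /* normal: rebias 15 -> 127 */
            bits = sign | ((exp + 112u) << 23) | (man << 13);
        } else if (man != 0) {                   /* subnormal: normalize */
            int shift = 0;
            while ((man & 0x400u) == 0) {
                man <<= 1;
                shift++;
            }
            man &= 0x3FFu;                       /* drop the now-implicit bit */
            bits = sign | ((uint32_t)(113 - shift) << 23) | (man << 13);
        } else {                                 /* signed zero */
            bits = sign;
        }

        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

Every binary16 value is exactly representable in binary32, so no rounding is involved in this direction.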

_cvtss_sh
^^^^^^^^^
:Tech: Other
:Category: Convert
:Header: emmintrin.h
:Searchable: Other-Convert-Other
:Return Type: unsigned short
:Param Types:
    float a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    unsigned short _cvtss_sh(float a, int rounding);

.. admonition:: Intel Description

    Convert the single-precision (32-bit) floating-point value "a" to a half-precision (16-bit) floating-point value, and store the result in "dst". The rounding mode is selected by the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Convert_FP32_To_FP16(a[31:0])
        	
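``Convert_FP32_To_FP16`` depends on the rounding mode. The sketch below models only one case, round-to-nearest-even, under the assumption of IEEE 754 binary32 and binary16 layouts; other values of "rounding" are not covered. ``ref_cvtss_sh_rne`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical reference model: narrow a float to binary16, nearest-even. */
    uint16_t ref_cvtss_sh_rne(float f)
    {
        uint32_t x;
        memcpy(&x, &f, sizeof x);
        uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
        uint32_t abs_bits = x & 0x7FFFFFFFu;

        if (abs_bits > 0x7F800000u)              /* NaN: return a quiet NaN */
            return (uint16_t)(sign | 0x7E00u | ((abs_bits >> 13) & 0x3FFu));
        if (abs_bits == 0x7F800000u)             /* infinity */
            return (uint16_t)(sign | 0x7C00u);

        int32_t exp = (int32_t)(abs_bits >> 23) - 127 + 15;   /* rebias */
        uint32_t man = abs_bits & 0x7FFFFFu;
        uint32_t half, rem, halfway;

        if (exp >= 31)                           /* too large: infinity */
            return (uint16_t)(sign | 0x7C00u);

        if (exp <= 0) {                          /* result is subnormal or zero */
            if (exp < -10)
                return sign;                     /* below half the smallest subnormal */
            man |= 0x800000u;                    /* make the implicit bit explicit */
            unsigned shift = (unsigned)(14 - exp);
            half = man >> shift;
            rem = man & ((1u << shift) - 1u);
            halfway = 1u << (shift - 1);
        } else {                                 /* normal result */
            half = ((uint32_t)exp << 10) | (man >> 13);
            rem = man & 0x1FFFu;
            halfway = 0x1000u;
        }

        /* Round to nearest, ties to even; a carry may reach the exponent. */
        if (rem > halfway || (rem == halfway && (half & 1u)))
            half++;
        return (uint16_t)(sign | half);
    }

Because the mantissa increment is allowed to carry into the exponent field, values just below 65536 round up to the infinity encoding ``0x7C00`` without a special case.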

OS-Targeted
-----------
Other
~~~~~
_fxrstor
^^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void

.. code-block:: C

    void _fxrstor(void * mem_addr);

.. admonition:: Intel Description

    Reload the x87 FPU, MMX technology, XMM, and MXCSR registers from the 512-byte memory image at "mem_addr". This data should have been written to memory previously using the FXSAVE instruction, and in the same format as required by the operating mode. "mem_addr" must be aligned on a 16-byte boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        state_x87_fpu_mmx_sse := fxrstor(MEM[mem_addr+512*8-1:mem_addr])
        	

_fxrstor64
^^^^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void

.. code-block:: C

    void _fxrstor64(void * mem_addr);

.. admonition:: Intel Description

    Reload the x87 FPU, MMX technology, XMM, and MXCSR registers from the 512-byte memory image at "mem_addr". This data should have been written to memory previously using the FXSAVE64 instruction, and in the same format as required by the operating mode. "mem_addr" must be aligned on a 16-byte boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        state_x87_fpu_mmx_sse := fxrstor64(MEM[mem_addr+512*8-1:mem_addr])
        	

_fxsave
^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void

.. code-block:: C

    void _fxsave(void * mem_addr);

.. admonition:: Intel Description

    Save the current state of the x87 FPU, MMX technology, XMM, and MXCSR registers to a 512-byte memory location at "mem_addr". The layout of the 512-byte region depends on the operating mode. Bytes [511:464] are available for software use and will not be overwritten by the processor.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        MEM[mem_addr+512*8-1:mem_addr] := fxsave(state_x87_fpu_mmx_sse)
        	
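The 512-byte image and its software-available tail can be described with a small C type. This is only a sketch of the buffer layout, assuming a C11 compiler for ``_Alignas``; it does not call the intrinsic, and ``fxsave_area`` and the helper names are hypothetical. The 16-byte alignment follows the requirement stated for "mem_addr" in the restore entries.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    enum {
        FXSAVE_AREA_SIZE = 512,   /* size of the memory image in bytes        */
        FXSAVE_SW_OFFSET = 464,   /* first byte left untouched by the CPU     */
        FXSAVE_SW_SIZE   = 48     /* bytes [511:464] available to software    */
    };

    /* Hypothetical buffer type with the required 16-byte alignment. */
    typedef struct {
        _Alignas(16) unsigned char bytes[FXSAVE_AREA_SIZE];
    } fxsave_area;

    /* Non-zero when the buffer starts on a 16-byte boundary. */
    int fxsave_area_is_aligned(const fxsave_area *area)
    {
        return ((uintptr_t)(const void *)area->bytes % 16u) == 0;
    }

    /* Pointer to the region the processor does not overwrite. */
    unsigned char *fxsave_sw_bytes(fxsave_area *area)
    {
        return area->bytes + FXSAVE_SW_OFFSET;
    }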

_fxsave64
^^^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void

.. code-block:: C

    void _fxsave64(void * mem_addr);

.. admonition:: Intel Description

    Save the current state of the x87 FPU, MMX technology, XMM, and MXCSR registers to a 512-byte memory location at "mem_addr". The layout of the 512-byte region depends on the operating mode. Bytes [511:464] are available for software use and will not be overwritten by the processor.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        MEM[mem_addr+512*8-1:mem_addr] := fxsave64(state_x87_fpu_mmx_sse)
        	

_invpcid
^^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    unsigned int type, 
    void* descriptor
:Param ETypes:
    UI32 type, 
     descriptor

.. code-block:: C

    void _invpcid(unsigned int type, void* descriptor);

.. admonition:: Intel Description

    Invalidate mappings in the Translation Lookaside Buffers (TLBs) and paging-structure caches for the processor context identifier (PCID) specified by "descriptor", based on the invalidation type specified in "type".

    The PCID "descriptor" is specified as a 16-byte memory operand (with no alignment restrictions) where bits [11:0] specify the PCID, and bits [127:64] specify the linear address; bits [63:12] are reserved.

    The types supported are:

    - Type 0, individual-address invalidation: the logical processor invalidates mappings for a single linear address that are tagged with the PCID specified in "descriptor", except global translations. The instruction may also invalidate global translations, mappings for other linear addresses, or mappings tagged with other PCIDs.
    - Type 1, single-context invalidation: the logical processor invalidates all mappings tagged with the PCID specified in "descriptor" except global translations. In some cases, it may invalidate mappings for other PCIDs as well.
    - Type 2, all-context invalidation: the logical processor invalidates all mappings tagged with any PCID.
    - Type 3, all-context invalidation, retaining global translations: the logical processor invalidates all mappings tagged with any PCID except global translations, ignoring "descriptor". The instruction may also invalidate global translations.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE type[1:0] OF
        0: // individual-address invalidation retaining global translations
        	OP_PCID := MEM[descriptor+11:descriptor]
        	ADDR := MEM[descriptor+127:descriptor+64]
        	BREAK
        1: // single PCID invalidation retaining globals
        	OP_PCID := MEM[descriptor+11:descriptor]
        	// invalidate all mappings tagged with OP_PCID except global translations
        	BREAK
        2: // all PCID invalidation
        	// invalidate all mappings tagged with any PCID
        	BREAK
        3: // all PCID invalidation retaining global translations
        	// invalidate all mappings tagged with any PCID except global translations
        	BREAK
        ESAC
        	
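The 16-byte descriptor layout described above can be sketched as a C struct. The type and function names are hypothetical, the layout assumes a little-endian target so that the first 64-bit member occupies bits [63:0], and the reserved bits are simply left at zero.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical in-memory form of the 16-byte descriptor. */
    typedef struct {
        uint64_t pcid;            /* bits [11:0] PCID, bits [63:12] reserved */
        uint64_t linear_address;  /* bits [127:64]                           */
    } invpcid_descriptor;

    /* Build a descriptor, keeping only the low 12 bits of the PCID. */
    invpcid_descriptor make_invpcid_descriptor(uint16_t pcid,
                                               uint64_t linear_address)
    {
        invpcid_descriptor d;
        d.pcid = (uint64_t)(pcid & 0x0FFFu);
        d.linear_address = linear_address;
        return d;
    }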

_xsavec
^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    void * mem_addr, 
    unsigned __int64 save_mask
:Param ETypes:
     mem_addr, 
    UI64 save_mask

.. code-block:: C

    void _xsavec(void * mem_addr, unsigned __int64 save_mask);

.. admonition:: Intel Description

    Perform a full or partial save of the enabled processor states to memory at "mem_addr"; xsavec differs from xsave in that it uses compaction and that it may use init optimization. State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        mask[62:0] := save_mask[62:0] AND XCR0[62:0]
        FOR i := 0 to 62
        	IF mask[i]
        		CASE (i) OF
        		0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
        		1: mem_addr.FPUSSESave_Area[SSE] := ProcessorState[SSE]
        		DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
        		ESAC
        		mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
        	FI
        ENDFOR
        	
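The first line of the pseudo-code above, which selects the state components to save, can be sketched in portable C. Both helper names are hypothetical, and the value of "XCR0" is passed in as a plain parameter rather than read from the processor.

.. code-block:: C

    #include <stdint.h>

    /* mask[62:0] := save_mask[62:0] AND XCR0[62:0] */
    uint64_t xsave_effective_mask(uint64_t save_mask, uint64_t xcr0)
    {
        const uint64_t low63 = (UINT64_C(1) << 63) - 1;   /* bits [62:0] */
        return save_mask & xcr0 & low63;
    }

    /* Number of state components the FOR loop would actually save. */
    unsigned xsave_component_count(uint64_t save_mask, uint64_t xcr0)
    {
        uint64_t mask = xsave_effective_mask(save_mask, xcr0);
        unsigned count = 0;
        for (unsigned i = 0; i <= 62; i++)
            count += (unsigned)((mask >> i) & 1u);
        return count;
    }

A component is saved only when it is both requested in "save_mask" and enabled in "XCR0"; bit 63 of either value is ignored.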

_xsavec64
^^^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    void * mem_addr, 
    unsigned __int64 save_mask
:Param ETypes:
     mem_addr, 
    UI64 save_mask

.. code-block:: C

    void _xsavec64(void * mem_addr, unsigned __int64 save_mask);

.. admonition:: Intel Description

    Perform a full or partial save of the enabled processor states to memory at "mem_addr"; xsavec differs from xsave in that it uses compaction and that it may use init optimization. State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        mask[62:0] := save_mask[62:0] AND XCR0[62:0]
        FOR i := 0 to 62
        	IF mask[i]
        		CASE (i) OF
        		0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
        		1: mem_addr.FPUSSESave_Area[SSE] := ProcessorState[SSE]
        		DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
        		ESAC
        		mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
        	FI
        ENDFOR
        	

_xsaveopt
^^^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    void * mem_addr, 
    unsigned __int64 save_mask
:Param ETypes:
     mem_addr, 
    UI64 save_mask

.. code-block:: C

    void _xsaveopt(void * mem_addr, unsigned __int64 save_mask);

.. admonition:: Intel Description

    Perform a full or partial save of the enabled processor states to memory at "mem_addr". State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary. The hardware may optimize the manner in which data is saved. The performance of this instruction will be equal to or better than using the XSAVE instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        mask[62:0] := save_mask[62:0] AND XCR0[62:0]
        FOR i := 0 to 62
        	IF mask[i]
        		CASE (i) OF
        		0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
        		1: mem_addr.FPUSSESave_Area[SSE] := ProcessorState[SSE]
        		2: mem_addr.EXT_SAVE_Area2[YMM] := ProcessorState[YMM]
        		DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
        		ESAC
        		mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
        	FI
        ENDFOR
        	

_xsaveopt64
^^^^^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    void * mem_addr, 
    unsigned __int64 save_mask
:Param ETypes:
     mem_addr, 
    UI64 save_mask

.. code-block:: C

    void _xsaveopt64(void* mem_addr,
                     unsigned __int64 save_mask);

.. admonition:: Intel Description

    Perform a full or partial save of the enabled processor states to memory at "mem_addr". State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary. The hardware may optimize the manner in which data is saved. The performance of this instruction will be equal to or better than using the XSAVE64 instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        mask[62:0] := save_mask[62:0] AND XCR0[62:0]
        FOR i := 0 to 62
        	IF mask[i]
        		CASE (i) OF
        		0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
        		1: mem_addr.FPUSSESave_Area[SSE] := ProcessorState[SSE]
        		2: mem_addr.EXT_SAVE_Area2[YMM] := ProcessorState[YMM]
        		DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
        		ESAC
        		mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
        	FI
        ENDFOR
        	

_xsaves
^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    void * mem_addr, 
    unsigned __int64 save_mask
:Param ETypes:
     mem_addr, 
    UI64 save_mask

.. code-block:: C

    void _xsaves(void * mem_addr, unsigned __int64 save_mask);

.. admonition:: Intel Description

    Perform a full or partial save of the enabled processor states to memory at "mem_addr"; xsaves differs from xsave in that it can save state components corresponding to bits set in IA32_XSS MSR and that it may use the modified optimization. State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        mask[62:0] := save_mask[62:0] AND XCR0[62:0]
        FOR i := 0 to 62
        	IF mask[i]
        		CASE (i) OF
        		0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
        		1: mem_addr.FPUSSESave_Area[SSE] := ProcessorState[SSE]
        		DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
        		ESAC
        		mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
        	FI
        ENDFOR
        	

_xsaves64
^^^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    void * mem_addr, 
    unsigned __int64 save_mask
:Param ETypes:
     mem_addr, 
    UI64 save_mask

.. code-block:: C

    void _xsaves64(void * mem_addr, unsigned __int64 save_mask);

.. admonition:: Intel Description

    Perform a full or partial save of the enabled processor states to memory at "mem_addr"; xsaves differs from xsave in that it can save state components corresponding to bits set in IA32_XSS MSR and that it may use the modified optimization. State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        mask[62:0] := save_mask[62:0] AND XCR0[62:0]
        FOR i := 0 to 62
        	IF mask[i]
        		CASE (i) OF
        		0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
        		1: mem_addr.FPUSSESave_Area[SSE] := ProcessorState[SSE]
        		DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
        		ESAC
        		mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
        	FI
        ENDFOR
        	

_xrstors
^^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    const void * mem_addr, 
    unsigned __int64 rs_mask
:Param ETypes:
     mem_addr, 
    UI64 rs_mask

.. code-block:: C

    void _xrstors(const void* mem_addr,
                  unsigned __int64 rs_mask);

.. admonition:: Intel Description

    Perform a full or partial restore of the enabled processor states using the state information stored in memory at "mem_addr". xrstors differs from xrstor in that it can restore state components corresponding to bits set in the IA32_XSS MSR; xrstors cannot restore from an xsave area in which the extended region is in the standard form. State is restored based on bits [62:0] in "rs_mask", "XCR0", and "mem_addr.HEADER.XSTATE_BV". "mem_addr" must be aligned on a 64-byte boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        st_mask := mem_addr.HEADER.XSTATE_BV[62:0]
        FOR i := 0 to 62
        	IF (rs_mask[i] AND XCR0[i])
        		IF st_mask[i]
        			CASE (i) OF
        			0: ProcessorState[x87_FPU] := mem_addr.FPUSSESave_Area[FPU]
        			1: ProcessorState[SSE] := mem_addr.FPUSSESave_Area[SSE]
        			DEFAULT: ProcessorState[i] := mem_addr.Ext_Save_Area[i]
        			ESAC
        		ELSE
        			// ProcessorExtendedState := Processor Supplied Values
        			CASE (i) OF
        			1: MXCSR := mem_addr.FPUSSESave_Area[SSE]
        			ESAC
        		FI
        	FI
        ENDFOR
        	
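The per-component decision in the restore loop above depends on three bit masks. A small portable sketch of that decision follows; the enum and function names are hypothetical, and the masks are passed in as plain integers instead of being read from the processor or the save area.

.. code-block:: C

    #include <stdint.h>

    /* What the loop does for one state component. */
    typedef enum {
        XRSTOR_SKIP,              /* not requested or not enabled in XCR0  */
        XRSTOR_LOAD_FROM_MEMORY,  /* XSTATE_BV bit set: load the saved data */
        XRSTOR_INITIALIZE         /* XSTATE_BV bit clear: processor-supplied values */
    } xrstor_action;

    /* Classify component i (0..62) for the given masks. */
    xrstor_action xrstor_component_action(unsigned i, uint64_t rs_mask,
                                          uint64_t xcr0, uint64_t xstate_bv)
    {
        if (i > 62)
            return XRSTOR_SKIP;
        if ((((rs_mask & xcr0) >> i) & 1u) == 0)
            return XRSTOR_SKIP;
        return ((xstate_bv >> i) & 1u) ? XRSTOR_LOAD_FROM_MEMORY
                                       : XRSTOR_INITIALIZE;
    }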

_xrstors64
^^^^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    const void * mem_addr, 
    unsigned __int64 rs_mask
:Param ETypes:
     mem_addr, 
    UI64 rs_mask

.. code-block:: C

    void _xrstors64(const void* mem_addr,
                    unsigned __int64 rs_mask);

.. admonition:: Intel Description

    Perform a full or partial restore of the enabled processor states using the state information stored in memory at "mem_addr". xrstors differs from xrstor in that it can restore state components corresponding to bits set in the IA32_XSS MSR; xrstors cannot restore from an xsave area in which the extended region is in the standard form. State is restored based on bits [62:0] in "rs_mask", "XCR0", and "mem_addr.HEADER.XSTATE_BV". "mem_addr" must be aligned on a 64-byte boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        st_mask := mem_addr.HEADER.XSTATE_BV[62:0]
        FOR i := 0 to 62
        	IF (rs_mask[i] AND XCR0[i])
        		IF st_mask[i]
        			CASE (i) OF
        			0: ProcessorState[x87_FPU] := mem_addr.FPUSSESave_Area[FPU]
        			1: ProcessorState[SSE] := mem_addr.FPUSSESave_Area[SSE]
        			DEFAULT: ProcessorState[i] := mem_addr.Ext_Save_Area[i]
        			ESAC
        		ELSE
        			// ProcessorExtendedState := Processor Supplied Values
        			CASE (i) OF
        			1: MXCSR := mem_addr.FPUSSESave_Area[SSE]
        			ESAC
        		FI
        	FI
        ENDFOR
        	

_xgetbv
^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: unsigned __int64
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    unsigned __int64 _xgetbv(unsigned int a);

.. admonition:: Intel Description

    Copy up to 64 bits from the extended control register (XCR) specified by "a" into "dst". Currently only the XFEATURE_ENABLED_MASK XCR is supported.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst[63:0] := XCR[a]
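
As a rough illustration of how the returned value is commonly interpreted, the sketch below decodes an XCR0 (XFEATURE_ENABLED_MASK) value in plain C. Bits 0 (x87) and 1 (SSE) match the component numbering used by the xsave pseudo-code in this section; bit 2 for AVX state is background knowledge rather than something stated here, and the macro and function names are hypothetical. On real hardware the input would come from ``_xgetbv(0)``, which the sketch deliberately does not call so that it compiles and runs anywhere.

.. code-block:: C

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical names for the low XCR0 bits. */
    #define XCR0_X87 (UINT64_C(1) << 0)
    #define XCR0_SSE (UINT64_C(1) << 1)
    #define XCR0_AVX (UINT64_C(1) << 2)  /* assumption: not listed in this section */

    /* True when both SSE and AVX register state are enabled in xcr0. */
    static bool xcr0_avx_enabled(uint64_t xcr0)
    {
        const uint64_t need = XCR0_SSE | XCR0_AVX;
        return (xcr0 & need) == need;
    }

A caller would typically pass the result of ``_xgetbv(0)`` after confirming through CPUID that the instruction is available.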
        	

_xrstor
^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    void * mem_addr, 
    unsigned __int64 rs_mask
:Param ETypes:
     mem_addr, 
    UI64 rs_mask

.. code-block:: C

    void _xrstor(void * mem_addr, unsigned __int64 rs_mask);

.. admonition:: Intel Description

    Perform a full or partial restore of the enabled processor states using the state information stored in memory at "mem_addr". State is restored based on bits [62:0] in "rs_mask", "XCR0", and "mem_addr.HEADER.XSTATE_BV". "mem_addr" must be aligned on a 64-byte boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        st_mask := mem_addr.HEADER.XSTATE_BV[62:0]
        FOR i := 0 to 62
        	IF (rs_mask[i] AND XCR0[i])
        		IF st_mask[i]
        			CASE (i) OF
        			0: ProcessorState[x87_FPU] := mem_addr.FPUSSESave_Area[FPU]
        			1: ProcessorState[SSE] := mem_addr.FPUSSESave_Area[SSE]
        			DEFAULT: ProcessorState[i] := mem_addr.Ext_Save_Area[i]
        			ESAC
        		ELSE
        			// ProcessorExtendedState := Processor Supplied Values
        			CASE (i) OF
        			1: MXCSR := mem_addr.FPUSSESave_Area[SSE]
        			ESAC
        		FI
        	FI
        ENDFOR
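
The per-component decision in the pseudo-code above can be sketched as a small software model. This is only an illustration of the selection logic, not of the instruction itself; the enum and function names are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical outcome labels for one state component. */
    typedef enum { XR_UNTOUCHED, XR_RESTORED, XR_INITIALIZED } xr_action;

    /* Model of what the restore does to state component i (0..62). */
    static xr_action xrstor_action(uint64_t rs_mask, uint64_t xcr0,
                                   uint64_t xstate_bv, unsigned i)
    {
        assert(i <= 62);
        const uint64_t bit = UINT64_C(1) << i;
        if (!(rs_mask & xcr0 & bit))
            return XR_UNTOUCHED;        /* not requested or not enabled */
        /* Requested and enabled: load from memory if the header marks the
           component as present, else fall back to processor-supplied values. */
        return (xstate_bv & bit) ? XR_RESTORED : XR_INITIALIZED;
    }

For example, with "rs_mask" = 0x7 and XCR0 = 0x3, component 2 is left untouched regardless of XSTATE_BV because it is not enabled in XCR0.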
        	

_xrstor64
^^^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    void * mem_addr, 
    unsigned __int64 rs_mask
:Param ETypes:
     mem_addr, 
    UI64 rs_mask

.. code-block:: C

    void _xrstor64(void * mem_addr, unsigned __int64 rs_mask);

.. admonition:: Intel Description

    Perform a full or partial restore of the enabled processor states using the state information stored in memory at "mem_addr". State is restored based on bits [62:0] in "rs_mask", "XCR0", and "mem_addr.HEADER.XSTATE_BV". "mem_addr" must be aligned on a 64-byte boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        st_mask := mem_addr.HEADER.XSTATE_BV[62:0]
        FOR i := 0 to 62
        	IF (rs_mask[i] AND XCR0[i])
        		IF st_mask[i]
        			CASE (i) OF
        			0: ProcessorState[x87_FPU] := mem_addr.FPUSSESave_Area[FPU]
        			1: ProcessorState[SSE] := mem_addr.FPUSSESave_Area[SSE]
        			DEFAULT: ProcessorState[i] := mem_addr.Ext_Save_Area[i]
        			ESAC
        		ELSE
        			// ProcessorExtendedState := Processor Supplied Values
        			CASE (i) OF
        			1: MXCSR := mem_addr.FPUSSESave_Area[SSE]
        			ESAC
        		FI
        	FI
        ENDFOR
        	

_xsave
^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    void * mem_addr, 
    unsigned __int64 save_mask
:Param ETypes:
     mem_addr, 
    UI64 save_mask

.. code-block:: C

    void _xsave(void * mem_addr, unsigned __int64 save_mask);

.. admonition:: Intel Description

    Perform a full or partial save of the enabled processor states to memory at "mem_addr". State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        mask[62:0] := save_mask[62:0] AND XCR0[62:0]
        FOR i := 0 to 62
        	IF mask[i]
        		CASE (i) OF
        		0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
        		1: mem_addr.FPUSSESave_Area[SSE] := ProcessorState[SSE]
        		DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
        		ESAC
        		mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
        	FI
        ENDFOR
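
Both the save and restore intrinsics require "mem_addr" to be aligned on a 64-byte boundary. A minimal C11 sketch of one way to declare and check such a buffer follows. The 4096-byte size and the names xsave_area and is_xsave_aligned are assumptions made for illustration; the size a given processor actually needs is reported by CPUID and is not derived here.

.. code-block:: C

    #include <stdalign.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define XSAVE_AREA_BYTES 4096  /* hypothetical size, not queried from CPUID */

    /* Statically allocated save area with the required 64-byte alignment. */
    static alignas(64) unsigned char xsave_area[XSAVE_AREA_BYTES];

    /* True when p satisfies the 64-byte alignment requirement. */
    static bool is_xsave_aligned(const void *p)
    {
        return ((uintptr_t)p & 63u) == 0;
    }

Checking the alignment before the call is cheap and turns a hardware fault into a condition the program can report.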
        	

_xsave64
^^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    void * mem_addr, 
    unsigned __int64 save_mask
:Param ETypes:
     mem_addr, 
    UI64 save_mask

.. code-block:: C

    void _xsave64(void * mem_addr, unsigned __int64 save_mask);

.. admonition:: Intel Description

    Perform a full or partial save of the enabled processor states to memory at "mem_addr". State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        mask[62:0] := save_mask[62:0] AND XCR0[62:0]
        FOR i := 0 to 62
        	IF mask[i]
        		CASE (i) OF
        		0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
        		1: mem_addr.FPUSSESave_Area[SSE] := ProcessorState[SSE]
        		DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
        		ESAC
        		mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
        	FI
        ENDFOR
        	

_xsetbv
^^^^^^^
:Tech: Other
:Category: OS-Targeted
:Header: immintrin.h
:Searchable: Other-OS-Targeted-Other
:Return Type: void
:Param Types:
    unsigned int a, 
    unsigned __int64 val
:Param ETypes:
    UI32 a, 
    UI64 val

.. code-block:: C

    void _xsetbv(unsigned int a, unsigned __int64 val);

.. admonition:: Intel Description

    Copy 64 bits from "val" to the extended control register (XCR) specified by "a". Currently only the XFEATURE_ENABLED_MASK XCR is supported.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        XCR[a] := val[63:0]
        	

Application-Targeted
--------------------
ZMM
~~~
_mm512_clmulepi64_epi128
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: Other-Application-Targeted-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i b, 
    __m512i c, 
    const int Imm8
:Param ETypes:
    M128 b, 
    M128 c, 
    IMM Imm8

.. code-block:: C

    __m512i _mm512_clmulepi64_epi128(__m512i b, __m512i c,
                                     const int Imm8);

.. admonition:: Intel Description

    Perform a carry-less multiplication of one quadword of "b" by one quadword of "c" in each 128-bit lane, and store each 128-bit result in the corresponding lane of "dst". The immediate "Imm8" selects which quadword of "b" (bit 0) and of "c" (bit 4) is used.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE PCLMUL128(X,Y) {
        	FOR i := 0 to 63
        		TMP[i] := X[ 0 ] and Y[ i ]
        		FOR j := 1 to i
        			TMP[i] := TMP[i] xor (X[ j ] and Y[ i - j ])
        		ENDFOR
        		DEST[ i ] := TMP[ i ]
        	ENDFOR
        	FOR i := 64 to 126
        		TMP[i] := 0
        		FOR j := i - 63 to 63
        			TMP[i] := TMP[i] xor (X[ j ] and Y[ i - j ])
        		ENDFOR
        		DEST[ i ] := TMP[ i ]
        	ENDFOR
        	DEST[127] := 0
        	RETURN DEST // 128b vector
        }
        FOR i := 0 to 3
        	IF Imm8[0] == 0
        		TEMP1 := b.m128[i].qword[0]
        	ELSE
        		TEMP1 := b.m128[i].qword[1]
        	FI
        	IF Imm8[4] == 0
        		TEMP2 := c.m128[i].qword[0]
        	ELSE
        		TEMP2 := c.m128[i].qword[1]
        	FI
        	dst.m128[i] := PCLMUL128(TEMP1, TEMP2)
        ENDFOR
        dst[MAX:512] := 0
        	

YMM
~~~
_mm256_clmulepi64_epi128
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: Other-Application-Targeted-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i b, 
    __m256i c, 
    const int Imm8
:Param ETypes:
    M128 b, 
    M128 c, 
    IMM Imm8

.. code-block:: C

    __m256i _mm256_clmulepi64_epi128(__m256i b, __m256i c,
                                     const int Imm8);

.. admonition:: Intel Description

    Perform a carry-less multiplication of one quadword of "b" by one quadword of "c" in each 128-bit lane, and store each 128-bit result in the corresponding lane of "dst". The immediate "Imm8" selects which quadword of "b" (bit 0) and of "c" (bit 4) is used.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE PCLMUL128(X,Y) {
        	FOR i := 0 to 63
        		TMP[i] := X[ 0 ] and Y[ i ]
        		FOR j := 1 to i
        			TMP[i] := TMP[i] xor (X[ j ] and Y[ i - j ])
        		ENDFOR
        		DEST[ i ] := TMP[ i ]
        	ENDFOR
        	FOR i := 64 to 126
        		TMP[i] := 0
        		FOR j := i - 63 to 63
        			TMP[i] := TMP[i] xor (X[ j ] and Y[ i - j ])
        		ENDFOR
        		DEST[ i ] := TMP[ i ]
        	ENDFOR
        	DEST[127] := 0
        	RETURN DEST // 128b vector
        }
        FOR i := 0 to 1
        	IF Imm8[0] == 0
        		TEMP1 := b.m128[i].qword[0]
        	ELSE
        		TEMP1 := b.m128[i].qword[1]
        	FI
        	IF Imm8[4] == 0
        		TEMP2 := c.m128[i].qword[0]
        	ELSE
        		TEMP2 := c.m128[i].qword[1]
        	FI
        	dst.m128[i] := PCLMUL128(TEMP1, TEMP2)
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_clmulepi64_si128
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Application-Targeted
:Header: wmmintrin.h
:Searchable: Other-Application-Targeted-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    M128 a, 
    M128 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_clmulepi64_si128(__m128i a, __m128i b,
                                 const int imm8);

.. admonition:: Intel Description

    Perform a carry-less multiplication of two 64-bit integers, selected from "a" and "b" according to "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        IF (imm8[0] == 0)
        	TEMP1 := a[63:0]
        ELSE
        	TEMP1 := a[127:64]
        FI 
        IF (imm8[4] == 0)
        	TEMP2 := b[63:0]
        ELSE 
        	TEMP2 := b[127:64]
        FI
        FOR i := 0 to 63
        	TEMP[i] := (TEMP1[0] and TEMP2[i])
        	FOR j := 1 to i
        		TEMP[i] := TEMP[i] XOR (TEMP1[j] AND TEMP2[i-j])
        	ENDFOR 
        	dst[i] := TEMP[i]
        ENDFOR
        FOR i := 64 to 127
        	TEMP[i] := 0
        	FOR j := (i - 63) to 63
        		TEMP[i] := TEMP[i] XOR (TEMP1[j] AND TEMP2[i-j])
        	ENDFOR
        	dst[i] := TEMP[i]
        ENDFOR
        dst[127] := 0
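
The pseudo-code above amounts to a polynomial multiplication over GF(2). A portable software model, offered as a sketch for checking results rather than as a replacement for the intrinsic, might look like the following. The u128_model type and both function names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for a 128-bit value: low and high quadwords. */
    typedef struct { uint64_t lo, hi; } u128_model;

    /* Carry-less product of two 64-bit values: XOR replaces addition. */
    static u128_model clmul64_model(uint64_t x, uint64_t y)
    {
        u128_model r = {0, 0};
        for (unsigned i = 0; i < 64; ++i) {
            if ((y >> i) & 1u) {
                r.lo ^= x << i;
                if (i != 0)
                    r.hi ^= x >> (64 - i);  /* bits shifted out of the low half */
            }
        }
        return r;
    }

    /* Quadword selection as in the pseudo-code: imm8 bit 0 picks from a,
       imm8 bit 4 picks from b. */
    static u128_model clmulepi64_model(u128_model a, u128_model b, int imm8)
    {
        const uint64_t t1 = (imm8 & 0x01) ? a.hi : a.lo;
        const uint64_t t2 = (imm8 & 0x10) ? b.hi : b.lo;
        return clmul64_model(t1, t2);
    }

As a quick sanity check, multiplying 3 by 3 yields 5, because (x + 1) squared is x^2 + 1 when coefficients are reduced modulo 2.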
        	

Miscellaneous
-------------
XMM
~~~
_mm_cldemote
^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: void

.. code-block:: C

    void _mm_cldemote(void const * p);

.. admonition:: Intel Description

    Hint to hardware that the cache line that contains "p" should be demoted from the cache closest to the processor core to a level more distant from the processor core.

Other
~~~~~
_incsspd
^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    void _incsspd(int a);

.. admonition:: Intel Description

    Increment the shadow stack pointer by 4 times the value specified in bits [7:0] of "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        SSP := SSP + a[7:0] * 4
        	

_incsspq
^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    void _incsspq(int a);

.. admonition:: Intel Description

    Increment the shadow stack pointer by 8 times the value specified in bits [7:0] of "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        SSP := SSP + a[7:0] * 8
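
The address arithmetic shared by the two increment intrinsics can be sketched in portable C. The function below is a hypothetical model that only computes the new pointer value; it does not touch a real shadow stack.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Model of the SSP update: only bits [7:0] of a are used, scaled by the
       slot size (4 for the doubleword form, 8 for the quadword form). */
    static uint64_t incssp_model(uint64_t ssp, unsigned a, unsigned slot_bytes)
    {
        assert(slot_bytes == 4 || slot_bytes == 8);
        return ssp + (uint64_t)(a & 0xFFu) * slot_bytes;
    }

Note that a value such as 0x102 advances the pointer by only two slots, since the upper bits of "a" are ignored.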
        	

_rdsspd_i32
^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: __int32

.. code-block:: C

    __int32 _rdsspd_i32(void );

.. admonition:: Intel Description

    Read the low 32 bits of the current shadow stack pointer, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst := SSP[31:0]
        	

_rdsspq_i64
^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: __int64

.. code-block:: C

    __int64 _rdsspq_i64(void );

.. admonition:: Intel Description

    Read the current shadow stack pointer, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst := SSP[63:0]
        	

_saveprevssp
^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void

.. code-block:: C

    void _saveprevssp(void );

.. admonition:: Intel Description

    Save the previous shadow stack pointer context.

_rstorssp
^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void

.. code-block:: C

    void _rstorssp(void * p);

.. admonition:: Intel Description

    Restore the saved shadow stack pointer from the shadow stack restore token previously created on the shadow stack by saveprevssp.

_wrssd
^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void
:Param Types:
    __int32 val, 
    void * p
:Param ETypes:
    UI32 val, 
     p

.. code-block:: C

    void _wrssd(__int32 val, void * p);

.. admonition:: Intel Description

    Write 32-bit value in "val" to a shadow stack page in memory specified by "p".

_wrssq
^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void
:Param Types:
    __int64 val, 
    void * p
:Param ETypes:
    UI64 val, 
     p

.. code-block:: C

    void _wrssq(__int64 val, void * p);

.. admonition:: Intel Description

    Write 64-bit value in "val" to a shadow stack page in memory specified by "p".

_wrussd
^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void
:Param Types:
    __int32 val, 
    void * p
:Param ETypes:
    UI32 val, 
     p

.. code-block:: C

    void _wrussd(__int32 val, void * p);

.. admonition:: Intel Description

    Write 32-bit value in "val" to a user shadow stack page in memory specified by "p".

_wrussq
^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void
:Param Types:
    __int64 val, 
    void * p
:Param ETypes:
    UI64 val, 
     p

.. code-block:: C

    void _wrussq(__int64 val, void * p);

.. admonition:: Intel Description

    Write 64-bit value in "val" to a user shadow stack page in memory specified by "p".

_setssbsy
^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void

.. code-block:: C

    void _setssbsy(void );

.. admonition:: Intel Description

    Mark shadow stack pointed to by IA32_PL0_SSP as busy.

_clrssbsy
^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void

.. code-block:: C

    void _clrssbsy(void * p);

.. admonition:: Intel Description

    Mark shadow stack pointed to by "p" as not busy.

_get_ssp
^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: __int32

.. code-block:: C

    __int32 _get_ssp(void );

.. admonition:: Intel Description

    If CET is enabled, read the low 32 bits of the current shadow stack pointer, and store the result in "dst". Otherwise return 0.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst := SSP[31:0]
        	

_get_ssp
^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: __int64

.. code-block:: C

    __int64 _get_ssp(void );

.. admonition:: Intel Description

    If CET is enabled, read the current shadow stack pointer, and store the result in "dst". Otherwise return 0.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst := SSP[63:0]
        	

_inc_ssp
^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    void _inc_ssp(unsigned int a);

.. admonition:: Intel Description

    Increment the shadow stack pointer by 4 times the value specified in bits [7:0] of "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        SSP := SSP + a[7:0] * 4
        	

_bnd_set_ptr_bounds
^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void *
:Param Types:
    const void * srcmem, 
    size_t size
:Param ETypes:
     srcmem, 
    UI64 size

.. code-block:: C

    void* _bnd_set_ptr_bounds(const void* srcmem, size_t size);

.. admonition:: Intel Description

    Make a pointer with the value of "srcmem" and bounds set to ["srcmem", "srcmem" + "size" - 1], and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst := srcmem
        dst.LB := srcmem
        dst.UB := srcmem + size - 1
        	

_bnd_narrow_ptr_bounds
^^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void *
:Param Types:
    const void * q, 
    const void * r, 
    size_t size
:Param ETypes:
     q, 
     r, 
    UI64 size

.. code-block:: C

    void* _bnd_narrow_ptr_bounds(const void* q, const void* r,
                                 size_t size);

.. admonition:: Intel Description

    Narrow the bounds for pointer "q" to the intersection of the bounds of "r" and the bounds ["q", "q" + "size" - 1], and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst := q
        IF r.LB > (q + size - 1) OR r.UB < q
        	dst.LB := 1
        	dst.UB := 0
        ELSE
        	dst.LB := MAX(r.LB, q)
        	dst.UB := MIN(r.UB, (q + size - 1))
        FI
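
The intersection rule above can be sketched as a software model that works on plain integers. The bounds_model type and the function name are hypothetical; real MPX bounds live in dedicated registers and are not reachable this way.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical pair of inclusive lower and upper bounds. */
    typedef struct { uintptr_t lb, ub; } bounds_model;

    /* Intersect the bounds r with the range [q, q + size - 1]. */
    static bounds_model narrow_bounds_model(uintptr_t q, bounds_model r,
                                            size_t size)
    {
        const uintptr_t q_last = q + size - 1;
        bounds_model dst;
        if (r.lb > q_last || r.ub < q) {
            dst.lb = 1;   /* disjoint ranges: LB > UB encodes empty bounds */
            dst.ub = 0;
        } else {
            dst.lb = r.lb > q ? r.lb : q;
            dst.ub = r.ub < q_last ? r.ub : q_last;
        }
        return dst;
    }

With r covering [100, 199], narrowing at q = 150 with size = 100 gives [150, 199], while q = 300 with size = 10 gives the empty bounds.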
        	

_bnd_copy_ptr_bounds
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void *
:Param Types:
    const void * q, 
    const void * r
:Param ETypes:
     q, 
     r

.. code-block:: C

    void * _bnd_copy_ptr_bounds(const void * q, const void * r);

.. admonition:: Intel Description

    Make a pointer with the value of "q" and bounds set to the bounds of "r" (i.e., copy the bounds of "r" to pointer "q"), and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst := q
        dst.LB := r.LB
        dst.UB := r.UB
        	

_bnd_init_ptr_bounds
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void *

.. code-block:: C

    void * _bnd_init_ptr_bounds(const void * q);

.. admonition:: Intel Description

    Make a pointer with the value of "q" and open bounds, which allow the pointer to access the entire virtual address space, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst := q
        dst.LB := 0
        dst.UB := 0
        	

_bnd_store_ptr_bounds
^^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void
:Param Types:
    const void ** ptr_addr, 
    const void * ptr_val
:Param ETypes:
     ptr_addr, 
     ptr_val

.. code-block:: C

    void _bnd_store_ptr_bounds(const void** ptr_addr,
                               const void* ptr_val);

.. admonition:: Intel Description

    Stores the bounds of "ptr_val" pointer in memory at address "ptr_addr".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        MEM[ptr_addr].LB := ptr_val.LB
        MEM[ptr_addr].UB := ptr_val.UB
        	

_bnd_chk_ptr_lbounds
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void

.. code-block:: C

    void _bnd_chk_ptr_lbounds(const void * q);

.. admonition:: Intel Description

    Checks if "q" is within its lower bound, and throws a #BR if not.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF q < q.LB
        	#BR
        FI
        	

_bnd_chk_ptr_ubounds
^^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void

.. code-block:: C

    void _bnd_chk_ptr_ubounds(const void * q);

.. admonition:: Intel Description

    Checks if "q" is within its upper bound, and throws a #BR if not.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF q > q.UB
        	#BR
        FI
        	

_bnd_chk_ptr_bounds
^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void
:Param Types:
    const void * q, 
    size_t size
:Param ETypes:
     q, 
    UI64 size

.. code-block:: C

    void _bnd_chk_ptr_bounds(const void * q, size_t size);

.. admonition:: Intel Description

    Checks if ["q", "q" + "size" - 1] is within the lower and upper bounds of "q" and throws a #BR if not.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF q < q.LB OR (q + size - 1) > q.UB
        	#BR
        FI
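
A range check in the sense of the description, where both ends of ["q", "q" + "size" - 1] must fall inside the bounds, can be modelled in portable C as below. The function name and the explicit lb and ub parameters are hypothetical, and the model returns a flag instead of raising #BR.

.. code-block:: C

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* True when [q, q + size - 1] lies entirely within [lb, ub]. */
    static bool range_in_bounds_model(uintptr_t q, size_t size,
                                      uintptr_t lb, uintptr_t ub)
    {
        const uintptr_t q_last = q + size - 1;
        return q >= lb && q_last <= ub;
    }

A range starting below the lower bound fails even when its last byte is inside the bounds, for example q = 90 with size = 50 against [100, 199].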
        	

_bnd_get_ptr_lbound
^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: const void *

.. code-block:: C

    const void * _bnd_get_ptr_lbound(const void * q);

.. admonition:: Intel Description

    Return the lower bound of "q".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst := q.LB
        	

_bnd_get_ptr_ubound
^^^^^^^^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: const void *

.. code-block:: C

    const void * _bnd_get_ptr_ubound(const void * q);

.. admonition:: Intel Description

    Return the upper bound of "q".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        dst := q.UB
        	

_ptwrite32
^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    void _ptwrite32(unsigned int a);

.. admonition:: Intel Description

    Insert the 32-bit data from "a" into a Processor Trace stream via a PTW packet. The PTW packet will be inserted if tracing is currently enabled and ptwrite is currently enabled. The current IP will also be inserted via a FUP packet if FUPonPTW is enabled.

_ptwrite64
^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void
:Param Types:
    unsigned __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    void _ptwrite64(unsigned __int64 a);

.. admonition:: Intel Description

    Insert the 64-bit data from "a" into a Processor Trace stream via a PTW packet. The PTW packet will be inserted if tracing is currently enabled and ptwrite is currently enabled. The current IP will also be inserted via a FUP packet if FUPonPTW is enabled.

_enclu_u32
^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: unsigned int
:Param Types:
    const int a, 
    size_t* __data
:Param ETypes:
    UI32 a, 
    UI64 __data

.. code-block:: C

    unsigned int _enclu_u32(const int a, size_t* __data);

.. admonition:: Intel Description

    Invoke the Intel SGX enclave user (non-privileged) leaf function specified by "a", and return the error code. The "__data" array contains 3 32- or 64-bit elements that may act as input, output, or be unused, depending on the semantics of the specified leaf function; these correspond to ebx, ecx, and edx.

_encls_u32
^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: unsigned int
:Param Types:
    const int a, 
    size_t* __data
:Param ETypes:
    UI32 a, 
    UI64 __data

.. code-block:: C

    unsigned int _encls_u32(const int a, size_t* __data);

.. admonition:: Intel Description

    Invoke the Intel SGX enclave system (privileged) leaf function specified by "a", and return the error code. The "__data" array contains 3 32- or 64-bit elements that may act as input, output, or be unused, depending on the semantics of the specified leaf function; these correspond to ebx, ecx, and edx.

_enclv_u32
^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: unsigned int
:Param Types:
    const int a, 
    size_t* __data
:Param ETypes:
    UI32 a, 
    UI64 __data

.. code-block:: C

    unsigned int _enclv_u32(const int a, size_t* __data);

.. admonition:: Intel Description

    Invoke the Intel SGX enclave virtualized (VMM) leaf function specified by "a", and return the error code. The "__data" array contains 3 32- or 64-bit elements that may act as input, output, or be unused, depending on the semantics of the specified leaf function; these correspond to ebx, ecx, and edx.

_wbinvd
^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void

.. code-block:: C

    void _wbinvd(void );

.. admonition:: Intel Description

    Write back and flush internal caches. Initiate writing-back and flushing of external caches.

_pconfig_u32
^^^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: unsigned int
:Param Types:
    int a, 
    size_t* __data
:Param ETypes:
    UI32 a, 
    UI64 __data

.. code-block:: C

    unsigned int _pconfig_u32(int a, size_t* __data);

.. admonition:: Intel Description

    Invoke the PCONFIG leaf function specified by "a". The "__data" array contains 3 32- or 64-bit elements that may act as input, output, or be unused, depending on the semantics of the specified leaf function; these correspond to ebx, ecx, and edx. May return the value in eax, depending on the semantics of the specified leaf function.

_xsusldtrk
^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void

.. code-block:: C

    void _xsusldtrk(void );

.. admonition:: Intel Description

    Mark the start of a TSX (HLE/RTM) suspend load address tracking region. If this is used inside a transactional region, subsequent loads are not added to the read set of the transaction. If this is used inside a suspend load address tracking region it will cause transaction abort. If this is used outside of a transactional region it behaves like a NOP.

_xresldtrk
^^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void

.. code-block:: C

    void _xresldtrk(void );

.. admonition:: Intel Description

    Mark the end of a TSX (HLE/RTM) suspend load address tracking region. If this is used inside a suspend load address tracking region it will end the suspend region and all following load addresses will be added to the transaction read set. If this is used inside an active transaction but not in a suspend region it will cause transaction abort. If this is used outside of a transactional region it behaves like a NOP.

_tpause
^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: unsigned char
:Param Types:
    unsigned int ctrl, 
    unsigned __int64 counter
:Param ETypes:
    UI32 ctrl, 
    UI64 counter

.. code-block:: C

    unsigned char _tpause(unsigned int ctrl, unsigned __int64 counter);

.. admonition:: Intel Description

    Directs the processor to enter an implementation-dependent optimized state until the TSC reaches or exceeds the value specified in "counter". Bit 0 of "ctrl" selects between a lower power (cleared) or faster wakeup (set) optimized state. Returns the carry flag (CF). If the processor that executed a TPAUSE instruction wakes due to the expiration of the operating system time limit, the instruction sets RFLAGS.CF; otherwise, that flag is cleared.

_umwait
^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: unsigned char
:Param Types:
    unsigned int ctrl, 
    unsigned __int64 counter
:Param ETypes:
    UI32 ctrl, 
    UI64 counter

.. code-block:: C

    unsigned char _umwait(unsigned int ctrl, unsigned __int64 counter);

.. admonition:: Intel Description

    Directs the processor to enter an implementation-dependent optimized state while monitoring a range of addresses. The instruction wakes up when the TSC reaches or exceeds the value specified in "counter" (if the monitoring hardware did not trigger beforehand). Bit 0 of "ctrl" selects between a lower power (cleared) or faster wakeup (set) optimized state. Returns the carry flag (CF). If the processor that executed a UMWAIT instruction wakes due to the expiration of the operating system time limit, the instruction sets RFLAGS.CF; otherwise, that flag is cleared.

_umonitor
^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void

.. code-block:: C

    void _umonitor(void* a);

.. admonition:: Intel Description

    Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be a writeback memory caching type. The address is contained in "a".

_wbnoinvd
^^^^^^^^^
:Tech: Other
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: Other-Miscellaneous-Other
:Return Type: void

.. code-block:: C

    void _wbnoinvd(void );

.. admonition:: Intel Description

    Write back and do not flush internal caches. Initiate writing-back without flushing of external caches.

SVML
====
Probability/Statistics
----------------------
ZMM
~~~
_mm512_cdfnorm_pd
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_cdfnorm_pd(__m512d a);

.. admonition:: Intel Description

    Compute the cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := CDFNormal(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
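
A minimal scalar sketch of the per-lane behavior, assuming ``CDFNormal`` denotes the standard normal CDF (mean 0, variance 1), which can be written as 0.5 * (1 + erf(x / sqrt(2))). The names ``erf_series``, ``cdf_normal`` and ``cdfnorm_pd_model`` are hypothetical. The Maclaurin series is only a self-contained stand-in for ``erf`` and is not the algorithm SVML uses.

.. code-block:: C

    #include <stddef.h>

    /* Maclaurin series for erf(x). Illustrative stand-in that is adequate
       for roughly |x| <= 3; this is not the algorithm SVML uses. */
    static double erf_series(double x)
    {
        const double two_over_sqrt_pi = 1.1283791670955126;
        double term = x; /* (-1)^n * x^(2n+1) / n!, starting at n = 0 */
        double sum = x;  /* accumulates term / (2n+1) */
        for (int n = 1; n < 200; ++n) {
            term *= -x * x / (double)n;
            double add = term / (double)(2 * n + 1);
            sum += add;
            if ((add < 0.0 ? -add : add) < 1e-17)
                break;
        }
        return two_over_sqrt_pi * sum;
    }

    /* Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))). */
    static double cdf_normal(double x)
    {
        return 0.5 * (1.0 + erf_series(x * 0.7071067811865476));
    }

    /* Scalar model of the unmasked operation on 8 double lanes. */
    static void cdfnorm_pd_model(const double a[8], double dst[8])
    {
        for (size_t j = 0; j < 8; ++j)
            dst[j] = cdf_normal(a[j]);
    }

In ordinary code the ``erf`` function from ``<math.h>`` would normally replace ``erf_series``.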
        	

_mm512_mask_cdfnorm_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_cdfnorm_pd(__m512d src, __mmask8 k,
                                   __m512d a);

.. admonition:: Intel Description

    Compute the cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := CDFNormal(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
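
The writemask behavior in the pseudo-code can be sketched in scalar C as follows. ``mask_apply_pd_model``, ``lane_op`` and ``halve`` are hypothetical names introduced for illustration; ``halve`` is only a placeholder for the element-wise function (here it would be ``CDFNormal``).

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    typedef double (*lane_op)(double);

    /* Scalar model of merge-masking for 8 double lanes:
       lane j receives op(a[j]) when bit j of k is set, otherwise src[j]. */
    static void mask_apply_pd_model(const double src[8], uint8_t k,
                                    const double a[8], lane_op op,
                                    double dst[8])
    {
        for (size_t j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1u) ? op(a[j]) : src[j];
    }

    /* Placeholder element-wise function used only for the demonstration. */
    static double halve(double x)
    {
        return 0.5 * x;
    }

With ``k = 0x05`` only lanes 0 and 2 are computed; every other lane is copied from ``src``.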
        	

_mm512_cdfnorm_ps
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_cdfnorm_ps(__m512 a);

.. admonition:: Intel Description

    Compute the cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := CDFNormal(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cdfnorm_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_cdfnorm_ps(__m512 src, __mmask16 k,
                                  __m512 a);

.. admonition:: Intel Description

    Compute the cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := CDFNormal(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cdfnorminv_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_cdfnorminv_pd(__m512d a);

.. admonition:: Intel Description

    Compute the inverse cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := InverseCDFNormal(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
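
A scalar sketch of what ``InverseCDFNormal`` computes, assuming the standard normal distribution and inputs in (0, 1): the result x satisfies CDFNormal(x) = p. All function names below are hypothetical, and bisection over a series-based CDF is chosen only to keep the example self-contained; it is not the method SVML uses.

.. code-block:: C

    #include <stddef.h>

    /* Maclaurin series for erf(x); illustrative, adequate for |x| <= 3. */
    static double erf_series(double x)
    {
        const double two_over_sqrt_pi = 1.1283791670955126;
        double term = x; /* (-1)^n * x^(2n+1) / n! */
        double sum = x;  /* accumulates term / (2n+1) */
        for (int n = 1; n < 200; ++n) {
            term *= -x * x / (double)n;
            double add = term / (double)(2 * n + 1);
            sum += add;
            if ((add < 0.0 ? -add : add) < 1e-17)
                break;
        }
        return two_over_sqrt_pi * sum;
    }

    /* Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))). */
    static double cdf_normal(double x)
    {
        return 0.5 * (1.0 + erf_series(x * 0.7071067811865476));
    }

    /* Inverse of cdf_normal by bisection on [-4, 4]. Probabilities whose
       quantile lies outside the bracket saturate at the bracket ends. */
    static double cdf_normal_inv(double p)
    {
        double lo = -4.0, hi = 4.0;
        for (int it = 0; it < 80; ++it) {
            double mid = 0.5 * (lo + hi);
            if (cdf_normal(mid) < p)
                lo = mid;
            else
                hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    /* Scalar model of the unmasked operation on 8 double lanes. */
    static void cdfnorminv_pd_model(const double a[8], double dst[8])
    {
        for (size_t j = 0; j < 8; ++j)
            dst[j] = cdf_normal_inv(a[j]);
    }

For example, an input of 0.975 yields approximately 1.96, the familiar two-sided 95% quantile.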
        	

_mm512_mask_cdfnorminv_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_cdfnorminv_pd(__m512d src, __mmask8 k,
                                      __m512d a);

.. admonition:: Intel Description

    Compute the inverse cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := InverseCDFNormal(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cdfnorminv_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_cdfnorminv_ps(__m512 a);

.. admonition:: Intel Description

    Compute the inverse cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := InverseCDFNormal(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cdfnorminv_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_cdfnorminv_ps(__m512 src, __mmask16 k,
                                     __m512 a);

.. admonition:: Intel Description

    Compute the inverse cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := InverseCDFNormal(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_erf_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_erf_pd(__m512d a);

.. admonition:: Intel Description

    Compute the error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ERF(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_erf_pd
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_erf_pd(__m512d src, __mmask8 k,
                               __m512d a);

.. admonition:: Intel Description

    Compute the error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ERF(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_erfc_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_erfc_pd(__m512d a);

.. admonition:: Intel Description

    Compute the complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := 1.0 - ERF(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_erfc_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_erfc_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := 1.0 - ERF(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_erf_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_erf_ps(__m512 a);

.. admonition:: Intel Description

    Compute the error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ERF(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_erf_ps
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_erf_ps(__m512 src, __mmask16 k,
                              __m512 a);

.. admonition:: Intel Description

    Compute the error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ERF(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_erfc_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_erfc_ps(__m512 a);

.. admonition:: Intel Description

    Compute the complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := 1.0 - ERF(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_erfc_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_erfc_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := 1.0 - ERF(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_erfinv_pd
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_erfinv_pd(__m512d a);

.. admonition:: Intel Description

    Compute the inverse error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := InverseERF(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
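
For orientation, the inverse error function is the functional inverse of erf, so erf(erfinv(y)) = y for y in (-1, 1); it is not the reciprocal 1 / erf(y). The scalar sketch below illustrates the difference. ``erf_series`` and ``erf_inv`` are hypothetical names, and the series plus bisection approach is chosen only to keep the example self-contained; it is not SVML's algorithm.

.. code-block:: C

    /* Maclaurin series for erf(x); illustrative, adequate for |x| <= 3. */
    static double erf_series(double x)
    {
        const double two_over_sqrt_pi = 1.1283791670955126;
        double term = x; /* (-1)^n * x^(2n+1) / n! */
        double sum = x;  /* accumulates term / (2n+1) */
        for (int n = 1; n < 200; ++n) {
            term *= -x * x / (double)n;
            double add = term / (double)(2 * n + 1);
            sum += add;
            if ((add < 0.0 ? -add : add) < 1e-17)
                break;
        }
        return two_over_sqrt_pi * sum;
    }

    /* Functional inverse of erf by bisection on [-3, 3].
       y is expected to lie strictly between -1 and 1. */
    static double erf_inv(double y)
    {
        double lo = -3.0, hi = 3.0;
        for (int it = 0; it < 80; ++it) {
            double mid = 0.5 * (lo + hi);
            if (erf_series(mid) < y)
                lo = mid;
            else
                hi = mid;
        }
        return 0.5 * (lo + hi);
    }

For y = 0.5 the functional inverse is about 0.4769, while the reciprocal 1 / erf(0.5) is about 1.92.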
        	

_mm512_mask_erfinv_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_erfinv_pd(__m512d src, __mmask8 k,
                                  __m512d a);

.. admonition:: Intel Description

    Compute the inverse error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := InverseERF(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_erfinv_ps
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_erfinv_ps(__m512 a);

.. admonition:: Intel Description

    Compute the inverse error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := InverseERF(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_erfinv_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_erfinv_ps(__m512 src, __mmask16 k,
                                 __m512 a);

.. admonition:: Intel Description

    Compute the inverse error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := InverseERF(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_erfcinv_pd
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_erfcinv_pd(__m512d a);

.. admonition:: Intel Description

    Compute the inverse complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := InverseERFC(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_erfcinv_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_erfcinv_pd(__m512d src, __mmask8 k,
                                   __m512d a);

.. admonition:: Intel Description

    Compute the inverse complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := InverseERFC(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_erfcinv_ps
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_erfcinv_ps(__m512 a);

.. admonition:: Intel Description

    Compute the inverse complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := InverseERFC(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_erfcinv_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_erfcinv_ps(__m512 src, __mmask16 k,
                                  __m512 a);

.. admonition:: Intel Description

    Compute the inverse complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := InverseERFC(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cdfnorm_ph
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_cdfnorm_ph(__m512h a);

.. admonition:: Intel Description

    Compute the cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := CDFNormal(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
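
The loop bounds in the pseudo-code follow from the element width: a 512-bit vector holds 512 / 16 = 32 half-precision lanes, which is why the masked half-precision forms take a 32-bit mask (``__mmask32``). The small sketch below spells out that arithmetic; ``lane_count`` and ``lane_bit_offset`` are hypothetical helper names.

.. code-block:: C

    #include <stddef.h>

    /* Number of packed elements in a vector of vector_bits bits
       whose elements are element_bits bits wide. */
    static size_t lane_count(size_t vector_bits, size_t element_bits)
    {
        return vector_bits / element_bits;
    }

    /* Bit offset of lane j, matching "i := j*16" (or j*32, j*64)
       in the pseudo-code. */
    static size_t lane_bit_offset(size_t j, size_t element_bits)
    {
        return j * element_bits;
    }

The last half-precision lane (j = 31) therefore occupies bits 511:496.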
        

_mm512_cdfnorminv_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_cdfnorminv_ph(__m512h a);

.. admonition:: Intel Description

    Compute the inverse cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := InverseCDFNormal(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_erf_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_erf_ph(__m512h a);

.. admonition:: Intel Description

    Compute the error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ERF(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_erfc_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_erfc_ph(__m512h a);

.. admonition:: Intel Description

    Compute the complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := 1.0 - ERF(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_erfcinv_ph
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_erfcinv_ph(__m512h a);

.. admonition:: Intel Description

    Compute the inverse complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := InverseERFC(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_erfinv_ph
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_erfinv_ph(__m512h a);

.. admonition:: Intel Description

    Compute the inverse error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := InverseERF(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_cdfnorm_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_cdfnorm_ph(__m512h src, __mmask32 k,
                                   __m512h a);

.. admonition:: Intel Description

    Compute the cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := CDFNormal(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_cdfnorminv_ph
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_cdfnorminv_ph(__m512h src, __mmask32 k,
                                      __m512h a);

.. admonition:: Intel Description

    Compute the inverse cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := InverseCDFNormal(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_erf_ph
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_erf_ph(__m512h src, __mmask32 k,
                               __m512h a);

.. admonition:: Intel Description

    Compute the error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ERF(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_erfc_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_erfc_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := 1.0 - ERF(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_erfcinv_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_erfcinv_ph(__m512h src, __mmask32 k,
                                   __m512h a);

.. admonition:: Intel Description

    Compute the inverse complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := InverseERFC(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_erfinv_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_erfinv_ph(__m512h src, __mmask32 k,
                                  __m512h a);

.. admonition:: Intel Description

    Compute the inverse error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := InverseERF(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

YMM
~~~
_mm256_cdfnorm_pd
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_cdfnorm_pd(__m256d a);

.. admonition:: Intel Description

    Compute the cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := CDFNormal(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
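
The final pseudo-code line, ``dst[MAX:256] := 0``, states that bits above the 256-bit result are cleared in the full register. A scalar sketch of that effect, assuming for illustration that the full register is 512 bits wide (8 double lanes); ``op256_pd_model``, ``lane_op`` and ``twice`` are hypothetical names, and ``twice`` merely stands in for the element-wise function.

.. code-block:: C

    #include <stddef.h>

    typedef double (*lane_op)(double);

    /* Model a full 512-bit register as 8 double lanes. A 256-bit
       operation writes lanes 0..3 and clears lanes 4..7. */
    static void op256_pd_model(const double a[4], lane_op op, double reg[8])
    {
        for (size_t j = 0; j < 4; ++j)
            reg[j] = op(a[j]);
        for (size_t j = 4; j < 8; ++j)
            reg[j] = 0.0;
    }

    /* Placeholder element-wise function used only for the demonstration. */
    static double twice(double x)
    {
        return 2.0 * x;
    }

Whatever the upper lanes held before the call, they read as zero afterwards in this model.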
        	

_mm256_cdfnorm_ps
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_cdfnorm_ps(__m256 a);

.. admonition:: Intel Description

    Compute the cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := CDFNormal(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cdfnorminv_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_cdfnorminv_pd(__m256d a);

.. admonition:: Intel Description

    Compute the inverse cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := InverseCDFNormal(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
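The pseudo-code treats ``InverseCDFNormal`` as a black box. One way to build a scalar reference for testing is bisection on the forward CDF, which is monotonically increasing. The sketch below assumes the standard normal distribution and an input strictly between 0 and 1; the helper names and the search interval are assumptions made for illustration, and the source does not say how inputs outside that range are handled.

.. code-block:: C

    #include <math.h>

    /* Forward CDF, assuming mean 0 and variance 1. */
    static double ref_cdfnorm(double x)
    {
        return 0.5 * erfc(-x / sqrt(2.0));
    }

    /* Hypothetical reference for 0 < p < 1: bisection on the monotone CDF. */
    static double ref_cdfnorminv(double p)
    {
        double lo = -40.0;   /* CDF underflows to 0 here in double precision */
        double hi = 40.0;    /* CDF rounds to 1 here in double precision */
        for (int it = 0; it < 200; ++it) {
            double mid = 0.5 * (lo + hi);
            if (ref_cdfnorm(mid) < p) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return 0.5 * (lo + hi);
    }

Bisection is slow compared with a rational approximation, but it is easy to verify and needs nothing beyond ``erfc``.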

_mm256_cdfnorminv_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_cdfnorminv_ps(__m256 a);

.. admonition:: Intel Description

    Compute the inverse cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := InverseCDFNormal(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_erf_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_erf_pd(__m256d a);

.. admonition:: Intel Description

    Compute the error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ERF(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
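The final pseudo-code line, ``dst[MAX:256] := 0``, says that every bit above the 256-bit result is cleared. The sketch below models that with a plain array, assuming for illustration that the widest register is 512 bits (eight 64-bit lanes). ``MAX_LANES`` and ``ref_erf_pd_zero_upper`` are hypothetical names, not part of any Intel header.

.. code-block:: C

    #include <math.h>
    #include <string.h>

    enum { MAX_LANES = 8 };  /* assumption: 512-bit register, 8 x 64-bit lanes */

    /* Writes erf() of the four inputs to lanes 0..3 and clears lanes 4..7. */
    static void ref_erf_pd_zero_upper(double reg[MAX_LANES], const double a[4])
    {
        for (int j = 0; j < 4; ++j) {
            reg[j] = erf(a[j]);          /* dst[i+63:i] := ERF(a[i+63:i]) */
        }
        /* dst[MAX:256] := 0 (all-zero bytes represent 0.0 in IEEE 754) */
        memset(&reg[4], 0, (MAX_LANES - 4) * sizeof reg[0]);
    }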

_mm256_erf_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_erf_ps(__m256 a);

.. admonition:: Intel Description

    Compute the error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ERF(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_erfc_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_erfc_pd(__m256d a);

.. admonition:: Intel Description

    Compute the complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := 1.0 - ERF(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
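The pseudo-code defines the result as ``1.0 - ERF(x)``. Read as a mathematical definition this is correct, but evaluated literally in floating point it loses all precision once ``erf(x)`` rounds to 1. The source does not say how SVML evaluates the function; the sketch below only contrasts the literal subtraction with the C99 ``erfc`` function, using hypothetical helper names.

.. code-block:: C

    #include <math.h>

    /* Literal reading of the pseudo-code: 1.0 - ERF(x). */
    static double erfc_by_subtraction(double x)
    {
        return 1.0 - erf(x);
    }

    /* Direct evaluation with the C99 library function. */
    static double erfc_direct(double x)
    {
        return erfc(x);
    }

For ``x = 10.0`` the subtraction returns exactly zero, while the direct call returns a small positive value. For moderate inputs such as ``0.5`` the two agree to within rounding.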

_mm256_erfc_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_erfc_ps(__m256 a);

.. admonition:: Intel Description

    Compute the complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := 1.0 - ERF(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_erfcinv_pd
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_erfcinv_pd(__m256d a);

.. admonition:: Intel Description

    Compute the inverse complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := 1.0 / (1.0 - ERF(a[i+63:i]))
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_erfcinv_ps
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_erfcinv_ps(__m256 a);

.. admonition:: Intel Description

    Compute the inverse complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := 1.0 / (1.0 - ERF(a[i+31:i]))
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_erfinv_pd
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_erfinv_pd(__m256d a);

.. admonition:: Intel Description

    Compute the inverse error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := 1.0 / ERF(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
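The description asks for the inverse error function, that is, the value ``y`` with ``erf(y) = x``. The pseudo-code expression ``1.0 / ERF(x)`` reads as a reciprocal, which is a different quantity. The sketch below computes the functional inverse with Newton's method so the two can be compared. ``ref_erfinv`` is a hypothetical helper, valid only for inputs strictly between -1 and 1, and is not how SVML is documented to work.

.. code-block:: C

    #include <math.h>

    /* Hypothetical reference: solves erf(y) = x for -1 < x < 1. */
    static double ref_erfinv(double x)
    {
        const double two_over_sqrt_pi = 1.1283791670955126;  /* 2 / sqrt(pi) */
        double y = 0.0;
        for (int it = 0; it < 100; ++it) {
            /* Newton step: f(y) = erf(y) - x, f'(y) = (2/sqrt(pi)) * exp(-y*y) */
            double step = (erf(y) - x) / (two_over_sqrt_pi * exp(-y * y));
            y -= step;
            if (fabs(step) <= 1e-15 * fmax(1.0, fabs(y))) {
                break;
            }
        }
        return y;
    }

For ``x = 0.5`` this returns a value near 0.4769, whereas ``1.0 / erf(0.5)`` is roughly 1.92.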

_mm256_erfinv_ps
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_erfinv_ps(__m256 a);

.. admonition:: Intel Description

    Compute the inverse error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := 1.0 / ERF(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cdfnorm_ph
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_cdfnorm_ph(__m256h a);

.. admonition:: Intel Description

    Compute the cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := CDFNormal(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_cdfnorminv_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_cdfnorminv_ph(__m256h a);

.. admonition:: Intel Description

    Compute the inverse cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := InverseCDFNormal(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_erf_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_erf_ph(__m256h a);

.. admonition:: Intel Description

    Compute the error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ERF(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_erfc_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_erfc_ph(__m256h a);

.. admonition:: Intel Description

    Compute the complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := 1.0 - ERF(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_erfcinv_ph
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_erfcinv_ph(__m256h a);

.. admonition:: Intel Description

    Compute the inverse complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := 1.0 / (1.0 - ERF(a[i+15:i]))
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_erfinv_ph
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_erfinv_ph(__m256h a);

.. admonition:: Intel Description

    Compute the inverse error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := 1.0 / ERF(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

XMM
~~~
_mm_cdfnorm_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_cdfnorm_ph(__m128h a);

.. admonition:: Intel Description

    Compute the cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := CDFNormal(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_cdfnorminv_ph
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_cdfnorminv_ph(__m128h a);

.. admonition:: Intel Description

    Compute the inverse cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := InverseCDFNormal(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_erf_ph
^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_erf_ph(__m128h a);

.. admonition:: Intel Description

    Compute the error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ERF(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_erfc_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_erfc_ph(__m128h a);

.. admonition:: Intel Description

    Compute the complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := 1.0 - ERF(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_erfcinv_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_erfcinv_ph(__m128h a);

.. admonition:: Intel Description

    Compute the inverse complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := 1.0 / (1.0 - ERF(a[i+15:i]))
        ENDFOR
        dst[MAX:128] := 0
        

_mm_erfinv_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_erfinv_ph(__m128h a);

.. admonition:: Intel Description

    Compute the inverse error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := 1.0 / ERF(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_cdfnorm_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_cdfnorm_pd(__m128d a);

.. admonition:: Intel Description

    Compute the cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := CDFNormal(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cdfnorm_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_cdfnorm_ps(__m128 a);

.. admonition:: Intel Description

    Compute the cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := CDFNormal(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cdfnorminv_pd
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_cdfnorminv_pd(__m128d a);

.. admonition:: Intel Description

    Compute the inverse cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := InverseCDFNormal(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cdfnorminv_ps
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_cdfnorminv_ps(__m128 a);

.. admonition:: Intel Description

    Compute the inverse cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := InverseCDFNormal(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_erf_ps
^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_erf_ps(__m128 a);

.. admonition:: Intel Description

    Compute the error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ERF(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_erfc_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_erfc_pd(__m128d a);

.. admonition:: Intel Description

    Compute the complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := 1.0 - ERF(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_erfc_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_erfc_ps(__m128 a);

.. admonition:: Intel Description

    Compute the complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := 1.0 - ERF(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_erfcinv_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_erfcinv_pd(__m128d a);

.. admonition:: Intel Description

    Compute the inverse complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := 1.0 / (1.0 - ERF(a[i+63:i]))
        ENDFOR
        dst[MAX:128] := 0
        	
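The inverse complementary error function named in the description is the value ``y`` with ``erfc(y) = x``, which equals ``erfinv(1 - x)``. Forming ``1 - x`` first discards information when ``x`` is tiny, so the sketch below applies Newton's method to ``erfc`` directly. ``ref_erfcinv`` is a hypothetical helper for inputs strictly between 0 and 2; it is offered as a test reference only and makes no claim about the SVML implementation.

.. code-block:: C

    #include <math.h>

    /* Hypothetical reference: solves erfc(y) = x for 0 < x < 2. */
    static double ref_erfcinv(double x)
    {
        const double two_over_sqrt_pi = 1.1283791670955126;  /* 2 / sqrt(pi) */
        double y = 0.0;
        for (int it = 0; it < 100; ++it) {
            /* f(y) = erfc(y) - x, f'(y) = -(2/sqrt(pi)) * exp(-y*y) */
            double step = (erfc(y) - x) / (two_over_sqrt_pi * exp(-y * y));
            y += step;   /* minus f/f' with the negative derivative folded in */
            if (fabs(step) <= 1e-15 * fmax(1.0, fabs(y))) {
                break;
            }
        }
        return y;
    }

An input of ``1.0`` maps to zero, inputs below 1 map to positive results, and inputs above 1 map to negative results.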

_mm_erfcinv_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_erfcinv_ps(__m128 a);

.. admonition:: Intel Description

    Compute the inverse complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := 1.0 / (1.0 - ERF(a[i+31:i]))
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_erfinv_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_erfinv_pd(__m128d a);

.. admonition:: Intel Description

    Compute the inverse error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := 1.0 / ERF(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_erfinv_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: SVML-Probability/Statistics-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_erfinv_ps(__m128 a);

.. admonition:: Intel Description

    Compute the inverse error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := 1.0 / ERF(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

Special Math Functions
----------------------
ZMM
~~~
_mm512_ceil_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_ceil_pd(__m512d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := CEIL(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_ceil_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_ceil_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := CEIL(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
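The writemask behaviour in the pseudo-code above can be modelled in portable C without any vector hardware. The sketch below is a scalar stand-in, with ``ref_mask_ceil_pd`` as a hypothetical name: bit ``j`` of ``k`` selects between the rounded input and the pass-through value from ``src``.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Scalar model of merge-masking over eight 64-bit lanes. */
    static void ref_mask_ceil_pd(double dst[8], const double src[8],
                                 uint8_t k, const double a[8])
    {
        for (int j = 0; j < 8; ++j) {
            if ((k >> j) & 1u) {
                dst[j] = ceil(a[j]);   /* mask bit set: compute */
            } else {
                dst[j] = src[j];       /* mask bit clear: copy from src */
            }
        }
    }

With ``k = 0x05`` only lanes 0 and 2 are rounded; every other lane keeps the value supplied in ``src``.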

_mm512_ceil_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_ceil_ps(__m512 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := CEIL(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_ceil_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_ceil_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := CEIL(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_floor_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_floor_pd(__m512d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := FLOOR(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	
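Rounding down differs from truncation for negative inputs, which is easy to overlook when porting scalar code. The sketch below is a scalar model of the loop above, assuming ``FLOOR`` matches the C99 ``floor`` function; ``ref_floor_pd`` is a hypothetical name.

.. code-block:: C

    #include <math.h>

    /* Scalar model: one floor() per 64-bit lane, eight lanes in total. */
    static void ref_floor_pd(double dst[8], const double a[8])
    {
        for (int j = 0; j < 8; ++j) {
            dst[j] = floor(a[j]);
        }
    }

An input of ``-0.5`` yields ``-1.0`` here, while truncation toward zero would give ``-0.0``. The result stays a floating-point value, so large magnitudes such as ``1e300`` pass through unchanged.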

_mm512_mask_floor_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_floor_pd(__m512d src, __mmask8 k,
                                 __m512d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := FLOOR(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_floor_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_floor_ps(__m512 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := FLOOR(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_floor_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_floor_ps(__m512 src, __mmask16 k,
                                __m512 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := FLOOR(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_nearbyint_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_nearbyint_pd(__m512d a);

.. admonition:: Intel Description

    Rounds each packed double-precision (64-bit) floating-point element in "a" to the nearest integer value and stores the results as packed double-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := NearbyInt(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
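
The pseudo-code leaves ``NearbyInt`` undefined. One plausible reading, which is an assumption and not something the source states, is that it behaves like C99 ``nearbyint``, which rounds according to the current floating-point rounding mode. A scalar sketch under that assumption, with ``ref_nearbyint_pd`` as a hypothetical name and a ``double[8]`` array standing in for ``__m512d``:

.. code-block:: C

    #include <math.h>

    /* Scalar model of the loop: 8 lanes of 64-bit doubles. */
    void ref_nearbyint_pd(double dst[8], const double a[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = nearbyint(a[j]);  /* uses the current rounding mode */
    }

Under the default round-to-nearest mode, exact halves go to the even neighbour in this model, so 1.5 and 2.5 both give 2.0.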
        	

_mm512_mask_nearbyint_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_nearbyint_pd(__m512d src, __mmask8 k,
                                     __m512d a);

.. admonition:: Intel Description

    Rounds each packed double-precision (64-bit) floating-point element in "a" to the nearest integer value and stores the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := NearbyInt(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_nearbyint_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_nearbyint_ps(__m512 a);

.. admonition:: Intel Description

    Rounds each packed single-precision (32-bit) floating-point element in "a" to the nearest integer value and stores the results as packed single-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := NearbyInt(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_nearbyint_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_nearbyint_ps(__m512 src, __mmask16 k,
                                    __m512 a);

.. admonition:: Intel Description

    Rounds each packed single-precision (32-bit) floating-point element in "a" to the nearest integer value and stores the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := NearbyInt(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rint_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_rint_pd(__m512d a);

.. admonition:: Intel Description

    Rounds the packed double-precision (64-bit) floating-point elements in "a" to the nearest even integer value and stores the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := RoundToNearestEven(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
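
The ``RoundToNearestEven`` helper is not defined in the source. Its name suggests round-to-nearest with exact halves going to the even integer (so 2.5 becomes 2.0 and 3.5 becomes 4.0), rather than rounding every input to an even integer. The sketch below spells out that reading in scalar C; it is an interpretation, not Intel's implementation, and ``round_half_even`` and ``ref_rint_pd`` are hypothetical names.

.. code-block:: C

    #include <math.h>

    /* Round to the nearest integer; exact halves go to the even neighbour. */
    double round_half_even(double x)
    {
        double lo = floor(x);     /* largest integer not above x */
        double frac = x - lo;     /* fractional distance above lo */
        double r;

        if (frac < 0.5)
            r = lo;
        else if (frac > 0.5)
            r = lo + 1.0;
        else                      /* tie; NaN and infinities also land here */
            r = (fmod(lo, 2.0) == 0.0) ? lo : lo + 1.0;

        return copysign(r, x);    /* keep the input's sign on zero results */
    }

    /* Scalar model of the loop: 8 lanes of 64-bit doubles. */
    void ref_rint_pd(double dst[8], const double a[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = round_half_even(a[j]);
    }

Non-tie inputs behave like ordinary rounding: 2.4 gives 2.0 and 2.6 gives 3.0.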
        	

_mm512_mask_rint_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_rint_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Rounds the packed double-precision (64-bit) floating-point elements in "a" to the nearest even integer value and stores the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RoundToNearestEven(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rint_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_rint_ps(__m512 a);

.. admonition:: Intel Description

    Rounds the packed single-precision (32-bit) floating-point elements in "a" to the nearest even integer value and stores the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := RoundToNearestEven(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_rint_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_rint_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Rounds the packed single-precision (32-bit) floating-point elements in "a" to the nearest even integer value and stores the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RoundToNearestEven(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_svml_round_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_svml_round_pd(__m512d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed double-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ROUND(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_svml_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_svml_round_pd(__m512d src, __mmask8 k,
                                      __m512d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ROUND(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i] 
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_trunc_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_trunc_pd(__m512d a);

.. admonition:: Intel Description

    Truncate the packed double-precision (64-bit) floating-point elements in "a", and store the results as packed double-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := TRUNCATE(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
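
``TRUNCATE`` is not defined in the source. Assuming it matches C99 ``trunc`` (round toward zero), the loop can be modelled in scalar C as below; ``ref_trunc_pd`` is a hypothetical name and a ``double[8]`` array stands in for ``__m512d``. Under that assumption the result differs from ``floor`` only for negative non-integers.

.. code-block:: C

    #include <math.h>

    /* Scalar model of the loop: 8 lanes of 64-bit doubles. */
    void ref_trunc_pd(double dst[8], const double a[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = trunc(a[j]);  /* drop the fractional part */
    }

For an input of -1.5 this model yields -1.0, whereas ``floor(-1.5)`` is -2.0.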
        	

_mm512_mask_trunc_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_trunc_pd(__m512d src, __mmask8 k,
                                 __m512d a);

.. admonition:: Intel Description

    Truncate the packed double-precision (64-bit) floating-point elements in "a", and store the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := TRUNCATE(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_trunc_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_trunc_ps(__m512 a);

.. admonition:: Intel Description

    Truncate the packed single-precision (32-bit) floating-point elements in "a", and store the results as packed single-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := TRUNCATE(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_trunc_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_trunc_ps(__m512 src, __mmask16 k,
                                __m512 a);

.. admonition:: Intel Description

    Truncate the packed single-precision (32-bit) floating-point elements in "a", and store the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := TRUNCATE(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_ceil_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_ceil_ph(__m512h a);

.. admonition:: Intel Description

    Round the packed half-precision (16-bit) floating-point elements in "a" up to an integer value, and store the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := CEIL(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_floor_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_floor_ph(__m512h a);

.. admonition:: Intel Description

    Round the packed half-precision (16-bit) floating-point elements in "a" down to an integer value, and store the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := FLOOR(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_ceil_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_ceil_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Round the packed half-precision (16-bit) floating-point elements in "a" up to an integer value, and store the results as packed half-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := CEIL(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
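
A 512-bit register holds 512 / 16 = 32 half-precision lanes, which is why this variant takes a ``__mmask32``. The sketch below models the masked loop generically in scalar C. It makes several assumptions: ``ref_mask_unary`` is a hypothetical helper, ``float`` stands in for the FP16 lanes because a half-precision type is not available in every C toolchain, and ``CEIL`` is taken to match C99 ``ceilf``.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Apply op to lanes whose mask bit is set and copy src elsewhere.
       lanes must be at most 32 so a 32-bit mask covers every lane. */
    void ref_mask_unary(float *dst, const float *src, uint32_t k,
                        const float *a, int lanes, float (*op)(float))
    {
        for (int j = 0; j < lanes; j++)
            dst[j] = ((k >> j) & 1u) ? op(a[j]) : src[j];
    }

Calling ``ref_mask_unary(dst, src, k, a, 32, ceilf)`` mirrors the pseudo-code above; passing ``floorf`` or ``truncf`` with the same mask models the other masked variants in the same way.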
        

_mm512_mask_floor_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_floor_ph(__m512h src, __mmask32 k,
                                 __m512h a);

.. admonition:: Intel Description

    Round the packed half-precision (16-bit) floating-point elements in "a" down to an integer value, and store the results as packed half-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := FLOOR(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_nearbyint_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_nearbyint_ph(__m512h src, __mmask32 k,
                                     __m512h a);

.. admonition:: Intel Description

    Rounds each packed half-precision (16-bit) floating-point element in "a" to the nearest integer value and stores the results as packed half-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := NearbyInt(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_rint_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_rint_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Rounds the packed half-precision (16-bit) floating-point elements in "a" to the nearest even integer value and stores the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := RoundToNearestEven(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_svml_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_svml_round_ph(__m512h src, __mmask32 k,
                                      __m512h a);

.. admonition:: Intel Description

    Round the packed half-precision (16-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed half-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ROUND(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_trunc_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_trunc_ph(__m512h src, __mmask32 k,
                                 __m512h a);

.. admonition:: Intel Description

    Truncate the packed half-precision (16-bit) floating-point elements in "a", and store the results as packed half-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := TRUNCATE(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_nearbyint_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_nearbyint_ph(__m512h a);

.. admonition:: Intel Description

    Rounds each packed half-precision (16-bit) floating-point element in "a" to the nearest integer value and stores the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := NearbyInt(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_rint_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_rint_ph(__m512h a);

.. admonition:: Intel Description

    Rounds the packed half-precision (16-bit) floating-point elements in "a" to the nearest even integer value and stores the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := RoundToNearestEven(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_svml_round_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_svml_round_ph(__m512h a);

.. admonition:: Intel Description

    Round the packed half-precision (16-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ROUND(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_trunc_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_trunc_ph(__m512h a);

.. admonition:: Intel Description

    Truncate the packed half-precision (16-bit) floating-point elements in "a", and store the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := TRUNCATE(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

YMM
~~~
_mm256_svml_ceil_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_svml_ceil_pd(__m256d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := CEIL(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
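
``CEIL`` is not defined in the source. Assuming it follows C99 ``ceil`` on an IEEE 754 implementation, inputs strictly between -1 and 0 round up to negative zero, which compares equal to 0.0 but keeps its sign bit. A four-lane scalar sketch under that assumption, with ``ref_ceil_pd256`` as a hypothetical name and a ``double[4]`` array standing in for ``__m256d``:

.. code-block:: C

    #include <math.h>

    /* Scalar model of the loop: 4 lanes of 64-bit doubles. */
    void ref_ceil_pd256(double dst[4], const double a[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ceil(a[j]);  /* round toward positive infinity */
    }

Code that later divides by such a result, or inspects it with ``signbit``, can therefore tell -0.0 from +0.0 even though the two compare equal.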
        	

_mm256_svml_ceil_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_svml_ceil_ps(__m256 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := CEIL(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_svml_floor_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_svml_floor_pd(__m256d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := FLOOR(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_svml_floor_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_svml_floor_ps(__m256 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := FLOOR(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_svml_round_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_svml_round_pd(__m256d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ROUND(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_svml_round_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_svml_round_ps(__m256 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ROUND(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_svml_ceil_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_svml_ceil_ph(__m256h a);

.. admonition:: Intel Description

    Round the packed half-precision (16-bit) floating-point elements in "a" up to an integer value, and store the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := CEIL(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_svml_floor_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_svml_floor_ph(__m256h a);

.. admonition:: Intel Description

    Round the packed half-precision (16-bit) floating-point elements in "a" down to an integer value, and store the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := FLOOR(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_svml_round_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_svml_round_ph(__m256h a);

.. admonition:: Intel Description

    Round the packed half-precision (16-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ROUND(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_trunc_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_trunc_ph(__m256h a);

.. admonition:: Intel Description

    Truncate the packed half-precision (16-bit) floating-point elements in "a", and store the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := TRUNCATE(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

XMM
~~~
_mm_svml_ceil_ph
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_svml_ceil_ph(__m128h a);

.. admonition:: Intel Description

    Round the packed half-precision (16-bit) floating-point elements in "a" up to an integer value, and store the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := CEIL(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_svml_floor_ph
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_svml_floor_ph(__m128h a);

.. admonition:: Intel Description

    Round the packed half-precision (16-bit) floating-point elements in "a" down to an integer value, and store the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := FLOOR(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_svml_round_ph
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_svml_round_ph(__m128h a);

.. admonition:: Intel Description

    Round the packed half-precision (16-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ROUND(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_trunc_ph
^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_trunc_ph(__m128h a);

.. admonition:: Intel Description

    Truncate the packed half-precision (16-bit) floating-point elements in "a", and store the results as packed half-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := TRUNCATE(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_svml_ceil_pd
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_svml_ceil_pd(__m128d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := CEIL(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_svml_ceil_ps
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_svml_ceil_ps(__m128 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := CEIL(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_svml_floor_pd
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_svml_floor_pd(__m128d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := FLOOR(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_svml_floor_ps
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_svml_floor_ps(__m128 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := FLOOR(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
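
As a rough illustration of how the ceiling and floor entries above differ, the following scalar C sketch applies "ceilf" and "floorf" lane by lane to a four-element array standing in for a 128-bit register. The helper names "model_ceil_ps" and "model_floor_ps" are hypothetical and do not come from any Intel header. The sketch assumes that the pseudo-code operations CEIL and FLOOR behave like the C library functions, and it does not call the SVML intrinsics themselves, which may not be available in every compiler.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model: one array slot per 32-bit lane (j = 0..3). */
    void model_ceil_ps(const float a[4], float dst[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = ceilf(a[j]);   /* round toward +infinity */
        }
    }

    void model_floor_ps(const float a[4], float dst[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = floorf(a[j]);  /* round toward -infinity */
        }
    }

For the input lanes {1.5, -1.5, 2.0, -0.25}, the ceiling model yields {2, -1, 2, -0} and the floor model yields {1, -2, 2, -1}.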
        	

_mm_svml_round_pd
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_svml_round_pd(__m128d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ROUND(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_svml_round_ps
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: SVML-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_svml_round_ps(__m128 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ROUND(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

Trigonometry
------------
ZMM
~~~
_mm512_acos_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_acos_pd(__m512d a);

.. admonition:: Intel Description

    Compute the inverse cosine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ACOS(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_acos_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_acos_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the inverse cosine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ACOS(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
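
The writemask behaviour in the pseudo-code above can be sketched in plain scalar C. This is only a model under stated assumptions: the function name "model_mask_acos_pd" is hypothetical, the eight-element arrays stand in for the 512-bit registers, and the pseudo-code operation ACOS is assumed to behave like the C library function "acos". It does not call the SVML intrinsic itself.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Hypothetical scalar model of merge-masking over eight 64-bit lanes.
     * A set bit j in k selects the computed value; a clear bit keeps src[j]. */
    void model_mask_acos_pd(const double src[8], uint8_t k,
                            const double a[8], double dst[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u) {
                dst[j] = acos(a[j]);
            } else {
                dst[j] = src[j];
            }
        }
    }

With every lane of "a" set to 1.0, every lane of "src" set to 9.0, and a mask of 0x05, lanes 0 and 2 of the result hold 0.0 (the inverse cosine of 1.0) and the remaining lanes keep 9.0 from "src".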
        	

_mm512_acos_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_acos_ps(__m512 a);

.. admonition:: Intel Description

    Compute the inverse cosine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ACOS(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_acos_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_acos_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the inverse cosine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ACOS(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
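
The loop bounds and mask widths in the double-precision and single-precision entries follow from dividing the register width by the element width. The sketch below spells out that arithmetic in C. The helper names "lane_count", "lane_low_bit" and "lane_high_bit" are hypothetical and exist only for illustration.

.. code-block:: C

    /* Hypothetical helpers, not part of any Intel header. */

    /* Number of lanes: register width divided by element width. */
    int lane_count(int register_bits, int element_bits)
    {
        return register_bits / element_bits;
    }

    /* Lowest bit index of lane j, matching "i := j*32" or "i := j*64". */
    int lane_low_bit(int j, int element_bits)
    {
        return j * element_bits;
    }

    /* Highest bit index of lane j, matching "i+31" or "i+63". */
    int lane_high_bit(int j, int element_bits)
    {
        return lane_low_bit(j, element_bits) + element_bits - 1;
    }

A 512-bit register holds 8 lanes of 64 bits, which matches "FOR j := 0 to 7" and the 8-bit "__mmask8", and 16 lanes of 32 bits, which matches "FOR j := 0 to 15" and the 16-bit "__mmask16".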
        	

_mm512_acosh_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_acosh_pd(__m512d a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ACOSH(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_acosh_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_acosh_pd(__m512d src, __mmask8 k,
                                 __m512d a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ACOSH(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_acosh_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_acosh_ps(__m512 a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ACOSH(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_acosh_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_acosh_ps(__m512 src, __mmask16 k,
                                __m512 a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ACOSH(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_asin_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_asin_pd(__m512d a);

.. admonition:: Intel Description

    Compute the inverse sine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ASIN(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_asin_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_asin_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the inverse sine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ASIN(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_asin_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_asin_ps(__m512 a);

.. admonition:: Intel Description

    Compute the inverse sine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ASIN(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_asin_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_asin_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the inverse sine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ASIN(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_asinh_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_asinh_pd(__m512d a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ASINH(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_asinh_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_asinh_pd(__m512d src, __mmask8 k,
                                 __m512d a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ASINH(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_asinh_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_asinh_ps(__m512 a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ASINH(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_asinh_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_asinh_ps(__m512 src, __mmask16 k,
                                __m512 a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ASINH(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_atan2_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_atan2_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ATAN2(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
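
The argument order of the two-argument inverse tangent is easy to get backwards, so here is a scalar C sketch of the pseudo-code above. It assumes that the pseudo-code operation ATAN2 behaves like the C library function "atan2", with "a" as the numerator (y) and "b" as the denominator (x). The function name "model_atan2_pd" is hypothetical, and the eight-element arrays stand in for the 512-bit registers.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model over eight 64-bit lanes.
     * a[j] plays the role of y and b[j] the role of x in atan2(y, x). */
    void model_atan2_pd(const double a[8], const double b[8], double dst[8])
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = atan2(a[j], b[j]);
        }
    }

Unlike computing the inverse tangent of the quotient a/b, the C function "atan2" uses the signs of both inputs to pick the quadrant, so a lane with a = 1 and b = -1 gives roughly 3*pi/4 rather than -pi/4.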
        	

_mm512_mask_atan2_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_atan2_pd(__m512d src, __mmask8 k,
                                 __m512d a, __m512d b);

.. admonition:: Intel Description

    Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ATAN2(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_atan2_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_atan2_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ATAN2(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_atan2_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_atan2_ps(__m512 src, __mmask16 k,
                                __m512 a, __m512 b);

.. admonition:: Intel Description

    Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ATAN2(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_atan_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_atan_pd(__m512d a);

.. admonition:: Intel Description

    Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ATAN(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_atan_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_atan_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ATAN(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_atan_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_atan_ps(__m512 a);

.. admonition:: Intel Description

    Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ATAN(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_atan_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_atan_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ATAN(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_atanh_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_atanh_pd(__m512d a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ATANH(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_atanh_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_atanh_pd(__m512d src, __mmask8 k,
                                 __m512d a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ATANH(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_atanh_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_atanh_ps(__m512 a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ATANH(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_atanh_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_atanh_ps(__m512 src, __mmask16 k,
                                __m512 a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ATANH(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cos_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_cos_pd(__m512d a);

.. admonition:: Intel Description

    Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := COS(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
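
Because the description above states that the input is expressed in radians, values held in degrees need converting first. The scalar C sketch below shows one way to do that. It assumes that the pseudo-code operation COS behaves like the C library function "cos". The names "degrees_to_radians" and "model_cos_pd" are hypothetical, and the eight-element arrays stand in for the 512-bit registers.

.. code-block:: C

    #include <math.h>

    /* Hypothetical helper: convert degrees to radians before calling a
     * radian-based cosine. The constant is pi to double precision. */
    double degrees_to_radians(double degrees)
    {
        const double pi = 3.14159265358979323846;
        return degrees * (pi / 180.0);
    }

    /* Hypothetical scalar model over eight 64-bit lanes; input in radians. */
    void model_cos_pd(const double a[8], double dst[8])
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = cos(a[j]);
        }
    }

Passing a lane of 60 degrees through "degrees_to_radians" before "model_cos_pd" gives a result close to 0.5, whereas passing 60.0 directly would be read as 60 radians.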
        	

_mm512_mask_cos_pd
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_cos_pd(__m512d src, __mmask8 k,
                               __m512d a);

.. admonition:: Intel Description

    Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := COS(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
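The writemask merge can be modelled in scalar C as well. In this sketch, ``ref_mask_cos_pd`` is a hypothetical helper and a ``uint8_t`` stands in for ``__mmask8``; bit ``j`` of ``k`` selects between the computed cosine and the pass-through value from ``src``. It illustrates the pseudo-code only and says nothing about how SVML computes the cosine itself.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the masked form (assumed names). */
    static void ref_mask_cos_pd(double dst[8], const double src[8],
                                uint8_t k, const double a[8])
    {
        for (int j = 0; j < 8; ++j) {
            if ((k >> j) & 1u) {
                dst[j] = cos(a[j]);   /* mask bit set: compute */
            } else {
                dst[j] = src[j];      /* mask bit clear: copy from src */
            }
        }
    }

With ``k = 0x05`` only lanes 0 and 2 are computed; the other six lanes keep the values from ``src``.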

_mm512_cos_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_cos_ps(__m512 a);

.. admonition:: Intel Description

    Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := COS(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cos_ps
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_cos_ps(__m512 src, __mmask16 k,
                              __m512 a);

.. admonition:: Intel Description

    Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := COS(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cosd_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_cosd_pd(__m512d a);

.. admonition:: Intel Description

    Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := COSD(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	
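``COSD`` takes its argument in degrees. A minimal scalar sketch, assuming a plain degree-to-radian conversion before calling the C library ``cos``, is shown here. ``ref_cosd_pd`` and ``REF_PI`` are hypothetical names, and the source does not say how SVML reduces the argument, so inputs such as 90 degrees give a tiny nonzero value in this model rather than an exact zero.

.. code-block:: C

    #include <math.h>

    /* Assumed constant: pi to double precision. */
    #define REF_PI 3.14159265358979323846

    /* Hypothetical scalar model: naive degrees-to-radians conversion. */
    static void ref_cosd_pd(double dst[8], const double a[8])
    {
        for (int j = 0; j < 8; ++j) {
            dst[j] = cos(a[j] * (REF_PI / 180.0));  /* a[j] is in degrees */
        }
    }

Comparing against known angles (60 degrees gives about 0.5, 180 degrees gives about -1) with a small tolerance is a reasonable way to validate the conversion.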

_mm512_mask_cosd_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_cosd_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := COSD(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cosd_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_cosd_ps(__m512 a);

.. admonition:: Intel Description

    Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := COSD(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cosd_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_cosd_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := COSD(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cosh_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_cosh_pd(__m512d a);

.. admonition:: Intel Description

    Compute the hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := COSH(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	
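As a hedged illustration of ``COSH``, the scalar sketch below uses the C library ``cosh`` per lane. ``ref_cosh_pd`` is a hypothetical helper, not Intel's code. Mathematically the result equals ``(exp(x) + exp(-x)) / 2``, and the input is an ordinary real number rather than an angle.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model of the 8-lane hyperbolic cosine. */
    static void ref_cosh_pd(double dst[8], const double a[8])
    {
        for (int j = 0; j < 8; ++j) {
            /* Mathematically (exp(x) + exp(-x)) / 2; even in x. */
            dst[j] = cosh(a[j]);
        }
    }

Because the function is even, lanes holding ``x`` and ``-x`` should agree, which makes a convenient consistency check.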

_mm512_mask_cosh_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_cosh_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := COSH(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cosh_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_cosh_ps(__m512 a);

.. admonition:: Intel Description

    Compute the hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := COSH(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cosh_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_cosh_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := COSH(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sin_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_sin_pd(__m512d a);

.. admonition:: Intel Description

    Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := SIN(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sin_pd
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_sin_pd(__m512d src, __mmask8 k,
                               __m512d a);

.. admonition:: Intel Description

    Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SIN(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sin_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_sin_ps(__m512 a);

.. admonition:: Intel Description

    Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := SIN(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sin_ps
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_sin_ps(__m512 src, __mmask16 k,
                              __m512 a);

.. admonition:: Intel Description

    Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SIN(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
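The single-precision form has 16 lanes and a 16-bit mask. The sketch below models that layout in scalar C, assuming ``uint16_t`` as a stand-in for ``__mmask16`` and the C library ``sinf`` for the per-lane sine. ``ref_mask_sin_ps`` is a hypothetical name, not part of any Intel header.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Hypothetical scalar model: 16 float lanes, one mask bit per lane. */
    static void ref_mask_sin_ps(float dst[16], const float src[16],
                                uint16_t k, const float a[16])
    {
        for (int j = 0; j < 16; ++j) {
            /* Bit j of k set: compute; otherwise pass src through. */
            dst[j] = ((k >> j) & 1u) ? sinf(a[j]) : src[j];
        }
    }

A mask of ``0xFF00`` computes only the upper eight lanes and leaves the lower eight equal to ``src``.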

_mm512_sinh_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_sinh_pd(__m512d a);

.. admonition:: Intel Description

    Compute the hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := SINH(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sinh_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_sinh_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SINH(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sinh_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_sinh_ps(__m512 a);

.. admonition:: Intel Description

    Compute the hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := SINH(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sinh_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_sinh_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SINH(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sind_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_sind_pd(__m512d a);

.. admonition:: Intel Description

    Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := SIND(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sind_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_sind_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SIND(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sind_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_sind_ps(__m512 a);

.. admonition:: Intel Description

    Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := SIND(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sind_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_sind_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SIND(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_tan_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_tan_pd(__m512d a);

.. admonition:: Intel Description

    Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := TAN(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_tan_pd
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_tan_pd(__m512d src, __mmask8 k,
                               __m512d a);

.. admonition:: Intel Description

    Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := TAN(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_tan_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_tan_ps(__m512 a);

.. admonition:: Intel Description

    Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := TAN(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_tan_ps
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_tan_ps(__m512 src, __mmask16 k,
                              __m512 a);

.. admonition:: Intel Description

    Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := TAN(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_tand_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_tand_pd(__m512d a);

.. admonition:: Intel Description

    Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := TAND(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	
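For ``TAND``, a scalar sketch under the same naive-conversion assumption (degrees multiplied by pi/180, then the C library ``tan``) looks like this. ``ref_tand_pd`` and ``REF_PI`` are hypothetical names. Near odd multiples of 90 degrees this model returns a very large finite value because the converted argument is not exactly pi/2. How the SVML routine treats those inputs is not stated in the source.

.. code-block:: C

    #include <math.h>

    /* Assumed constant: pi to double precision. */
    #define REF_PI 3.14159265358979323846

    /* Hypothetical scalar model: tangent of an angle given in degrees. */
    static void ref_tand_pd(double dst[8], const double a[8])
    {
        for (int j = 0; j < 8; ++j) {
            dst[j] = tan(a[j] * (REF_PI / 180.0));  /* a[j] is in degrees */
        }
    }

Checking 45 degrees (about 1) and -45 degrees (about -1) with a tolerance is a simple way to confirm the unit handling.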

_mm512_mask_tand_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_tand_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := TAND(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_tand_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_tand_ps(__m512 a);

.. admonition:: Intel Description

    Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := TAND(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_tand_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_tand_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := TAND(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_tanh_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_tanh_pd(__m512d a);

.. admonition:: Intel Description

    Compute the hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := TANH(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	
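A scalar sketch of ``TANH`` per lane, using the C library ``tanh``, is given here for illustration. ``ref_tanh_pd`` is a hypothetical helper and not the SVML routine. The function is odd and saturates toward 1 and -1 for inputs of large magnitude, which the model reproduces.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model of the 8-lane hyperbolic tangent. */
    static void ref_tanh_pd(double dst[8], const double a[8])
    {
        for (int j = 0; j < 8; ++j) {
            dst[j] = tanh(a[j]);  /* result lies in [-1, 1] */
        }
    }

Inputs such as 20.0 already round to 1.0 in double precision, so saturation can be tested without extreme values.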

_mm512_mask_tanh_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_tanh_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := TANH(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_tanh_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_tanh_ps(__m512 a);

.. admonition:: Intel Description

    Compute the hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := TANH(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_tanh_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_tanh_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := TANH(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sincos_pd
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d * mem_addr, 
    __m512d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    __m512d _mm512_sincos_pd(__m512d * mem_addr, __m512d a);

.. admonition:: Intel Description

    Compute the sine and cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := SIN(a[i+63:i])
        	MEM[mem_addr+i+63:mem_addr+i] := COS(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        cos_res[MAX:512] := 0
        	
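The two-output shape of this intrinsic (sine returned, cosine written through ``mem_addr``) can be sketched in scalar C. In this hypothetical model, ``ref_sincos_pd`` writes the sine lanes to ``sin_dst`` in place of the return value and the cosine lanes to ``cos_mem`` in place of ``*mem_addr``. The names and the array-based interface are assumptions made for illustration.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model: sin_dst plays the role of the returned
       vector, cos_mem plays the role of the memory at mem_addr. */
    static void ref_sincos_pd(double sin_dst[8], double cos_mem[8],
                              const double a[8])
    {
        for (int j = 0; j < 8; ++j) {
            sin_dst[j] = sin(a[j]);
            cos_mem[j] = cos(a[j]);
        }
    }

The identity sin^2 + cos^2 = 1 holds per lane up to rounding and gives a cheap check that the two outputs were not swapped or misaligned.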

_mm512_mask_sincos_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d * mem_addr, 
    __m512d sin_src, 
    __m512d cos_src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 sin_src, 
    FP64 cos_src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_sincos_pd(__m512d* mem_addr,
                                  __m512d sin_src,
                                  __m512d cos_src, __mmask8 k,
                                  __m512d a);

.. admonition:: Intel Description

    Compute the sine and cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", store the cosine into memory at "mem_addr". Elements are written to their respective locations using writemask "k" (elements are copied from "sin_src" or "cos_src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SIN(a[i+63:i])
        		MEM[mem_addr+i+63:mem_addr+i] := COS(a[i+63:i])
        	ELSE
        		dst[i+63:i] := sin_src[i+63:i]
        		MEM[mem_addr+i+63:mem_addr+i] := cos_src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        cos_res[MAX:512] := 0
        	
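The masked form applies one mask to both outputs. The scalar sketch below assumes ``uint8_t`` as a stand-in for ``__mmask8`` and uses hypothetical names throughout (``ref_mask_sincos_pd``, ``sin_dst``, ``cos_mem``). For each lane, a set mask bit stores the computed sine and cosine, and a clear bit copies from ``sin_src`` and ``cos_src`` respectively.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the masked two-output form. */
    static void ref_mask_sincos_pd(double sin_dst[8], double cos_mem[8],
                                   const double sin_src[8],
                                   const double cos_src[8],
                                   uint8_t k, const double a[8])
    {
        for (int j = 0; j < 8; ++j) {
            if ((k >> j) & 1u) {
                sin_dst[j] = sin(a[j]);
                cos_mem[j] = cos(a[j]);
            } else {
                sin_dst[j] = sin_src[j];   /* pass-through for the sine */
                cos_mem[j] = cos_src[j];   /* pass-through for the cosine */
            }
        }
    }

Using distinct sentinel values in ``sin_src`` and ``cos_src`` makes it easy to see which source each inactive lane was copied from.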

_mm512_sincos_ps
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 * mem_addr, 
    __m512 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    __m512 _mm512_sincos_ps(__m512 * mem_addr, __m512 a);

.. admonition:: Intel Description

    Compute the sine and cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := SIN(a[i+31:i])
        	MEM[mem_addr+i+31:mem_addr+i] := COS(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        cos_res[MAX:512] := 0
        	

_mm512_mask_sincos_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 * mem_addr, 
    __m512 sin_src, 
    __m512 cos_src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 sin_src, 
    FP32 cos_src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_sincos_ps(__m512* mem_addr,
                                 __m512 sin_src, __m512 cos_src,
                                 __mmask16 k, __m512 a);

.. admonition:: Intel Description

    Compute the sine and cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", store the cosine into memory at "mem_addr". Elements are written to their respective locations using writemask "k" (elements are copied from "sin_src" or "cos_src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SIN(a[i+31:i])
        		MEM[mem_addr+i+31:mem_addr+i] := COS(a[i+31:i])
        	ELSE
        		dst[i+31:i] := sin_src[i+31:i]
        		MEM[mem_addr+i+31:mem_addr+i] := cos_src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        cos_res[MAX:512] := 0
        	
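The writemask behaviour described above can be sketched in scalar C. This is an illustrative model only: ``mask_sincos_ps_model`` is a hypothetical name, ``uint16_t`` stands in for ``__mmask16``, and ``float[16]`` arrays stand in for the vector operands. Bit ``j`` of ``k`` decides whether lane ``j`` receives a computed value or the pass-through value from ``sin_src`` / ``cos_src``.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_sincos_ps. */
    static void mask_sincos_ps_model(float cos_out[16],
                                     const float sin_src[16],
                                     const float cos_src[16],
                                     uint16_t k,
                                     const float a[16],
                                     float dst[16])
    {
        for (int j = 0; j < 16; ++j) {
            if ((k >> j) & 1u) {
                /* mask bit set: compute both results for this lane */
                dst[j] = sinf(a[j]);
                cos_out[j] = cosf(a[j]);
            } else {
                /* mask bit clear: copy the pass-through values */
                dst[j] = sin_src[j];
                cos_out[j] = cos_src[j];
            }
        }
    }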

_mm512_acos_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_acos_ph(__m512h a);

.. admonition:: Intel Description

    Compute the inverse cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ACOS(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_acosh_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_acosh_ph(__m512h a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ACOSH(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_asin_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_asin_ph(__m512h a);

.. admonition:: Intel Description

    Compute the inverse sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ASIN(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_asinh_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_asinh_ph(__m512h a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ASINH(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_atan2_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_atan2_ph(__m512h a, __m512h b);

.. admonition:: Intel Description

    Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ATAN2(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        
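The argument order is easy to get wrong: "a" is the numerator and "b" the denominator, which matches the ``(y, x)`` order of the C library ``atan2``. The sketch below is a scalar model under stated assumptions: ``atan2_ph_model`` is a hypothetical name, and the lanes are held as ``float`` rather than FP16, so real results have less precision.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model of _mm512_atan2_ph over 32 lanes.
       Lanes are modeled as float; actual FP16 lanes are less precise. */
    static void atan2_ph_model(const float a[32], const float b[32],
                               float dst[32])
    {
        for (int j = 0; j < 32; ++j) {
            /* a is the numerator (y), b is the denominator (x) */
            dst[j] = atan2f(a[j], b[j]);
        }
    }

With ``a = 1`` and ``b = -1`` the model yields roughly 3*pi/4 rather than -pi/4, because the signs of both inputs select the quadrant.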

_mm512_atan_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_atan_ph(__m512h a);

.. admonition:: Intel Description

    Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ATAN(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_atanh_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_atanh_ph(__m512h a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ATANH(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_cos_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_cos_ph(__m512h a);

.. admonition:: Intel Description

    Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := COS(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_cosd_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_cosd_ph(__m512h a);

.. admonition:: Intel Description

    Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := COSD(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        
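The only difference from ``_mm512_cos_ph`` is the input unit. A scalar sketch of the degree variant follows, assuming a plain degrees-to-radians conversion before calling ``cosf``; ``cosd_ph_model`` is a hypothetical name, lanes are modeled as ``float`` instead of FP16, and SVML may reduce the argument differently.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model of _mm512_cosd_ph: inputs are in degrees. */
    static void cosd_ph_model(const float a_deg[32], float dst[32])
    {
        const float deg_to_rad = 3.14159265358979323846f / 180.0f;
        for (int j = 0; j < 32; ++j) {
            dst[j] = cosf(a_deg[j] * deg_to_rad);
        }
    }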

_mm512_cosh_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_cosh_ph(__m512h a);

.. admonition:: Intel Description

    Compute the hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := COSH(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_acos_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_acos_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the inverse cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ACOS(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        
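All single-input masked ``_ph`` entries in this section follow the same merge pattern, with a 32-bit mask covering 32 lanes. A minimal scalar sketch for the inverse cosine case, where ``mask_acos_ph_model`` is a hypothetical name, ``uint32_t`` stands in for ``__mmask32``, and lanes are modeled as ``float`` rather than FP16:

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_acos_ph. */
    static void mask_acos_ph_model(const float src[32], uint32_t k,
                                   const float a[32], float dst[32])
    {
        for (int j = 0; j < 32; ++j) {
            /* bit j of k selects computed value vs. pass-through from src */
            dst[j] = ((k >> j) & 1u) ? acosf(a[j]) : src[j];
        }
    }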

_mm512_mask_acosh_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_acosh_ph(__m512h src, __mmask32 k,
                                 __m512h a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ACOSH(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_asin_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_asin_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the inverse sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ASIN(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_asinh_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_asinh_ph(__m512h src, __mmask32 k,
                                 __m512h a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ASINH(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_atan_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_atan_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ATAN(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_atanh_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_atanh_ph(__m512h src, __mmask32 k,
                                 __m512h a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ATANH(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_cos_ph
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_cos_ph(__m512h src, __mmask32 k,
                               __m512h a);

.. admonition:: Intel Description

    Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := COS(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_cosd_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_cosd_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := COSD(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_cosh_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_cosh_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := COSH(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_sin_ph
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_sin_ph(__m512h src, __mmask32 k,
                               __m512h a);

.. admonition:: Intel Description

    Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SIN(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_sincos_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h* mem_addr, 
    __m512h sin_src, 
    __m512h cos_src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 mem_addr, 
    FP16 sin_src, 
    FP16 cos_src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_sincos_ph(__m512h* mem_addr,
                                  __m512h sin_src,
                                  __m512h cos_src, __mmask32 k,
                                  __m512h a);

.. admonition:: Intel Description

    Compute the sine and cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", store the cosine into memory at "mem_addr". Elements are written to their respective locations using writemask "k" (elements are copied from "sin_src" or "cos_src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SIN(a[i+15:i])
        		MEM[mem_addr+i+15:mem_addr+i] := COS(a[i+15:i])
        	ELSE
        		dst[i+15:i] := sin_src[i+15:i]
        		MEM[mem_addr+i+15:mem_addr+i] := cos_src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        cos_res[MAX:512] := 0
        

_mm512_mask_sind_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_sind_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SIND(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_sinh_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_sinh_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SINH(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_tan_ph
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_tan_ph(__m512h src, __mmask32 k,
                               __m512h a);

.. admonition:: Intel Description

    Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := TAN(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_tand_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_tand_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := TAND(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_tanh_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_tanh_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := TANH(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_sin_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_sin_ph(__m512h a);

.. admonition:: Intel Description

    Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := SIN(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_sincos_ph
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h* mem_addr, 
    __m512h a
:Param ETypes:
    FP16 mem_addr, 
    FP16 a

.. code-block:: C

    __m512h _mm512_sincos_ph(__m512h* mem_addr, __m512h a);

.. admonition:: Intel Description

    Compute the sine and cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := SIN(a[i+15:i])
        	MEM[mem_addr+i+15:mem_addr+i] := COS(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        cos_res[MAX:512] := 0
        

_mm512_sind_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_sind_ph(__m512h a);

.. admonition:: Intel Description

    Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := SIND(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_sinh_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_sinh_ph(__m512h a);

.. admonition:: Intel Description

    Compute the hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := SINH(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_tan_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_tan_ph(__m512h a);

.. admonition:: Intel Description

    Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := TAN(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_tand_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_tand_ph(__m512h a);

.. admonition:: Intel Description

    Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := TAND(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_tanh_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_tanh_ph(__m512h a);

.. admonition:: Intel Description

    Compute the hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := TANH(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        
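As a rough illustration of the per-lane loop, the sketch below models the operation with the C library ``tanhf``. It is not the SVML implementation: ``tanh_ph_model`` is a hypothetical name and the lanes are held as ``float`` rather than FP16. The output saturates toward +1 or -1 for large magnitudes, which is a property of the hyperbolic tangent itself.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model of _mm512_tanh_ph over 32 lanes. */
    static void tanh_ph_model(const float a[32], float dst[32])
    {
        for (int j = 0; j < 32; ++j) {
            dst[j] = tanhf(a[j]);
        }
    }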

YMM
~~~
_mm256_acos_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_acos_pd(__m256d a);

.. admonition:: Intel Description

    Compute the inverse cosine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ACOS(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
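A scalar sketch of the four-lane double-precision case follows. It assumes the C library ``acos`` as a stand-in for the SVML routine; ``acos_pd_model`` is a hypothetical name and ``double[4]`` replaces ``__m256d``. The C function returns radians in the range 0 to pi, and that is what the model produces.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model of _mm256_acos_pd (4 double lanes). */
    static void acos_pd_model(const double a[4], double dst[4])
    {
        for (int j = 0; j < 4; ++j) {
            dst[j] = acos(a[j]);
        }
    }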

_mm256_acos_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_acos_ps(__m256 a);

.. admonition:: Intel Description

    Compute the inverse cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ACOS(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_acosh_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_acosh_pd(__m256d a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ACOSH(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_acosh_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_acosh_ps(__m256 a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ACOSH(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_asin_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_asin_pd(__m256d a);

.. admonition:: Intel Description

    Compute the inverse sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ASIN(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_asin_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_asin_ps(__m256 a);

.. admonition:: Intel Description

    Compute the inverse sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ASIN(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_asinh_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_asinh_pd(__m256d a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ASINH(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_asinh_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_asinh_ps(__m256 a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ASINH(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_atan_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_atan_pd(__m256d a);

.. admonition:: Intel Description

    Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ATAN(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_atan_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_atan_ps(__m256 a);

.. admonition:: Intel Description

    Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ATAN(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_atan2_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_atan2_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ATAN2(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
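
The description reads as "a divided by b", while the pseudo-code calls a two-argument ``ATAN2``. The sketch below assumes ``ATAN2`` matches the C library ``atan2``, so that ``a`` plays the role of the y coordinate, ``b`` the x coordinate, and the signs of both select the quadrant. The function name ``atan2_pd_model`` is hypothetical.

.. code-block:: C

    #include <math.h>
    #include <stddef.h>

    /* Hypothetical scalar model (not an Intel API): one atan2 call per 64-bit lane. */
    static void atan2_pd_model(const double a[4], const double b[4], double dst[4])
    {
        for (size_t j = 0; j < 4; ++j) {
            /* a[j] is the numerator (y), b[j] the denominator (x). */
            dst[j] = atan2(a[j], b[j]);
        }
    }

Under that assumption, the lane with ``a = 1`` and ``b = -1`` yields 3*pi/4, whereas a plain ``atan(a / b)`` would yield -pi/4.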
        	

_mm256_atan2_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_atan2_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ATAN2(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_atanh_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_atanh_pd(__m256d a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ATANH(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_atanh_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_atanh_ps(__m256 a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ATANH(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cos_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_cos_pd(__m256d a);

.. admonition:: Intel Description

    Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := COS(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cos_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_cos_ps(__m256 a);

.. admonition:: Intel Description

    Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := COS(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cosd_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_cosd_pd(__m256d a);

.. admonition:: Intel Description

    Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := COSD(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
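
The ``COSD`` operation takes its argument in degrees rather than radians. One way to picture it, sketched below, is a degree-to-radian conversion followed by the C library ``cos``. This is an assumption made for illustration: the name ``cosd_pd_model`` is hypothetical, and an actual implementation may reduce the argument differently, so results can differ in the last bits.

.. code-block:: C

    #include <math.h>
    #include <stddef.h>

    /* Hypothetical scalar model (not an Intel API): cosine of an angle in degrees. */
    static void cosd_pd_model(const double a_deg[4], double dst[4])
    {
        const double deg_to_rad = 3.14159265358979323846 / 180.0;
        for (size_t j = 0; j < 4; ++j) {
            /* Convert each lane to radians, then take the ordinary cosine. */
            dst[j] = cos(a_deg[j] * deg_to_rad);
        }
    }

With this naive conversion, an input of 90 degrees gives a value very close to zero rather than exactly zero, because pi/2 is not exactly representable as a ``double``.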
        	

_mm256_cosd_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_cosd_ps(__m256 a);

.. admonition:: Intel Description

    Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := COSD(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cosh_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_cosh_pd(__m256d a);

.. admonition:: Intel Description

    Compute the hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := COSH(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cosh_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_cosh_ps(__m256 a);

.. admonition:: Intel Description

    Compute the hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := COSH(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_hypot_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_hypot_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := SQRT(POW(a[i+63:i], 2.0) + POW(b[i+63:i], 2.0))
        ENDFOR
        dst[MAX:256] := 0
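
The formula in the loop above can be transcribed literally into scalar C, as sketched below. The function name ``hypot_pd_model`` is hypothetical, and the code mirrors only the pseudo-code, not whatever the library does internally.

.. code-block:: C

    #include <math.h>
    #include <stddef.h>

    /* Hypothetical scalar model (not an Intel API): sqrt(a^2 + b^2) per 64-bit lane. */
    static void hypot_pd_model(const double a[4], const double b[4], double dst[4])
    {
        for (size_t j = 0; j < 4; ++j) {
            /* Literal transcription of SQRT(POW(a, 2.0) + POW(b, 2.0)). */
            dst[j] = sqrt(pow(a[j], 2.0) + pow(b[j], 2.0));
        }
    }

A literal transcription like this overflows when a square exceeds the range of ``double``, even if the true result is representable. The C library ``hypot`` avoids that intermediate overflow; this entry does not say whether the intrinsic does.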
        	

_mm256_hypot_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_hypot_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := SQRT(POW(a[i+31:i], 2.0) + POW(b[i+31:i], 2.0))
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sin_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_sin_pd(__m256d a);

.. admonition:: Intel Description

    Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := SIN(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sin_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_sin_ps(__m256 a);

.. admonition:: Intel Description

    Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := SIN(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sincos_pd
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d * mem_addr, 
    __m256d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    __m256d _mm256_sincos_pd(__m256d * mem_addr, __m256d a);

.. admonition:: Intel Description

    Compute the sine and cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := SIN(a[i+63:i])
        	MEM[mem_addr+i+63:mem_addr+i] := COS(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
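
This intrinsic produces two results: the sine is the return value and the cosine is written through ``mem_addr``. The scalar sketch below models both outputs with plain arrays and keeps the cosine destination as the first parameter to mirror the signature. The name ``sincos_pd_model`` and its parameter names are hypothetical.

.. code-block:: C

    #include <math.h>
    #include <stddef.h>

    /* Hypothetical scalar model (not an Intel API).
     * cos_out stands in for the memory at mem_addr, sin_out for the returned dst. */
    static void sincos_pd_model(double cos_out[4], const double a[4], double sin_out[4])
    {
        for (size_t j = 0; j < 4; ++j) {
            sin_out[j] = sin(a[j]);  /* goes to dst in the pseudo-code */
            cos_out[j] = cos(a[j]);  /* goes to MEM[mem_addr...] in the pseudo-code */
        }
    }

Computing both values in one pass is useful when a caller needs the sine and cosine of the same angles, for example when building rotation matrices.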
        	

_mm256_sincos_ps
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 * mem_addr, 
    __m256 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    __m256 _mm256_sincos_ps(__m256 * mem_addr, __m256 a);

.. admonition:: Intel Description

    Compute the sine and cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := SIN(a[i+31:i])
        	MEM[mem_addr+i+31:mem_addr+i] := COS(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sind_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_sind_pd(__m256d a);

.. admonition:: Intel Description

    Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := SIND(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sind_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_sind_ps(__m256 a);

.. admonition:: Intel Description

    Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := SIND(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sinh_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_sinh_pd(__m256d a);

.. admonition:: Intel Description

    Compute the hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := SINH(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sinh_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_sinh_ps(__m256 a);

.. admonition:: Intel Description

    Compute the hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := SINH(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_tan_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_tan_pd(__m256d a);

.. admonition:: Intel Description

    Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := TAN(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_tan_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_tan_ps(__m256 a);

.. admonition:: Intel Description

    Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := TAN(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_tand_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_tand_pd(__m256d a);

.. admonition:: Intel Description

    Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := TAND(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_tand_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_tand_ps(__m256 a);

.. admonition:: Intel Description

    Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := TAND(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_tanh_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_tanh_pd(__m256d a);

.. admonition:: Intel Description

    Compute the hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := TANH(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_tanh_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_tanh_ps(__m256 a);

.. admonition:: Intel Description

    Compute the hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := TANH(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_acos_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_acos_ph(__m256h a);

.. admonition:: Intel Description

    Compute the inverse cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ACOS(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_acosh_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_acosh_ph(__m256h a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ACOSH(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_asin_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_asin_ph(__m256h a);

.. admonition:: Intel Description

    Compute the inverse sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ASIN(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_asinh_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_asinh_ph(__m256h a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ASINH(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_atan2_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_atan2_ph(__m256h a, __m256h b);

.. admonition:: Intel Description

    Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ATAN2(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_atan_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_atan_ph(__m256h a);

.. admonition:: Intel Description

    Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ATAN(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_atanh_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_atanh_ph(__m256h a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ATANH(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_cos_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_cos_ph(__m256h a);

.. admonition:: Intel Description

    Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := COS(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_cosd_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_cosd_ph(__m256h a);

.. admonition:: Intel Description

    Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := COSD(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_cosh_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_cosh_ph(__m256h a);

.. admonition:: Intel Description

    Compute the hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := COSH(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_sin_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_sin_ph(__m256h a);

.. admonition:: Intel Description

    Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := SIN(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_sincos_ph
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h* mem_addr, 
    __m256h a
:Param ETypes:
    FP16 mem_addr, 
    FP16 a

.. code-block:: C

    __m256h _mm256_sincos_ph(__m256h* mem_addr, __m256h a);

.. admonition:: Intel Description

    Compute the sine and cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := SIN(a[i+15:i])
        	MEM[mem_addr+i+15:mem_addr+i] := COS(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        cos_res[MAX:256] := 0
        

_mm256_sind_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_sind_ph(__m256h a);

.. admonition:: Intel Description

    Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := SIND(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
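
The ``SIND`` operation is not defined in this entry. A reasonable reading, assumed here, is the sine of the argument after converting degrees to radians. The sketch below models a single lane under that assumption; ``ref_sind`` and ``DEG_TO_RAD`` are hypothetical names, and ``float`` stands in for a half-precision element.

.. code-block:: C

    #include <math.h>

    #define DEG_TO_RAD (3.14159265358979323846 / 180.0)

    /* Scalar model of one SIND lane: the argument is given in degrees. */
    float ref_sind(float degrees)
    {
        /* Convert and evaluate in double to limit rounding error,
           then narrow the result to the lane type. */
        return (float)sin((double)degrees * DEG_TO_RAD);
    }

Applying this function to each of the sixteen lanes gives a reference result that a vectorized call could be compared against, within a tolerance suited to the element precision.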
        

_mm256_sinh_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_sinh_ph(__m256h a);

.. admonition:: Intel Description

    Compute the hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := SINH(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_tan_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_tan_ph(__m256h a);

.. admonition:: Intel Description

    Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := TAN(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_tand_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_tand_ph(__m256h a);

.. admonition:: Intel Description

    Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := TAND(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_tanh_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_tanh_ph(__m256h a);

.. admonition:: Intel Description

    Compute the hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := TANH(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

XMM
~~~
_mm_acos_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_acos_ph(__m128h a);

.. admonition:: Intel Description

    Compute the inverse cosine of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ACOS(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
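
The pseudo-code addresses the register by bit range: element ``j`` occupies bits ``[i+15:i]`` with ``i := j*16``. The sketch below shows that mapping on a byte image of a 128-bit register (little-endian, as x86 stores it) and decodes the 16 raw bits as an IEEE 754 binary16 value. Both function names are hypothetical helpers for illustration and are not part of any intrinsic header.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Raw bits of element j, i.e. bits [i+15:i] with i := j*16, read from a
       little-endian byte image of a 128-bit register. */
    uint16_t lane_bits_ph(const uint8_t reg[16], int j)
    {
        int i = j * 16;
        return (uint16_t)(reg[i / 8] | ((unsigned)reg[i / 8 + 1] << 8));
    }

    /* Decode IEEE 754 binary16 bits into a float (exact, no rounding needed). */
    float half_bits_to_float(uint16_t h)
    {
        uint32_t sign = (uint32_t)(h >> 15) << 31;
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t man  = h & 0x3FFu;
        uint32_t bits;
        float f;

        if (exp == 0x1Fu) {              /* infinity or NaN */
            bits = sign | 0x7F800000u | (man << 13);
        } else if (exp != 0) {           /* normal: rebias exponent 15 -> 127 */
            bits = sign | ((exp + 112u) << 23) | (man << 13);
        } else if (man == 0) {           /* signed zero */
            bits = sign;
        } else {                         /* subnormal: normalise the mantissa */
            uint32_t e = 113u;
            while ((man & 0x400u) == 0) { man <<= 1; e--; }
            bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
        }
        memcpy(&f, &bits, sizeof f);     /* reinterpret the bits as a float */
        return f;
    }

Reading lanes this way is useful when inspecting a register dump on a toolchain that has no native half-precision type.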
        

_mm_acosh_ph
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_acosh_ph(__m128h a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ACOSH(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_asin_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_asin_ph(__m128h a);

.. admonition:: Intel Description

    Compute the inverse sine of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ASIN(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_asinh_ph
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_asinh_ph(__m128h a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ASINH(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_atan2_ph
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_atan2_ph(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ATAN2(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
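
The description speaks of "a" divided by "b", but a two-argument arctangent normally differs from ``atan(a/b)`` because it uses the signs of both inputs to pick the quadrant. The sketch below contrasts the two for a single lane. It assumes that ``ATAN2`` follows the C library convention, with "a" in the role of ``y`` and "b" in the role of ``x``; the function names are hypothetical and ``float`` stands in for a half-precision element.

.. code-block:: C

    #include <math.h>

    /* Scalar model of one ATAN2 lane, assuming C atan2 semantics:
       'a' plays the role of y and 'b' the role of x. */
    float ref_atan2(float a, float b)
    {
        return atan2f(a, b);
    }

    /* Naive alternative: the quotient a/b discards the individual signs,
       so the result is confined to (-pi/2, pi/2). */
    float naive_atan_div(float a, float b)
    {
        return atanf(a / b);
    }

For a point in the second quadrant such as ``a = 1``, ``b = -1``, the two functions return different angles, while for points with positive ``b`` they agree.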
        

_mm_atan_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_atan_ph(__m128h a);

.. admonition:: Intel Description

    Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ATAN(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_atanh_ph
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_atanh_ph(__m128h a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ATANH(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_cos_ph
^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_cos_ph(__m128h a);

.. admonition:: Intel Description

    Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := COS(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_cosd_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_cosd_ph(__m128h a);

.. admonition:: Intel Description

    Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := COSD(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_cosh_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_cosh_ph(__m128h a);

.. admonition:: Intel Description

    Compute the hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := COSH(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_sin_ph
^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_sin_ph(__m128h a);

.. admonition:: Intel Description

    Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := SIN(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_sincos_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h* mem_addr, 
    __m128h a
:Param ETypes:
    FP16 mem_addr, 
    FP16 a

.. code-block:: C

    __m128h _mm_sincos_ph(__m128h* mem_addr, __m128h a);

.. admonition:: Intel Description

    Compute the sine and cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := SIN(a[i+15:i])
        	MEM[mem_addr+i+15:mem_addr+i] := COS(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        cos_res[MAX:128] := 0
        

_mm_sind_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_sind_ph(__m128h a);

.. admonition:: Intel Description

    Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := SIND(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_sinh_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_sinh_ph(__m128h a);

.. admonition:: Intel Description

    Compute the hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := SINH(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_tan_ph
^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_tan_ph(__m128h a);

.. admonition:: Intel Description

    Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := TAN(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_tand_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_tand_ph(__m128h a);

.. admonition:: Intel Description

    Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := TAND(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_tanh_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_tanh_ph(__m128h a);

.. admonition:: Intel Description

    Compute the hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := TANH(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_acos_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_acos_pd(__m128d a);

.. admonition:: Intel Description

    Compute the inverse cosine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ACOS(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_acos_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_acos_ps(__m128 a);

.. admonition:: Intel Description

    Compute the inverse cosine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ACOS(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_acosh_pd
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_acosh_pd(__m128d a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ACOSH(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_acosh_ps
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_acosh_ps(__m128 a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ACOSH(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_asin_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_asin_pd(__m128d a);

.. admonition:: Intel Description

    Compute the inverse sine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ASIN(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_asin_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_asin_ps(__m128 a);

.. admonition:: Intel Description

    Compute the inverse sine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ASIN(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_asinh_pd
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_asinh_pd(__m128d a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ASINH(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_asinh_ps
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_asinh_ps(__m128 a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ASINH(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_atan_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_atan_pd(__m128d a);

.. admonition:: Intel Description

    Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ATAN(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_atan_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_atan_ps(__m128 a);

.. admonition:: Intel Description

    Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ATAN(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_atan2_pd
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_atan2_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ATAN2(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_atan2_ps
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_atan2_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ATAN2(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_atanh_pd
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_atanh_pd(__m128d a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ATANH(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_atanh_ps
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_atanh_ps(__m128 a);

.. admonition:: Intel Description

    Compute the inverse hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ATANH(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cos_pd
^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_cos_pd(__m128d a);

.. admonition:: Intel Description

    Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := COS(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cos_ps
^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_cos_ps(__m128 a);

.. admonition:: Intel Description

    Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := COS(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cosd_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_cosd_pd(__m128d a);

.. admonition:: Intel Description

    Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := COSD(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
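
One practical benefit of a degree-based cosine is that multiples of 90 are exactly representable in floating point, whereas pi/2 is not, so a radian-based call cannot return an exact zero at a right angle. The sketch below shows the idea for one lane: it reduces the argument in degrees and handles the quadrant boundaries exactly. Whether SVML's ``COSD`` behaves this way is not stated in this entry; ``ref_cosd`` and ``DEG_TO_RAD`` are hypothetical names.

.. code-block:: C

    #include <math.h>

    #define DEG_TO_RAD (3.14159265358979323846 / 180.0)

    /* Scalar model of one COSD lane with exact results at multiples of 90. */
    double ref_cosd(double degrees)
    {
        /* Cosine is even and periodic in 360 degrees, so reduce first.
           fmod is exact, so quadrant boundaries compare equal reliably. */
        double r = fmod(fabs(degrees), 360.0);

        if (r == 90.0 || r == 270.0) return 0.0;
        if (r == 0.0)                return 1.0;
        if (r == 180.0)              return -1.0;
        return cos(r * DEG_TO_RAD);
    }

By contrast, ``cos(90.0 * DEG_TO_RAD)`` yields a tiny non-zero value, because the converted argument is only an approximation of pi/2.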
        	

_mm_cosd_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_cosd_ps(__m128 a);

.. admonition:: Intel Description

    Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := COSD(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cosh_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_cosh_pd(__m128d a);

.. admonition:: Intel Description

    Compute the hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := COSH(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cosh_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_cosh_ps(__m128 a);

.. admonition:: Intel Description

    Compute the hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := COSH(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_hypot_pd
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_hypot_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := SQRT(POW(a[i+63:i], 2.0) + POW(b[i+63:i], 2.0))
        ENDFOR
        dst[MAX:128] := 0
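
Taken literally, the formula ``SQRT(POW(a, 2.0) + POW(b, 2.0))`` can overflow in the intermediate squares even when the final result is representable. The sketch below compares a direct transcription with the C library ``hypot``, which is specified to avoid undue overflow or underflow. Whether SVML guards against this intermediate overflow is not stated in this entry; the two wrapper names are hypothetical, and each function models a single 64-bit lane.

.. code-block:: C

    #include <math.h>

    /* Direct transcription of the pseudo-code; a * a equals POW(a, 2.0). */
    double hypot_naive(double a, double b)
    {
        return sqrt(a * a + b * b);
    }

    /* C library hypot, which avoids undue intermediate overflow. */
    double hypot_safe(double a, double b)
    {
        return hypot(a, b);
    }

For moderate inputs such as 3 and 4 both functions agree. For inputs near ``1e200`` the squares exceed the double range, so the naive form degrades while the library form stays finite.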
        	

_mm_hypot_ps
^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_hypot_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := SQRT(POW(a[i+31:i], 2.0) + POW(b[i+31:i], 2.0))
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_sin_pd
^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_sin_pd(__m128d a);

.. admonition:: Intel Description

    Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := SIN(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_sin_ps
^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_sin_ps(__m128 a);

.. admonition:: Intel Description

    Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := SIN(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
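
SVML intrinsics such as ``_mm_sin_ps`` are library routines rather than single instructions, and not every compiler provides them. The sketch below is one possible way to keep code portable: it calls the intrinsic when an Intel compiler is detected and otherwise falls back to a per-lane ``sinf`` loop that follows the pseudo-code. The wrapper name ``sin_ps_portable`` is hypothetical, and detecting the toolchain through these two macros is an assumption that may need adjusting for other compilers that ship SVML.

.. code-block:: C

    #include <math.h>
    #include <immintrin.h>

    /* Returns the per-lane sine of 'a'. Uses the SVML intrinsic under an
       Intel compiler, otherwise a scalar fallback built on libm. */
    __m128 sin_ps_portable(__m128 a)
    {
    #if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
        return _mm_sin_ps(a);
    #else
        float lane[4];
        _mm_storeu_ps(lane, a);            /* spill the four 32-bit lanes */
        for (int j = 0; j < 4; ++j)
            lane[j] = sinf(lane[j]);       /* dst[i+31:i] := SIN(a[i+31:i]) */
        return _mm_loadu_ps(lane);         /* repack the lanes into a register */
    #endif
    }

The fallback is much slower than a vectorized routine and may differ from it in the last bits, so it is best treated as a correctness reference rather than a replacement.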
        	

_mm_sincos_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d * mem_addr, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    __m128d _mm_sincos_pd(__m128d * mem_addr, __m128d a);

.. admonition:: Intel Description

    Compute the sine and cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := SIN(a[i+63:i])
        	MEM[mem_addr+i+63:mem_addr+i] := COS(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
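
Because this intrinsic produces two vectors, the sine is the return value and the cosine is written through ``mem_addr``. The scalar sketch below mirrors that data flow using the C library's ``sin`` and ``cos``. It is an illustrative stand-in, not the SVML routine, and its accuracy may differ from SVML's. ``sincos_pd_model`` and its array parameters are hypothetical names; on some toolchains the math library must be linked explicitly (for example with ``-lm``).

.. code-block:: C

    #include <assert.h>
    #include <math.h>
    #include <stddef.h>

    /* Scalar model: dst plays the role of the return value, mem the role of
       the memory at mem_addr. Both hold two double-precision lanes. */
    void sincos_pd_model(double dst[2], double mem[2], const double a[2])
    {
        assert(dst != NULL && mem != NULL && a != NULL);

        for (int j = 0; j < 2; j++) {
            const double x = a[j];  /* input lane, in radians */
            dst[j] = sin(x);        /* dst[i+63:i] := SIN(a[i+63:i]) */
            mem[j] = cos(x);        /* MEM[mem_addr+i+63:mem_addr+i] := COS(...) */
        }
    }

With the intrinsic itself, the corresponding call shape follows from the prototype above: declare an ``__m128d`` for the cosine, pass its address as ``mem_addr``, and take the sine from the return value.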
        	

_mm_sincos_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 * mem_addr, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    __m128 _mm_sincos_ps(__m128 * mem_addr, __m128 a);

.. admonition:: Intel Description

    Compute the sine and cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := SIN(a[i+31:i])
        	MEM[mem_addr+i+31:mem_addr+i] := COS(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_sind_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_sind_pd(__m128d a);

.. admonition:: Intel Description

    Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := SIND(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_sind_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_sind_ps(__m128 a);

.. admonition:: Intel Description

    Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := SIND(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_sinh_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_sinh_pd(__m128d a);

.. admonition:: Intel Description

    Compute the hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := SINH(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_sinh_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_sinh_ps(__m128 a);

.. admonition:: Intel Description

    Compute the hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := SINH(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_tan_pd
^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_tan_pd(__m128d a);

.. admonition:: Intel Description

    Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := TAN(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_tan_ps
^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_tan_ps(__m128 a);

.. admonition:: Intel Description

    Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := TAN(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_tand_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_tand_pd(__m128d a);

.. admonition:: Intel Description

    Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := TAND(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_tand_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_tand_ps(__m128 a);

.. admonition:: Intel Description

    Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := TAND(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_tanh_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_tanh_pd(__m128d a);

.. admonition:: Intel Description

    Compute the hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := TANH(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_tanh_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Trigonometry
:Header: immintrin.h
:Searchable: SVML-Trigonometry-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_tanh_ps(__m128 a);

.. admonition:: Intel Description

    Compute the hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := TANH(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

Elementary Math Functions
-------------------------
ZMM
~~~
_mm512_cbrt_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_cbrt_pd(__m512d a);

.. admonition:: Intel Description

    Compute the cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := CubeRoot(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cbrt_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_cbrt_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := CubeRoot(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
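
The writemask behaviour in the pseudo-code is independent of the function being computed: bit ``j`` of ``k`` decides whether lane ``j`` receives a fresh result or keeps the value from ``src``. A minimal scalar sketch of that merge follows. ``mask_map_pd``, ``lane_fn`` and ``twice_lane`` are hypothetical names; ``twice_lane`` is only a simple stand-in for the per-lane operation (it is not a cube root), and ``__mmask8`` is modelled as ``uint8_t``.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Per-lane operation; stands in for CubeRoot in the pseudo-code. */
    typedef double (*lane_fn)(double);

    /* Trivial placeholder operation used only to make the merge visible. */
    double twice_lane(double x) { return 2.0 * x; }

    /* Merge-masking over eight 64-bit lanes: compute where k[j] is set,
       otherwise copy the lane from src. */
    void mask_map_pd(double dst[8], const double src[8], uint8_t k,
                     const double a[8], lane_fn op)
    {
        assert(dst != NULL && src != NULL && a != NULL && op != NULL);

        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u)
                dst[j] = op(a[j]);   /* IF k[j]: dst lane := op(a lane) */
            else
                dst[j] = src[j];     /* ELSE: dst lane := src lane */
        }
    }

With ``k = 0x05`` only lanes 0 and 2 are computed; the remaining six lanes are copied from ``src`` unchanged.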
        	

_mm512_cbrt_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_cbrt_ps(__m512 a);

.. admonition:: Intel Description

    Compute the cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := CubeRoot(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cbrt_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_cbrt_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := CubeRoot(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
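
For the single-precision forms the mask has 16 bits, one per 32-bit lane. One plausible use is handling the last, partially filled vector of an array by selecting only the first ``n`` lanes. The sketch below builds such a mask and tests individual bits the way ``k[j]`` is tested in the pseudo-code. ``tail_mask16`` and ``lane_selected16`` are hypothetical helpers, and the mask is modelled as ``uint16_t`` on the assumption that ``__mmask16`` is a 16-bit integer type.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Mask with the low n bits set, selecting lanes 0..n-1 (n must be <= 16). */
    uint16_t tail_mask16(unsigned n)
    {
        assert(n <= 16);
        return (uint16_t)((1u << n) - 1u);
    }

    /* Returns 1 when lane j is selected, i.e. when k[j] is set; 0 otherwise. */
    int lane_selected16(uint16_t k, unsigned j)
    {
        assert(j < 16);
        return (int)((k >> j) & 1u);
    }

A mask built this way would be passed as ``k``, with ``src`` supplying the values that the unselected lanes keep.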
        	

_mm512_exp10_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_exp10_pd(__m512d a);

.. admonition:: Intel Description

    Compute the exponential value of 10 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := POW(10.0, a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_exp10_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_exp10_pd(__m512d src, __mmask8 k,
                                 __m512d a);

.. admonition:: Intel Description

    Compute the exponential value of 10 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := POW(10.0, a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_exp10_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_exp10_ps(__m512 a);

.. admonition:: Intel Description

    Compute the exponential value of 10 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := POW(FP32(10.0), a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_exp10_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_exp10_ps(__m512 src, __mmask16 k,
                                __m512 a);

.. admonition:: Intel Description

    Compute the exponential value of 10 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := POW(FP32(10.0), a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_exp2_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_exp2_pd(__m512d a);

.. admonition:: Intel Description

    Compute the exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := POW(2.0, a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_exp2_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_exp2_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := POW(2.0, a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_exp2_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_exp2_ps(__m512 a);

.. admonition:: Intel Description

    Compute the exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := POW(FP32(2.0), a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_exp2_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_exp2_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := POW(FP32(2.0), a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_exp_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_exp_pd(__m512d a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := POW(e, a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_exp_pd
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_exp_pd(__m512d src, __mmask8 k,
                               __m512d a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := POW(e, a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_exp_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_exp_ps(__m512 a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := POW(FP32(e), a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_exp_ps
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_exp_ps(__m512 src, __mmask16 k,
                              __m512 a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := POW(FP32(e), a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_expm1_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_expm1_pd(__m512d a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := POW(e, a[i+63:i]) - 1.0
        ENDFOR
        dst[MAX:512] := 0
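
The pseudo-code states the mathematical result, ``POW(e, a) - 1.0``. Read literally in double-precision arithmetic, that expression cancels to zero for very small inputs, which is the usual reason a dedicated routine such as C's ``expm1`` exists. The sketch below illustrates the cancellation only; it makes no claim about how SVML computes the result. ``exp_series``, ``expm1_naive`` and ``expm1_series`` are hypothetical helpers built on a truncated Taylor series that is adequate only for small inputs.

.. code-block:: C

    #include <assert.h>

    enum { SERIES_TERMS = 20 };

    /* Stand-in for POW(e, x): truncated Taylor series, small |x| only. */
    double exp_series(double x)
    {
        assert(x > -1.0 && x < 1.0);
        double sum = 1.0, term = 1.0;
        for (int n = 1; n <= SERIES_TERMS; n++) {
            term *= x / n;   /* term is now x^n / n! */
            sum += term;
        }
        return sum;
    }

    /* Literal reading of the pseudo-code: exponentiate, then subtract one. */
    double expm1_naive(double x)
    {
        return exp_series(x) - 1.0;
    }

    /* Same series without the leading 1, so tiny results are not absorbed. */
    double expm1_series(double x)
    {
        assert(x > -1.0 && x < 1.0);
        double sum = 0.0, term = 1.0;
        for (int n = 1; n <= SERIES_TERMS; n++) {
            term *= x / n;
            sum += term;
        }
        return sum;
    }

For an input such as ``1e-20``, the naive form rounds the exponential to exactly ``1.0`` before the subtraction and so loses the result, whereas the second form keeps it.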
        	

_mm512_mask_expm1_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_expm1_pd(__m512d src, __mmask8 k,
                                 __m512d a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := POW(e, a[i+63:i]) - 1.0
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_expm1_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_expm1_ps(__m512 a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := POW(FP32(e), a[i+31:i]) - 1.0
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_expm1_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_expm1_ps(__m512 src, __mmask16 k,
                                __m512 a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := POW(FP32(e), a[i+31:i]) - 1.0
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_hypot_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_hypot_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := SQRT(POW(a[i+63:i], 2.0) + POW(b[i+63:i], 2.0))
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_hypot_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_hypot_pd(__m512d src, __mmask8 k,
                                 __m512d a, __m512d b);

.. admonition:: Intel Description

    Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SQRT(POW(a[i+63:i], 2.0) + POW(b[i+63:i], 2.0))
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_hypot_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_hypot_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := SQRT(POW(a[i+31:i], 2.0) + POW(b[i+31:i], 2.0))
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_hypot_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_hypot_ps(__m512 src, __mmask16 k,
                                __m512 a, __m512 b);

.. admonition:: Intel Description

    Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SQRT(POW(a[i+31:i], 2.0) + POW(b[i+31:i], 2.0))
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_invsqrt_pd
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_invsqrt_pd(__m512d a);

.. admonition:: Intel Description

    Compute the inverse square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := InvSQRT(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_invsqrt_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_invsqrt_pd(__m512d src, __mmask8 k,
                                   __m512d a);

.. admonition:: Intel Description

    Compute the inverse square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := InvSQRT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_invsqrt_ps
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_invsqrt_ps(__m512 a);

.. admonition:: Intel Description

    Compute the inverse square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := InvSQRT(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_invsqrt_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_invsqrt_ps(__m512 src, __mmask16 k,
                                  __m512 a);

.. admonition:: Intel Description

    Compute the inverse square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := InvSQRT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_log10_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_log10_pd(__m512d a);

.. admonition:: Intel Description

    Compute the base-10 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := LOG(a[i+63:i]) / LOG(10.0)
        ENDFOR
        dst[MAX:512] := 0
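
The pseudo-code expresses the base-10 logarithm through the change-of-base identity, ``log10(x) = ln(x) / ln(10)``. A minimal scalar sketch of that formula follows, assuming ``LOG`` denotes the natural logarithm; the helper name ``log10_pd_ref`` and the array-based signature are hypothetical and are not part of any Intel header.

.. code-block:: C

    #include <assert.h>
    #include <math.h>
    #include <stddef.h>

    /* Hypothetical scalar model of _mm512_log10_pd following the pseudo-code:
       natural log of each of the 8 double lanes, divided by ln(10). */
    void log10_pd_ref(double dst[8], const double a[8])
    {
        assert(dst != NULL && a != NULL);
        const double ln10 = log(10.0);
        for (size_t j = 0; j < 8; ++j) {
            dst[j] = log(a[j]) / ln10;
        }
    }

The C library's ``log10`` computes the same quantity directly; the division form is used here only because it mirrors the pseudo-code.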
        	

_mm512_mask_log10_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_log10_pd(__m512d src, __mmask8 k,
                                 __m512d a);

.. admonition:: Intel Description

    Compute the base-10 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LOG(a[i+63:i]) / LOG(10.0)
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_log10_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_log10_ps(__m512 a);

.. admonition:: Intel Description

    Compute the base-10 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := LOG(a[i+31:i]) / LOG(10.0)
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_log10_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_log10_ps(__m512 src, __mmask16 k,
                                __m512 a);

.. admonition:: Intel Description

    Compute the base-10 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LOG(a[i+31:i]) / LOG(10.0)
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_log1p_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_log1p_pd(__m512d a);

.. admonition:: Intel Description

    Compute the natural logarithm of one plus packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := LOG(1.0 + a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
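
The pseudo-code writes the operation as ``LOG(1.0 + a)``. Evaluated literally in floating point, the addition rounds away very small values of ``a`` before the logarithm is taken, which is why the C library offers a dedicated ``log1p``. The sketch below contrasts the two in scalar ``double``. The helper names ``log1p_naive`` and ``log1p_pd_ref`` are hypothetical, and the comparison describes the C library only; this entry does not state how the intrinsic itself treats small inputs.

.. code-block:: C

    #include <assert.h>
    #include <math.h>
    #include <stddef.h>

    /* Literal reading of the pseudo-code: add 1.0 first, then take the log. */
    double log1p_naive(double x)
    {
        return log(1.0 + x);
    }

    /* Hypothetical scalar model of _mm512_log1p_pd built on C's log1p,
       which keeps precision when |x| is much smaller than 1. */
    void log1p_pd_ref(double dst[8], const double a[8])
    {
        assert(dst != NULL && a != NULL);
        for (size_t j = 0; j < 8; ++j) {
            dst[j] = log1p(a[j]);
        }
    }

For an input such as ``1e-20``, ``1.0 + x`` rounds to exactly ``1.0``, so the literal form returns zero, while ``log1p`` returns a value close to ``x``.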
        	

_mm512_mask_log1p_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_log1p_pd(__m512d src, __mmask8 k,
                                 __m512d a);

.. admonition:: Intel Description

    Compute the natural logarithm of one plus packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LOG(1.0 + a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_log1p_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_log1p_ps(__m512 a);

.. admonition:: Intel Description

    Compute the natural logarithm of one plus packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := LOG(1.0 + a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_log1p_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_log1p_ps(__m512 src, __mmask16 k,
                                __m512 a);

.. admonition:: Intel Description

    Compute the natural logarithm of one plus packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LOG(1.0 + a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_log2_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_log2_pd(__m512d a);

.. admonition:: Intel Description

    Compute the base-2 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := LOG(a[i+63:i]) / LOG(2.0)
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_log2_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_log2_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the base-2 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LOG(a[i+63:i]) / LOG(2.0)
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_log_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_log_pd(__m512d a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := LOG(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_log_pd
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_log_pd(__m512d src, __mmask8 k,
                               __m512d a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LOG(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_log_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_log_ps(__m512 a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := LOG(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_log_ps
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_log_ps(__m512 src, __mmask16 k,
                              __m512 a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LOG(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_logb_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_logb_pd(__m512d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
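
The description amounts to reading the exponent field of each element and returning it as a floating-point number. The sketch below shows one way to do that for a single ``double``, assuming the IEEE 754 binary64 layout (11 exponent bits, bias 1023). It covers normal, finite inputs only; zero, subnormal, infinite, and NaN inputs are deliberately left out, because this entry does not spell out how ``ConvertExpFP64`` treats them. The helper names ``exponent_of_normal`` and ``logb_pd_ref`` are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Unbiased exponent of a normal, finite double, returned as a double.
       Zero, subnormal, infinite, and NaN inputs are outside this sketch. */
    double exponent_of_normal(double x)
    {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);            /* reinterpret the bytes */
        int biased = (int)((bits >> 52) & 0x7FF);  /* 11-bit exponent field */
        assert(biased != 0 && biased != 0x7FF);    /* normal inputs only */
        return (double)(biased - 1023);            /* remove the binary64 bias */
    }

    /* Hypothetical scalar model of _mm512_logb_pd over 8 lanes. */
    void logb_pd_ref(double dst[8], const double a[8])
    {
        for (size_t j = 0; j < 8; ++j) {
            dst[j] = exponent_of_normal(a[j]);
        }
    }

Because the sign bit lies outside the exponent field, this sketch returns the same value for ``x`` and ``-x``.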
        	

_mm512_mask_logb_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_logb_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_logb_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_logb_ps(__m512 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_logb_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_logb_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_pow_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_pow_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Raise packed double-precision (64-bit) floating-point elements in "a" to the power of the corresponding packed elements in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := POW(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
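
In the pseudo-code, lane ``j`` of "a" is the base and lane ``j`` of "b" is the exponent. A scalar sketch of that pairing follows, using ``pow`` from ``<math.h>``; the helper name ``pow_pd_ref`` and its array-based signature are hypothetical and say nothing about the accuracy of the SVML routine.

.. code-block:: C

    #include <assert.h>
    #include <math.h>
    #include <stddef.h>

    /* Hypothetical scalar model of _mm512_pow_pd: lane j of "a" is the base,
       lane j of "b" is the exponent. */
    void pow_pd_ref(double dst[8], const double a[8], const double b[8])
    {
        assert(dst != NULL && a != NULL && b != NULL);
        for (size_t j = 0; j < 8; ++j) {
            dst[j] = pow(a[j], b[j]);
        }
    }

Swapping the two arguments changes the result, since ``pow(2, 10)`` and ``pow(10, 2)`` differ.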
        	

_mm512_mask_pow_pd
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_pow_pd(__m512d src, __mmask8 k,
                               __m512d a, __m512d b);

.. admonition:: Intel Description

    Raise packed double-precision (64-bit) floating-point elements in "a" to the power of the corresponding packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := POW(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_pow_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_pow_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Raise packed single-precision (32-bit) floating-point elements in "a" to the power of the corresponding packed elements in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := POW(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_pow_ps
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_pow_ps(__m512 src, __mmask16 k, __m512 a,
                              __m512 b);

.. admonition:: Intel Description

    Raise packed single-precision (32-bit) floating-point elements in "a" to the power of the corresponding packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := POW(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_recip_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_recip_pd(__m512d a);

.. admonition:: Intel Description

    Computes the reciprocal of packed double-precision (64-bit) floating-point elements in "a", storing the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := (1.0 / a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_recip_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_recip_pd(__m512d src, __mmask8 k,
                                 __m512d a);

.. admonition:: Intel Description

    Computes the reciprocal of packed double-precision (64-bit) floating-point elements in "a", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
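
The writemask behaviour in the pseudo-code can be modelled with an ordinary loop over the eight lanes: bit ``j`` of ``k`` decides whether lane ``j`` receives the computed reciprocal or the corresponding element of ``src``. The sketch below is a scalar illustration in plain C; the helper name ``mask_recip_pd_ref`` is hypothetical, and ``uint8_t`` stands in for ``__mmask8`` so that the example compiles without AVX-512 headers.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_recip_pd. Bit j of k selects,
       for lane j, between the computed reciprocal and the value from src. */
    void mask_recip_pd_ref(double dst[8], const double src[8], uint8_t k,
                           const double a[8])
    {
        assert(dst != NULL && src != NULL && a != NULL);
        for (size_t j = 0; j < 8; ++j) {
            if ((k >> j) & 1u) {
                dst[j] = 1.0 / a[j];   /* mask bit set: compute */
            } else {
                dst[j] = src[j];       /* mask bit clear: copy from src */
            }
        }
    }

With ``k = 0x05`` only lanes 0 and 2 are recomputed and the other six lanes are taken unchanged from ``src``. In this model no division is performed for lanes whose mask bit is clear.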
        	

_mm512_recip_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_recip_ps(__m512 a);

.. admonition:: Intel Description

    Computes the reciprocal of packed single-precision (32-bit) floating-point elements in "a", storing the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := (1.0 / a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_recip_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_recip_ps(__m512 src, __mmask16 k,
                                __m512 a);

.. admonition:: Intel Description

    Computes the reciprocal of packed single-precision (32-bit) floating-point elements in "a", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_log2_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_log2_ps(__m512 a);

.. admonition:: Intel Description

    Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0)
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_log2_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_log2_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0)
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cbrt_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_cbrt_ph(__m512h a);

.. admonition:: Intel Description

    Compute the cube root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := CubeRoot(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_exp10_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_exp10_ph(__m512h a);

.. admonition:: Intel Description

    Compute the exponential value of 10 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := POW(FP16(10.0), a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_exp2_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_exp2_ph(__m512h a);

.. admonition:: Intel Description

    Compute the exponential value of 2 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := POW(FP16(2.0), a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_exp_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_exp_ph(__m512h a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := POW(FP16(e), a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_expm1_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_expm1_ph(__m512h a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := POW(FP16(e), a[i+15:i]) - 1.0
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_hypot_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_hypot_ph(__m512h a, __m512h b);

.. admonition:: Intel Description

    Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := SQRT(POW(a[i+15:i], 2.0) + POW(b[i+15:i], 2.0))
        ENDFOR
        dst[MAX:512] := 0
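
The pseudo-code squares both inputs before adding them, and in a narrow format the squares can overflow even when the hypotenuse itself is representable. Half precision is not portably available in standard C, so the sketch below illustrates the effect in ``float`` instead, comparing a literal evaluation of the formula with the C library's ``hypotf``. The helper names ``hypot_naive`` and ``hypot_lib`` are hypothetical, and how the SVML routine handles intermediate range is not stated in this entry.

.. code-block:: C

    #include <assert.h>
    #include <math.h>

    /* Literal reading of the pseudo-code, evaluated in float. */
    float hypot_naive(float a, float b)
    {
        assert(a >= 0.0f && b >= 0.0f);  /* side lengths in this sketch */
        volatile float a2 = a * a;       /* volatile forces rounding to float */
        volatile float b2 = b * b;
        return sqrtf(a2 + b2);
    }

    /* Library routine for comparison; the C standard specifies hypot
       as computing the result without undue overflow or underflow. */
    float hypot_lib(float a, float b)
    {
        return hypotf(a, b);
    }

For sides of ``3e20f`` and ``4e20f`` the squares exceed the ``float`` range, so the literal form returns infinity, while ``hypotf`` returns a value near ``5e20f``.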
        

_mm512_invsqrt_ph
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_invsqrt_ph(__m512h a);

.. admonition:: Intel Description

    Compute the inverse square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := InvSQRT(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_log10_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_log10_ph(__m512h a);

.. admonition:: Intel Description

    Compute the base-10 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := LOG(a[i+15:i]) / LOG(10.0)
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_log1p_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_log1p_ph(__m512h a);

.. admonition:: Intel Description

    Compute the natural logarithm of one plus packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := LOG(1.0 + a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        
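A dedicated ``log1p`` exists because forming ``1.0 + x`` first rounds away a very small ``x``. The scalar sketch below contrasts the literal pseudo-code with the C library's ``log1pf``, using ``float`` as a stand-in for FP16 (an assumption for portability); ``naive_log1p`` and ``accurate_log1p`` are hypothetical names.

.. code-block:: C

    #include <math.h>

    /* Literal LOG(1.0 + x): the addition rounds a tiny x away. */
    static float naive_log1p(float x)
    {
        volatile float one_plus_x = 1.0f + x;  /* force rounding to float */
        return logf(one_plus_x);
    }

    /* log1pf is specified to stay accurate for x near zero. */
    static float accurate_log1p(float x)
    {
        return log1pf(x);
    }

For ``x = 1e-8f`` the naive form returns zero because ``1.0f + 1e-8f`` rounds to ``1.0f``, while ``log1pf`` returns a value close to ``x``.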

_mm512_log2_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_log2_ph(__m512h a);

.. admonition:: Intel Description

    Compute the base-2 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := LOG(a[i+15:i]) / LOG(2.0)
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_log_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_log_ph(__m512h a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := LOG(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_logb_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_logb_ph(__m512h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ConvertExpFP16(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        
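The ``floor(log2(x))`` in the description corresponds to extracting the unbiased binary exponent. The C library function ``logbf`` performs that extraction on the magnitude of a ``float``, so it can serve as a scalar stand-in. This is an assumption for illustration: the source does not state how the intrinsic treats zero, infinity, or NaN, and ``exponent_of`` is a hypothetical name.

.. code-block:: C

    #include <math.h>

    /* Scalar model: the unbiased exponent of x, returned as a float. */
    static float exponent_of(float x)
    {
        return logbf(x);
    }

For example, 10.0 is 1.25 * 2^3, so its exponent is 3; 0.75 is 1.5 * 2^-1, so its exponent is -1.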

_mm512_mask_cbrt_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_cbrt_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the cube root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := CubeRoot(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        
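The writemask behavior shared by the ``_mask_`` variants can be modeled in scalar C. The sketch below assumes ``float`` lanes in place of FP16 and a hypothetical helper ``mask_cbrt_lanes``; it shows only the merge rule from the pseudo-code, where bit ``j`` of ``k`` selects between the computed value and ``src``.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    #define LANES 32  /* 512 bits / 16 bits per element */

    /* Scalar model of merge masking: lane j receives cbrt(a[j]) when
       bit j of k is set, otherwise it keeps src[j]. */
    static void mask_cbrt_lanes(const float *src, uint32_t k,
                                const float *a, float *dst)
    {
        for (int j = 0; j < LANES; ++j) {
            if ((k >> j) & 1u) {
                dst[j] = cbrtf(a[j]);
            } else {
                dst[j] = src[j];
            }
        }
    }

With ``k = 0x80000005``, only lanes 0, 2, and 31 are computed; every other lane is copied from ``src``.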

_mm512_mask_exp10_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_exp10_ph(__m512h src, __mmask32 k,
                                 __m512h a);

.. admonition:: Intel Description

    Compute the exponential value of 10 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := POW(FP16(10.0), a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_exp2_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_exp2_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the exponential value of 2 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := POW(FP16(2.0), a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_exp_ph
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_exp_ph(__m512h src, __mmask32 k,
                               __m512h a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := POW(FP16(e), a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_expm1_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_expm1_ph(__m512h src, __mmask32 k,
                                 __m512h a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := POW(FP16(e), a[i+15:i]) - 1.0
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_invsqrt_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_invsqrt_ph(__m512h src, __mmask32 k,
                                   __m512h a);

.. admonition:: Intel Description

    Compute the inverse square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := InvSQRT(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_log10_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_log10_ph(__m512h src, __mmask32 k,
                                 __m512h a);

.. admonition:: Intel Description

    Compute the base-10 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := LOG(a[i+15:i]) / LOG(10.0)
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_log1p_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_log1p_ph(__m512h src, __mmask32 k,
                                 __m512h a);

.. admonition:: Intel Description

    Compute the natural logarithm of one plus packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := LOG(1.0 + a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_log2_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_log2_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the base-2 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := LOG(a[i+15:i]) / LOG(2.0)
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_log_ph
^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_log_ph(__m512h src, __mmask32 k,
                               __m512h a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := LOG(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_logb_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_logb_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ConvertExpFP16(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_mask_recip_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_recip_ph(__m512h src, __mmask32 k,
                                 __m512h a);

.. admonition:: Intel Description

    Compute the reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := (1.0 / a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_pow_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_pow_ph(__m512h a, __m512h b);

.. admonition:: Intel Description

    Raise packed half-precision (16-bit) floating-point elements in "a" to the power of the corresponding packed elements in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := POW(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

_mm512_recip_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_recip_ph(__m512h a);

.. admonition:: Intel Description

    Compute the reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := (1.0 / a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        

YMM
~~~
_mm256_cbrt_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_cbrt_pd(__m256d a);

.. admonition:: Intel Description

    Compute the cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := CubeRoot(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cbrt_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_cbrt_ps(__m256 a);

.. admonition:: Intel Description

    Compute the cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := CubeRoot(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cexp_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_cexp_ps(__m256 a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE CEXP(a[31:0], b[31:0]) {
        	result[31:0]  := POW(FP32(e), a[31:0]) * COS(b[31:0])
        	result[63:32] := POW(FP32(e), a[31:0]) * SIN(b[31:0])
        	RETURN result
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := CEXP(a[i+31:i], a[i+63:i+32])
        ENDFOR
        dst[MAX:256] := 0
        	
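The ``CEXP`` helper applies Euler's formula, exp(x + iy) = exp(x) * (cos(y) + i sin(y)), to each interleaved (real, imaginary) pair. A minimal scalar sketch under that reading follows; ``cexp_pairs`` is a hypothetical name, and the code models the arithmetic only, not the intrinsic's accuracy.

.. code-block:: C

    #include <math.h>
    #include <stddef.h>

    /* Scalar model of CEXP over interleaved (re, im) pairs.
       z[2*j] is the real part and z[2*j + 1] the imaginary part. */
    static void cexp_pairs(const float *z, float *out, size_t pairs)
    {
        for (size_t j = 0; j < pairs; ++j) {
            float re = z[2 * j];
            float im = z[2 * j + 1];
            float mag = expf(re);  /* e raised to the real part */
            out[2 * j]     = mag * cosf(im);
            out[2 * j + 1] = mag * sinf(im);
        }
    }

A 256-bit vector holds four such pairs, so ``pairs`` would be 4 when modeling one call.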

_mm256_clog_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_clog_ps(__m256 a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE CLOG(a[31:0], b[31:0]) {
        	result[31:0]  := LOG(SQRT(POW(a, 2.0) + POW(b, 2.0)))
        	result[63:32] := ATAN2(b, a)
        	RETURN result
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := CLOG(a[i+31:i], a[i+63:i+32])
        ENDFOR
        dst[MAX:256] := 0
        	
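``CLOG`` returns the logarithm of the modulus as the real part and the argument as the imaginary part. The scalar sketch below follows that structure for one pair. It substitutes ``hypotf`` for the pseudo-code's ``SQRT(POW(a, 2.0) + POW(b, 2.0))``, which is a choice made here and not taken from the source; ``clog_pair`` is a hypothetical name.

.. code-block:: C

    #include <math.h>

    /* Scalar model of CLOG for one complex number re + i*im. */
    static void clog_pair(float re, float im, float *out_re, float *out_im)
    {
        *out_re = logf(hypotf(re, im));  /* ln|z| */
        *out_im = atan2f(im, re);        /* principal argument of z */
    }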

_mm256_csqrt_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_csqrt_ps(__m256 a);

.. admonition:: Intel Description

    Compute the square root of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE CSQRT(a[31:0], b[31:0]) {
        	sign[31:0] := (b < 0.0) ? -FP32(1.0) : FP32(1.0)
        	result[31:0]  := SQRT((a + SQRT(POW(a, 2.0) + POW(b, 2.0))) / 2.0)
        	result[63:32] := sign * SQRT((-a + SQRT(POW(a, 2.0) + POW(b, 2.0))) / 2.0)
        	RETURN result
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := CSQRT(a[i+31:i], a[i+63:i+32])
        ENDFOR
        dst[MAX:256] := 0
        	
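The ``CSQRT`` helper can be transcribed directly into scalar C to check it against known values. The sketch below does so for one pair; ``csqrt_pair`` is a hypothetical name, and the transcription inherits the pseudo-code's formula as written without any extra handling of overflow or cancellation.

.. code-block:: C

    #include <math.h>

    /* Scalar transcription of CSQRT for one complex number a + i*b. */
    static void csqrt_pair(float a, float b, float *out_re, float *out_im)
    {
        float sign = (b < 0.0f) ? -1.0f : 1.0f;
        float mod = sqrtf(a * a + b * b);  /* |a + i*b| */
        *out_re = sqrtf((a + mod) / 2.0f);
        *out_im = sign * sqrtf((-a + mod) / 2.0f);
    }

For ``3 + 4i`` this yields ``2 + 1i``, and squaring ``2 + 1i`` gives back ``3 + 4i``.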

_mm256_exp_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_exp_pd(__m256d a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := POW(e, a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_exp_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_exp_ps(__m256 a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := POW(FP32(e), a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_exp10_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_exp10_pd(__m256d a);

.. admonition:: Intel Description

    Compute the exponential value of 10 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := POW(10.0, a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_exp10_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_exp10_ps(__m256 a);

.. admonition:: Intel Description

    Compute the exponential value of 10 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := POW(FP32(10.0), a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_exp2_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_exp2_pd(__m256d a);

.. admonition:: Intel Description

    Compute the exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := POW(2.0, a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_exp2_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_exp2_ps(__m256 a);

.. admonition:: Intel Description

    Compute the exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := POW(FP32(2.0), a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	
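The loop bound in the pseudo-code follows from the register width: a 256-bit vector holds 256 / 32 = 8 single-precision lanes. A scalar sketch of the same loop is shown below, using the C library's ``exp2f`` per lane; ``exp2_lanes`` and ``YMM_FLOAT_LANES`` are hypothetical names introduced for illustration.

.. code-block:: C

    #include <math.h>

    #define YMM_FLOAT_LANES 8  /* 256 bits / 32 bits per element */

    /* Scalar model: dst[j] = 2 raised to a[j] for each of the 8 lanes. */
    static void exp2_lanes(const float *a, float *dst)
    {
        for (int j = 0; j < YMM_FLOAT_LANES; ++j) {
            dst[j] = exp2f(a[j]);
        }
    }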

_mm256_expm1_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_expm1_pd(__m256d a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := POW(e, a[i+63:i]) - 1.0
        ENDFOR
        dst[MAX:256] := 0
        	
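Evaluating ``POW(e, x) - 1.0`` literally loses all significant digits when ``x`` is very small, because the exponential rounds to exactly 1.0 before the subtraction. The scalar sketch below contrasts that with the C library's ``expm1``; ``naive_expm1`` and ``accurate_expm1`` are hypothetical names, and the source does not describe how the intrinsic itself is implemented.

.. code-block:: C

    #include <math.h>

    /* Literal POW(e, x) - 1.0: exp(x) rounds to 1.0 for tiny x. */
    static double naive_expm1(double x)
    {
        volatile double e_to_x = exp(x);  /* force rounding to double */
        return e_to_x - 1.0;
    }

    /* expm1 is specified to stay accurate for x near zero. */
    static double accurate_expm1(double x)
    {
        return expm1(x);
    }

For ``x = 1e-20`` the naive form returns zero, while ``expm1`` returns a value close to ``x``.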

_mm256_expm1_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_expm1_ps(__m256 a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := POW(FP32(e), a[i+31:i]) - 1.0
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_invcbrt_pd
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_invcbrt_pd(__m256d a);

.. admonition:: Intel Description

    Compute the inverse cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := InvCubeRoot(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
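The pseudo-code leaves ``InvCubeRoot`` undefined. Reading it as the reciprocal of the cube root, which matches the description, gives the scalar sketch below for the four double-precision lanes of a 256-bit vector. ``invcbrt_lanes`` is a hypothetical name, and the sketch does not model the intrinsic's accuracy or its handling of zero.

.. code-block:: C

    #include <math.h>

    #define YMM_DOUBLE_LANES 4  /* 256 bits / 64 bits per element */

    /* Scalar model: dst[j] = 1 / cbrt(a[j]) for each of the 4 lanes. */
    static void invcbrt_lanes(const double *a, double *dst)
    {
        for (int j = 0; j < YMM_DOUBLE_LANES; ++j) {
            dst[j] = 1.0 / cbrt(a[j]);
        }
    }

Unlike the square root, the real cube root is defined for negative inputs, so ``-8.0`` maps to ``-0.5``.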

_mm256_invcbrt_ps
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_invcbrt_ps(__m256 a);

.. admonition:: Intel Description

    Compute the inverse cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := InvCubeRoot(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_invsqrt_pd
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_invsqrt_pd(__m256d a);

.. admonition:: Intel Description

    Compute the inverse square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := InvSQRT(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_invsqrt_ps
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_invsqrt_ps(__m256 a);

.. admonition:: Intel Description

    Compute the inverse square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := InvSQRT(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_log_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_log_pd(__m256d a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := LOG(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_log_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_log_ps(__m256 a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := LOG(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_log10_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_log10_pd(__m256d a);

.. admonition:: Intel Description

    Compute the base-10 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := LOG(a[i+63:i]) / LOG(10.0)
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_log10_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_log10_ps(__m256 a);

.. admonition:: Intel Description

    Compute the base-10 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := LOG(a[i+31:i]) / LOG(10.0)
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_log1p_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_log1p_pd(__m256d a);

.. admonition:: Intel Description

    Compute the natural logarithm of one plus packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := LOG(1.0 + a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
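The pseudo-code writes the operation as ``LOG(1.0 + a)``, but evaluating it literally rounds away very small inputs before the logarithm is taken. The sketch below contrasts the literal form with the C library's ``log1p``; it assumes the intrinsic targets the same accuracy as ``log1p``, which the description does not state. Both function names are hypothetical.

.. code-block:: C

    #include <math.h>

    /* Literal transcription of the pseudo-code; 1.0 + x rounds to 1.0 for tiny x. */
    double naive_log1p(double x)
    {
        return log(1.0 + x);
    }

    /* C library routine designed to stay accurate when x is close to zero. */
    double accurate_log1p(double x)
    {
        return log1p(x);
    }

For an input such as ``1e-20`` the literal form returns zero, while ``log1p`` returns a value close to the input itself.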

_mm256_log1p_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_log1p_ps(__m256 a);

.. admonition:: Intel Description

    Compute the natural logarithm of one plus packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := LOG(1.0 + a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_log2_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_log2_pd(__m256d a);

.. admonition:: Intel Description

    Compute the base-2 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := LOG(a[i+63:i]) / LOG(2.0)
        ENDFOR
        dst[MAX:256] := 0
        	
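The pseudo-code expresses the base-2 logarithm through the change-of-base identity, ``LOG(x) / LOG(2.0)``, and the base-10 entries use the same pattern. A minimal scalar sketch of that identity, using the C library's ``log``; the helper name ``log_base`` is hypothetical and says nothing about how SVML computes the result internally.

.. code-block:: C

    #include <math.h>

    /* Change of base as written in the pseudo-code: log_b(x) = ln(x) / ln(b). */
    double log_base(double x, double base)
    {
        return log(x) / log(base);
    }

The quotient agrees with the dedicated C routines ``log2`` and ``log10`` up to rounding, so comparisons should use a tolerance rather than exact equality.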

_mm256_log2_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_log2_ps(__m256 a);

.. admonition:: Intel Description

    Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0)
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_logb_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_logb_pd(__m256d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
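The description equates the result with ``floor(log2(x))``. The sketch below compares that expression with the C library's ``logb``, used here as a stand-in for ``ConvertExpFP64``. That the intrinsic matches ``logb`` for negative or special inputs is an assumption the source does not confirm, and both function names are hypothetical.

.. code-block:: C

    #include <math.h>

    /* Exponent extraction via the C library, standing in for ConvertExpFP64. */
    double exponent_of(double x)
    {
        return logb(x);
    }

    /* The approximation quoted in the description; only meaningful for x > 0. */
    double floor_log2(double x)
    {
        return floor(log2(x));
    }

The two agree for positive finite inputs such as ``10.0`` (both give ``3.0``). They diverge for negative inputs, where ``logb`` works on the magnitude and ``log2`` yields NaN.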

_mm256_logb_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_logb_ps(__m256 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_pow_pd
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_pow_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Raise each packed double-precision (64-bit) floating-point element in "a" to the power of the corresponding packed element in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := POW(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
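Unlike the single-operand entries, this one pairs lane ``j`` of ``a`` (the base) with lane ``j`` of ``b`` (the exponent). A scalar sketch of that pairing, assuming the C library's ``pow`` as a stand-in for ``POW``; the helper name ``ref_pow_pd4`` is hypothetical.

.. code-block:: C

    #include <math.h>
    #include <stddef.h>

    /* Hypothetical scalar model: base from a[j], exponent from b[j]. */
    void ref_pow_pd4(const double a[4], const double b[4], double dst[4])
    {
        for (size_t j = 0; j < 4; ++j) {
            dst[j] = pow(a[j], b[j]);
        }
    }

Lanes never interact, so swapping the two arguments changes the result in every lane where base and exponent differ.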

_mm256_pow_ps
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_pow_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Raise each packed single-precision (32-bit) floating-point element in "a" to the power of the corresponding packed element in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := POW(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_svml_sqrt_pd
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_svml_sqrt_pd(__m256d a);

.. admonition:: Intel Description

    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm256_sqrt_pd".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := SQRT(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_svml_sqrt_ps
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_svml_sqrt_ps(__m256 a);

.. admonition:: Intel Description

    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm256_sqrt_ps".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := SQRT(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cbrt_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_cbrt_ph(__m256h a);

.. admonition:: Intel Description

    Compute the cube root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := CubeRoot(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_exp10_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_exp10_ph(__m256h a);

.. admonition:: Intel Description

    Compute the exponential value of 10 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := POW(FP16(10.0), a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_exp2_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_exp2_ph(__m256h a);

.. admonition:: Intel Description

    Compute the exponential value of 2 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := POW(FP16(2.0), a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_exp_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_exp_ph(__m256h a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := POW(FP16(e), a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_expm1_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_expm1_ph(__m256h a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := POW(FP16(e), a[i+15:i]) - 1.0
        ENDFOR
        dst[MAX:256] := 0
        
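Subtracting one after exponentiating, as the pseudo-code is written, cancels most significant digits when the input is near zero. The sketch below illustrates this with ``float`` standing in for half precision, since portable C has no guaranteed 16-bit floating-point type. It assumes the intrinsic aims for the accuracy of C's ``expm1f``, which the source does not state, and both function names are hypothetical.

.. code-block:: C

    #include <math.h>

    /* Pseudo-code form: exponentiate, then subtract one. */
    float naive_expm1(float x)
    {
        return expf(x) - 1.0f;
    }

    /* C library routine that avoids the cancellation for small |x|. */
    float accurate_expm1(float x)
    {
        return expm1f(x);
    }

For moderate inputs such as ``1.0f`` the two forms agree closely; the difference only matters as the input approaches zero.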

_mm256_hypot_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_hypot_ph(__m256h a, __m256h b);

.. admonition:: Intel Description

    Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := SQRT(POW(a[i+15:i], 2.0) + POW(b[i+15:i], 2.0))
        ENDFOR
        dst[MAX:256] := 0
        
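The formula in the pseudo-code squares both inputs before adding them, so a literal evaluation can overflow even when the final result is representable. Half precision has a small range, which makes this easy to hit. The sketch below uses ``float`` as a stand-in for half precision and compares the literal formula with the C library's ``hypotf``. Whether SVML guards against intermediate overflow is not stated in the source, and both function names are hypothetical.

.. code-block:: C

    #include <math.h>

    /* Literal transcription of SQRT(POW(a, 2.0) + POW(b, 2.0)). */
    float naive_hypot(float a, float b)
    {
        return sqrtf(a * a + b * b);
    }

    /* C library routine specified to avoid undue intermediate overflow. */
    float safe_hypot(float a, float b)
    {
        return hypotf(a, b);
    }

With sides ``3e20f`` and ``4e20f`` the squares exceed the ``float`` range, yet ``hypotf`` can still return a finite value near ``5e20f``.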

_mm256_invcbrt_ph
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_invcbrt_ph(__m256h a);

.. admonition:: Intel Description

    Compute the inverse cube root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := InvCubeRoot(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_invsqrt_ph
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_invsqrt_ph(__m256h a);

.. admonition:: Intel Description

    Compute the inverse square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := InvSQRT(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_log10_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_log10_ph(__m256h a);

.. admonition:: Intel Description

    Compute the base-10 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := LOG(a[i+15:i]) / LOG(10.0)
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_log1p_ph
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_log1p_ph(__m256h a);

.. admonition:: Intel Description

    Compute the natural logarithm of one plus packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := LOG(1.0 + a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_log2_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_log2_ph(__m256h a);

.. admonition:: Intel Description

    Compute the base-2 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := LOG(a[i+15:i]) / LOG(2.0)
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_log_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_log_ph(__m256h a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := LOG(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_logb_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_logb_ph(__m256h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ConvertExpFP16(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_pow_ph
^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_pow_ph(__m256h a, __m256h b);

.. admonition:: Intel Description

    Raise each packed half-precision (16-bit) floating-point element in "a" to the power of the corresponding packed element in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := POW(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_svml_sqrt_ph
^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_svml_sqrt_ph(__m256h a);

.. admonition:: Intel Description

    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm256_sqrt_ph".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := SQRT(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        

XMM
~~~
_mm_cbrt_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_cbrt_ph(__m128h a);

.. admonition:: Intel Description

    Compute the cube root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := CubeRoot(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_exp10_ph
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_exp10_ph(__m128h a);

.. admonition:: Intel Description

    Compute the exponential value of 10 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := POW(FP16(10.0), a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_exp2_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_exp2_ph(__m128h a);

.. admonition:: Intel Description

    Compute the exponential value of 2 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := POW(FP16(2.0), a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_exp_ph
^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_exp_ph(__m128h a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := POW(FP16(e), a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_expm1_ph
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_expm1_ph(__m128h a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := POW(FP16(e), a[i+15:i]) - 1.0
        ENDFOR
        dst[MAX:128] := 0
        

_mm_hypot_ph
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_hypot_ph(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := SQRT(POW(a[i+15:i], 2.0) + POW(b[i+15:i], 2.0))
        ENDFOR
        dst[MAX:128] := 0
        

_mm_invcbrt_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_invcbrt_ph(__m128h a);

.. admonition:: Intel Description

    Compute the inverse cube root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := InvCubeRoot(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_invsqrt_ph
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_invsqrt_ph(__m128h a);

.. admonition:: Intel Description

    Compute the inverse square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := InvSQRT(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_log10_ph
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_log10_ph(__m128h a);

.. admonition:: Intel Description

    Compute the base-10 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := LOG(a[i+15:i]) / LOG(10.0)
        ENDFOR
        dst[MAX:128] := 0
        

_mm_log1p_ph
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_log1p_ph(__m128h a);

.. admonition:: Intel Description

    Compute the natural logarithm of one plus packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := LOG(1.0 + a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_log2_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_log2_ph(__m128h a);

.. admonition:: Intel Description

    Compute the base-2 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := LOG(a[i+15:i]) / LOG(2.0)
        ENDFOR
        dst[MAX:128] := 0
        

_mm_log_ph
^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_log_ph(__m128h a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := LOG(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_logb_ph
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_logb_ph(__m128h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ConvertExpFP16(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_pow_ph
^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_pow_ph(__m128h a, __m128h b);

.. admonition:: Intel Description

    Raise each packed half-precision (16-bit) floating-point element in "a" to the power of the corresponding packed element in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := POW(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_svml_sqrt_ph
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_svml_sqrt_ph(__m128h a);

.. admonition:: Intel Description

    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm_sqrt_ph".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := SQRT(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_cbrt_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_cbrt_pd(__m128d a);

.. admonition:: Intel Description

    Compute the cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := CubeRoot(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	
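A real cube root is defined for negative inputs, which distinguishes it from raising to the power one third. The scalar sketch below models the two 64-bit lanes with the C library's ``cbrt`` and shows the ``pow`` form for contrast. It assumes ``CubeRoot`` behaves like ``cbrt`` for negative values, which the source does not spell out, and both function names are hypothetical.

.. code-block:: C

    #include <math.h>
    #include <stddef.h>

    /* Hypothetical scalar model of the two-lane cube root. */
    void ref_cbrt_pd2(const double a[2], double dst[2])
    {
        for (size_t j = 0; j < 2; ++j) {
            dst[j] = cbrt(a[j]);
        }
    }

    /* For contrast: pow() with a fractional exponent is NaN for negative bases. */
    double pow_one_third(double x)
    {
        return pow(x, 1.0 / 3.0);
    }

With ``-27.0`` as input, ``cbrt`` returns a value close to ``-3.0`` while the ``pow`` form returns NaN.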

_mm_cbrt_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_cbrt_ps(__m128 a);

.. admonition:: Intel Description

    Compute the cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := CubeRoot(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cexp_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_cexp_ps(__m128 a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE CEXP(a[31:0], b[31:0]) {
        	result[31:0]  := POW(FP32(e), a[31:0]) * COS(b[31:0])
        	result[63:32] := POW(FP32(e), a[31:0]) * SIN(b[31:0])
        	RETURN result
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := CEXP(a[i+31:i], a[i+63:i+32])
        ENDFOR
        dst[MAX:128] := 0
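
The interleaved real/imaginary layout is easy to get wrong, so a scalar model may help. This is a sketch under stated assumptions: a four-element float array stands in for "__m128", the names "ref_cexp_ps" and "ref_cexp_ps_demo" are hypothetical, and the C99 "expf", "cosf" and "sinf" functions stand in for "POW", "COS" and "SIN".

.. code-block:: C

    #include <assert.h>
    #include <math.h>

    /* Scalar model of _mm_cexp_ps. The four floats hold two complex
       numbers laid out as { re0, im0, re1, im1 }. */
    void ref_cexp_ps(const float a[4], float dst[4])
    {
        for (int j = 0; j < 2; j++) {
            float re  = a[2 * j];
            float im  = a[2 * j + 1];
            float mag = expf(re);              /* POW(e, re) */

            dst[2 * j]     = mag * cosf(im);   /* real part */
            dst[2 * j + 1] = mag * sinf(im);   /* imaginary part */
        }
    }

    /* Self-check: exp(0 + 0i) = 1 + 0i and exp(1 + 0i) = e + 0i. */
    void ref_cexp_ps_demo(void)
    {
        const float a[4] = { 0.0f, 0.0f, 1.0f, 0.0f };
        float dst[4];

        ref_cexp_ps(a, dst);
        assert(dst[0] == 1.0f && dst[1] == 0.0f);
        assert(dst[2] > 2.7182f && dst[2] < 2.7183f && dst[3] == 0.0f);
    }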
        	

_mm_clog_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_clog_ps(__m128 a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE CLOG(a[31:0], b[31:0]) {
        	result[31:0]  := LOG(SQRT(POW(a, 2.0) + POW(b, 2.0)))
        	result[63:32] := ATAN2(b, a)
        	RETURN result
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := CLOG(a[i+31:i], a[i+63:i+32])
        ENDFOR
        dst[MAX:128] := 0
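
The "CLOG" helper returns the logarithm of the modulus in the real lane and the argument in the imaginary lane. A minimal scalar sketch follows, assuming a four-element float array in place of "__m128" and the C99 "logf", "sqrtf" and "atan2f" functions in place of the pseudo-code operators; "ref_clog_ps" and "ref_clog_ps_demo" are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <math.h>

    /* Scalar model of _mm_clog_ps over two complex numbers stored as
       { re0, im0, re1, im1 }. */
    void ref_clog_ps(const float a[4], float dst[4])
    {
        for (int j = 0; j < 2; j++) {
            float re = a[2 * j];
            float im = a[2 * j + 1];

            dst[2 * j]     = logf(sqrtf(re * re + im * im)); /* ln|z|  */
            dst[2 * j + 1] = atan2f(im, re);                 /* arg(z) */
        }
    }

    /* Self-check: log(1 + 0i) = 0 + 0i and log(0 + 1i) = 0 + (pi/2)i. */
    void ref_clog_ps_demo(void)
    {
        const float a[4] = { 1.0f, 0.0f, 0.0f, 1.0f };
        float dst[4];

        ref_clog_ps(a, dst);
        assert(dst[0] == 0.0f && dst[1] == 0.0f);
        assert(dst[2] == 0.0f);
        assert(dst[3] > 1.5707f && dst[3] < 1.5709f);
    }

The sum of squares mirrors the pseudo-code literally and can overflow for large inputs; the C99 "hypotf" function is a more robust way to obtain the modulus.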
        	

_mm_csqrt_ps
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_csqrt_ps(__m128 a);

.. admonition:: Intel Description

    Compute the square root of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE CSQRT(a[31:0], b[31:0]) {
        	sign[31:0] := (b < 0.0) ? -FP32(1.0) : FP32(1.0)
        	result[31:0]  := SQRT((a + SQRT(POW(a, 2.0) + POW(b, 2.0))) / 2.0)
        	result[63:32] := sign * SQRT((-a + SQRT(POW(a, 2.0) + POW(b, 2.0))) / 2.0)
        	RETURN result
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := CSQRT(a[i+31:i], a[i+63:i+32])
        ENDFOR
        dst[MAX:128] := 0
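
The "CSQRT" helper picks the root whose imaginary part has the same sign as the input's imaginary part. The scalar sketch below transcribes it directly, assuming a four-element float array in place of "__m128" and the C99 "sqrtf" function in place of "SQRT"; "ref_csqrt_ps" and "ref_csqrt_ps_demo" are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <math.h>

    /* Scalar model of _mm_csqrt_ps over two complex numbers stored as
       { re0, im0, re1, im1 }. */
    void ref_csqrt_ps(const float a[4], float dst[4])
    {
        for (int j = 0; j < 2; j++) {
            float re   = a[2 * j];
            float im   = a[2 * j + 1];
            float mod  = sqrtf(re * re + im * im);    /* |z| */
            float sign = (im < 0.0f) ? -1.0f : 1.0f;

            dst[2 * j]     = sqrtf((re + mod) / 2.0f);
            dst[2 * j + 1] = sign * sqrtf((-re + mod) / 2.0f);
        }
    }

    /* Self-check: sqrt(3 + 4i) = 2 + i and sqrt(3 - 4i) = 2 - i. */
    void ref_csqrt_ps_demo(void)
    {
        const float a[4] = { 3.0f, 4.0f, 3.0f, -4.0f };
        float dst[4];

        ref_csqrt_ps(a, dst);
        assert(dst[0] == 2.0f && dst[1] == 1.0f);
        assert(dst[2] == 2.0f && dst[3] == -1.0f);
    }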
        	

_mm_exp_pd
^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_exp_pd(__m128d a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := POW(e, a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_exp_ps
^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_exp_ps(__m128 a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := POW(FP32(e), a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_exp10_pd
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_exp10_pd(__m128d a);

.. admonition:: Intel Description

    Compute the exponential value of 10 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := POW(10.0, a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_exp10_ps
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_exp10_ps(__m128 a);

.. admonition:: Intel Description

    Compute the exponential value of 10 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := POW(FP32(10.0), a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_exp2_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_exp2_pd(__m128d a);

.. admonition:: Intel Description

    Compute the exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := POW(2.0, a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_exp2_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_exp2_ps(__m128 a);

.. admonition:: Intel Description

    Compute the exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := POW(FP32(2.0), a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_expm1_pd
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_expm1_pd(__m128d a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := POW(e, a[i+63:i]) - 1.0
        ENDFOR
        dst[MAX:128] := 0
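
The pseudo-code writes the result as "POW(e, a) - 1.0". Evaluated literally in floating point, that expression loses most of its significant digits when "a" is close to zero, which is the case the C99 "expm1" function is designed for. The scalar sketch below uses "expm1" as a stand-in; the source does not state the accuracy of the SVML routine, and "ref_expm1_pd" and "ref_expm1_pd_demo" are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <math.h>

    /* Scalar model of _mm_expm1_pd: lane j of dst receives e^a[j] - 1. */
    void ref_expm1_pd(const double a[2], double dst[2])
    {
        for (int j = 0; j < 2; j++) {
            dst[j] = expm1(a[j]);
        }
    }

    /* Self-check: for tiny x, e^x - 1 is x + x*x/2 + ..., so the result
       must be slightly larger than x itself. */
    void ref_expm1_pd_demo(void)
    {
        const double a[2] = { 0.0, 1e-10 };
        double dst[2];

        ref_expm1_pd(a, dst);
        assert(dst[0] == 0.0);
        assert(dst[1] > a[1] && dst[1] < 1.0000000001e-10);
    }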
        	

_mm_expm1_ps
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_expm1_ps(__m128 a);

.. admonition:: Intel Description

    Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := POW(FP32(e), a[i+31:i]) - 1.0
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_invcbrt_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_invcbrt_pd(__m128d a);

.. admonition:: Intel Description

    Compute the inverse cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := InvCubeRoot(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_invcbrt_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_invcbrt_ps(__m128 a);

.. admonition:: Intel Description

    Compute the inverse cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := InvCubeRoot(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_invsqrt_pd
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_invsqrt_pd(__m128d a);

.. admonition:: Intel Description

    Compute the inverse square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := InvSQRT(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_invsqrt_ps
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_invsqrt_ps(__m128 a);

.. admonition:: Intel Description

    Compute the inverse square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := InvSQRT(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_log_pd
^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_log_pd(__m128d a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := LOG(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_log_ps
^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_log_ps(__m128 a);

.. admonition:: Intel Description

    Compute the natural logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := LOG(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_log10_pd
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_log10_pd(__m128d a);

.. admonition:: Intel Description

    Compute the base-10 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := LOG(a[i+63:i]) / LOG(10.0)
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_log10_ps
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_log10_ps(__m128 a);

.. admonition:: Intel Description

    Compute the base-10 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := LOG(a[i+31:i]) / LOG(10.0)
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_log1p_pd
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_log1p_pd(__m128d a);

.. admonition:: Intel Description

    Compute the natural logarithm of one plus packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := LOG(1.0 + a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_log1p_ps
^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_log1p_ps(__m128 a);

.. admonition:: Intel Description

    Compute the natural logarithm of one plus packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := LOG(1.0 + a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_log2_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_log2_pd(__m128d a);

.. admonition:: Intel Description

    Compute the base-2 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := LOG(a[i+63:i]) / LOG(2.0)
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_log2_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_log2_ps(__m128 a);

.. admonition:: Intel Description

    Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0)
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_logb_pd
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_logb_pd(__m128d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_logb_ps
^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_logb_ps(__m128 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
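
A scalar model can make the exponent extraction concrete. The sketch below assumes that the C99 "logbf" function matches "ConvertExpFP32" for normal, finite inputs; zero, infinity, NaN and denormals are not covered by the source and are not exercised here. A four-element float array stands in for "__m128", and "ref_logb_ps" and "ref_logb_ps_demo" are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <math.h>

    /* Scalar model of _mm_logb_ps: lane j of dst receives the unbiased
       binary exponent of a[j] as a float. */
    void ref_logb_ps(const float a[4], float dst[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = logbf(a[j]);
        }
    }

    /* Self-check: 8 = 1.0 * 2^3, 10 = 1.25 * 2^3, 0.75 = 1.5 * 2^-1. */
    void ref_logb_ps_demo(void)
    {
        const float a[4] = { 8.0f, 10.0f, 0.75f, -8.0f };
        float dst[4];

        ref_logb_ps(a, dst);
        assert(dst[0] == 3.0f);
        assert(dst[1] == 3.0f);
        assert(dst[2] == -1.0f);
        assert(dst[3] == 3.0f);   /* logbf ignores the sign of its input */
    }

The last lane shows where "logbf" departs from a literal "floor(log2(x))": the sign is ignored, whereas "log2" of a negative number is NaN.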
        	

_mm_pow_pd
^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_pow_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Raise each packed double-precision (64-bit) floating-point element in "a" to the power of the corresponding packed element in "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := POW(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_pow_ps
^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_pow_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Raise each packed single-precision (32-bit) floating-point element in "a" to the power of the corresponding packed element in "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := POW(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_svml_sqrt_pd
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_svml_sqrt_pd(__m128d a);

.. admonition:: Intel Description

    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm_sqrt_pd".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := SQRT(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_svml_sqrt_ps
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: SVML-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_svml_sqrt_ps(__m128 a);

.. admonition:: Intel Description

    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm_sqrt_ps".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := SQRT(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

Arithmetic
----------
ZMM
~~~
_mm512_div_epi32
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_div_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed signed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF b[i+31:i] == 0
        		#DE
        	FI
        	dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
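
The truncation and the "#DE" condition can be illustrated with a scalar model. This is a sketch, not the SVML implementation: a 16-element array stands in for "__m512i", a return value of -1 stands in for "#DE", and "ref_div_epi32" and "ref_div_epi32_demo" are hypothetical names. Rejecting "INT32_MIN / -1" is an added assumption, because that quotient overflows in C and the source does not describe it.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model of _mm512_div_epi32 for 16 signed 32-bit lanes.
       Returns 0 on success and -1 where the pseudo-code raises #DE.
       Assumption: INT32_MIN / -1 is rejected as well (overflow in C). */
    int ref_div_epi32(const int32_t a[16], const int32_t b[16],
                      int32_t dst[16])
    {
        for (int j = 0; j < 16; j++) {
            if (b[j] == 0 || (a[j] == INT32_MIN && b[j] == -1)) {
                return -1;            /* dst may be partially written */
            }
            dst[j] = a[j] / b[j];     /* C99 division truncates toward zero */
        }
        return 0;
    }

    void ref_div_epi32_demo(void)
    {
        int32_t a[16], b[16], dst[16];

        for (int j = 0; j < 16; j++) {
            a[j] = (j % 2 == 0) ? 7 : -7;    /* 7, -7, 7, -7, ... */
            b[j] = (j < 8) ? 2 : -2;
        }
        assert(ref_div_epi32(a, b, dst) == 0);
        assert(dst[0] == 3 && dst[1] == -3);     /* 7 / 2, -7 / 2   */
        assert(dst[8] == -3 && dst[9] == 3);     /* 7 / -2, -7 / -2 */

        b[5] = 0;                                /* one zero divisor */
        assert(ref_div_epi32(a, b, dst) == -1);
    }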
        	

_mm512_mask_div_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_mask_div_epi32(__m512i src, __mmask16 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed signed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		IF b[i+31:i] == 0
        			#DE
        		FI
        		dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
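
In the pseudo-code the zero check sits inside the "IF k[j]" branch, so a zero divisor in a masked-off lane does not raise "#DE". The scalar sketch below models that behaviour under stated assumptions: 16-element arrays stand in for "__m512i", a "uint16_t" stands in for "__mmask16", a return value of -1 stands in for "#DE", and "ref_mask_div_epi32" and "ref_mask_div_epi32_demo" are hypothetical names. As an added assumption, "INT32_MIN / -1" is rejected because it overflows in C.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model of _mm512_mask_div_epi32. Lanes whose mask bit is
       clear copy src; active lanes divide, and return -1 on a fault. */
    int ref_mask_div_epi32(const int32_t src[16], uint16_t k,
                           const int32_t a[16], const int32_t b[16],
                           int32_t dst[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                if (b[j] == 0 || (a[j] == INT32_MIN && b[j] == -1)) {
                    return -1;        /* dst may be partially written */
                }
                dst[j] = a[j] / b[j];
            } else {
                dst[j] = src[j];
            }
        }
        return 0;
    }

    void ref_mask_div_epi32_demo(void)
    {
        int32_t src[16], a[16], b[16], dst[16];

        for (int j = 0; j < 16; j++) {
            src[j] = -1;
            a[j]   = 100;
            b[j]   = 0;               /* zero everywhere by default */
        }
        b[0] = 5;
        b[1] = -3;

        /* Only lanes 0 and 1 are active, so the zeros are never used. */
        assert(ref_mask_div_epi32(src, 0x0003, a, b, dst) == 0);
        assert(dst[0] == 20 && dst[1] == -33);
        assert(dst[2] == -1 && dst[15] == -1);

        /* Activating lane 2 exposes its zero divisor. */
        assert(ref_mask_div_epi32(src, 0x0007, a, b, dst) == -1);
    }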
        	

_mm512_div_epi8
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_div_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed signed 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := 8*j
        	IF b[i+7:i] == 0
        		#DE
        	FI
        	dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_div_epi16
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_div_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed signed 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	IF b[i+15:i] == 0
        		#DE
        	FI
        	dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_div_epi64
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m512i _mm512_div_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed signed 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	IF b[i+63:i] == 0
        		#DE
        	FI
        	dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rem_epi32
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_rem_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
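
The "REMAINDER(a / b)" notation can be read as the remainder left over by a truncating division. The scalar sketch below rests on several assumptions that the source does not confirm: lanes are treated as signed, following the "epi32" suffix, although the element-type field above lists them as UI32; the C "%" operator, whose result takes the sign of the dividend, stands in for "REMAINDER"; and a zero divisor returns -1 because "%" by zero is undefined in C, even though this pseudo-code has no explicit zero check. "ref_rem_epi32" and "ref_rem_epi32_demo" are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model of _mm512_rem_epi32 for 16 lanes (signed assumed).
       Returns -1 for inputs on which the C '%' operator is undefined. */
    int ref_rem_epi32(const int32_t a[16], const int32_t b[16],
                      int32_t dst[16])
    {
        for (int j = 0; j < 16; j++) {
            if (b[j] == 0 || (a[j] == INT32_MIN && b[j] == -1)) {
                return -1;            /* dst may be partially written */
            }
            dst[j] = a[j] % b[j];
        }
        return 0;
    }

    void ref_rem_epi32_demo(void)
    {
        int32_t a[16], b[16], dst[16];

        for (int j = 0; j < 16; j++) {
            a[j] = (j % 2 == 0) ? 7 : -7;
            b[j] = 3;
        }
        assert(ref_rem_epi32(a, b, dst) == 0);
        assert(dst[0] == 1 && dst[1] == -1);

        /* Quotient and remainder always recombine to the dividend. */
        for (int j = 0; j < 16; j++) {
            assert((a[j] / b[j]) * b[j] + dst[j] == a[j]);
        }
    }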
        	

_mm512_mask_rem_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_rem_epi32(__m512i src, __mmask16 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rem_epi8
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_rem_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed 8-bit integers in "a" by packed elements in "b", and store the remainders as packed 8-bit integers in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 63
        	i := 8*j
        	dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rem_epi16
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_rem_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed 16-bit integers in "a" by packed elements in "b", and store the remainders as packed 16-bit integers in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := 16*j
        	dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rem_epi64
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_rem_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed 64-bit integers in "a" by packed elements in "b", and store the remainders as packed 64-bit integers in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := 64*j
        	dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_div_epu32
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_div_epu32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF b[i+31:i] == 0
        		#DE
        	FI
        	dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
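
The pseudo-code is identical to that of the signed variant, so the difference lies only in how each lane's bits are interpreted. The scalar sketch below shows one bit pattern producing different quotients under the two readings. It is an illustration under stated assumptions: a 16-element array stands in for "__m512i", a return value of -1 stands in for "#DE", and "ref_div_epu32" and "ref_div_epu32_demo" are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model of _mm512_div_epu32 for 16 unsigned 32-bit lanes.
       Returns 0 on success and -1 where the pseudo-code raises #DE. */
    int ref_div_epu32(const uint32_t a[16], const uint32_t b[16],
                      uint32_t dst[16])
    {
        for (int j = 0; j < 16; j++) {
            if (b[j] == 0) {
                return -1;            /* dst may be partially written */
            }
            dst[j] = a[j] / b[j];
        }
        return 0;
    }

    void ref_div_epu32_demo(void)
    {
        uint32_t a[16], b[16], dst[16];

        for (int j = 0; j < 16; j++) {
            a[j] = 0xFFFFFFFEu;       /* 4294967294 unsigned, -2 signed */
            b[j] = 2u;
        }
        assert(ref_div_epu32(a, b, dst) == 0);
        assert(dst[0] == 0x7FFFFFFFu);    /* unsigned quotient */

        /* The same bits read as signed would give -2 / 2 = -1. */
        assert((int32_t)-2 / 2 == -1);
    }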
        	

_mm512_mask_div_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_div_epu32(__m512i src, __mmask16 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		IF b[i+31:i] == 0
        			#DE
        		FI
        		dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_div_epu8
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_div_epu8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := 8*j
        	IF b[i+7:i] == 0
        		#DE
        	FI
        	dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_div_epu16
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_div_epu16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	IF b[i+15:i] == 0
        		#DE
        	FI
        	dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_div_epu64
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_div_epu64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	IF b[i+63:i] == 0
        		#DE
        	FI
        	dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rem_epu32
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_rem_epu32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_rem_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_rem_epu32(__m512i src, __mmask16 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rem_epu8
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_rem_epu8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 8-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 63
        	i := 8*j
        	dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i])
        ENDFOR
        dst[MAX:512] := 0
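
The pseudo-code above writes one 8-bit remainder per 8-bit lane. An unsigned remainder is always smaller than its divisor, so it fits in the lane it came from. The scalar sketch below illustrates this; the name ``rem_epu8_ref`` is hypothetical, and because the pseudo-code does not say what happens for a zero divisor, returning ``-1`` in that case is an assumption of the sketch.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model: 64 unsigned 8-bit lanes,
       dst[j] = a[j] % b[j]. Zero-divisor handling (return -1, dst left
       unwritten) is an assumption of this sketch. */
    int rem_epu8_ref(uint8_t dst[64], const uint8_t a[64],
                     const uint8_t b[64])
    {
        for (size_t j = 0; j < 64; ++j) {
            if (b[j] == 0) {
                return -1;
            }
        }
        for (size_t j = 0; j < 64; ++j) {
            /* Operands promote to int; the result is below b[j] <= 255. */
            dst[j] = (uint8_t)(a[j] % b[j]);
        }
        return 0;
    }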
        	

_mm512_rem_epu16
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_rem_epu16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 16-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := 16*j
        	dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rem_epu64
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_rem_epu64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 64-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := 64*j
        	dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

YMM
~~~
_mm256_div_epi8
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_div_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed signed 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 8*j
        	IF b[i+7:i] == 0
        		#DE
        	FI
        	dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
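
For signed lanes, "truncated results" is read here as rounding toward zero, which is also how integer division behaves in C99 and later (for example, -7 / 2 gives -3). The sketch below is a scalar illustration with a hypothetical name, ``div_epi8_ref``. Two points are assumptions of the sketch: a zero divisor is reported by returning ``-1``, and ``Truncate8`` is interpreted as keeping the low 8 bits, so the one quotient that does not fit (-128 / -1 = 128) wraps to -128.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model: 32 signed 8-bit lanes. */
    int div_epi8_ref(int8_t dst[32], const int8_t a[32], const int8_t b[32])
    {
        for (size_t j = 0; j < 32; ++j) {
            if (b[j] == 0) {
                return -1; /* stands in for #DE */
            }
        }
        for (size_t j = 0; j < 32; ++j) {
            /* Operands promote to int, so the division cannot overflow. */
            int q = a[j] / b[j];
            /* Assumption: Truncate8 keeps the low 8 bits (128 -> -128). */
            dst[j] = (q > INT8_MAX) ? INT8_MIN : (int8_t)q;
        }
        return 0;
    }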
        	

_mm256_div_epi16
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_div_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed signed 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	IF b[i+15:i] == 0
        		#DE
        	FI
        	dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_div_epi32
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_div_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed signed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	IF b[i+31:i] == 0
        		#DE
        	FI
        	dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_div_epi64
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m256i _mm256_div_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed signed 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	IF b[i+63:i] == 0
        		#DE
        	FI
        	dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_div_epu8
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_div_epu8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 8*j
        	IF b[i+7:i] == 0
        		#DE
        	FI
        	dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_div_epu16
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_div_epu16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	IF b[i+15:i] == 0
        		#DE
        	FI
        	dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_div_epu32
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_div_epu32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	IF b[i+31:i] == 0
        		#DE
        	FI
        	dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_div_epu64
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_div_epu64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	IF b[i+63:i] == 0
        		#DE
        	FI
        	dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_idiv_epi32
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_idiv_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_idivrem_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i * mem_addr, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 mem_addr, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_idivrem_epi32(__m256i* mem_addr, __m256i a,
                                 __m256i b);

.. admonition:: Intel Description

    Divide packed 32-bit integers in "a" by packed elements in "b", store the truncated results in "dst", and store the remainders as packed 32-bit integers into memory at "mem_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
        	MEM[mem_addr+i+31:mem_addr+i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
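
The pseudo-code above produces two outputs per lane: the quotient goes to the return value and the remainder goes to memory. The scalar sketch below models that split with two output arrays. It treats the lanes as signed, going by the ``i`` prefix in the name, which is an assumption. The name ``idivrem_epi32_ref`` and the ``-1`` return for inputs C cannot divide (a zero divisor, or ``INT32_MIN / -1``) are also assumptions of the sketch.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model: 8 signed 32-bit lanes.
       quot plays the role of dst, rem the role of the memory at mem_addr. */
    int idivrem_epi32_ref(int32_t quot[8], int32_t rem[8],
                          const int32_t a[8], const int32_t b[8])
    {
        for (size_t j = 0; j < 8; ++j) {
            if (b[j] == 0 || (a[j] == INT32_MIN && b[j] == -1)) {
                return -1; /* undefined in C, so rejected by this sketch */
            }
        }
        for (size_t j = 0; j < 8; ++j) {
            quot[j] = a[j] / b[j]; /* truncates toward zero */
            rem[j] = a[j] % b[j];  /* takes the sign of the dividend */
        }
        return 0;
    }

For every lane the two outputs satisfy ``quot[j] * b[j] + rem[j] == a[j]``, which is a convenient check when comparing against another implementation.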
        	

_mm256_irem_epi32
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_irem_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rem_epi8
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_rem_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed 8-bit integers in "a" by packed elements in "b", and store the remainders as packed 8-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := 8*j
        	dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rem_epi16
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_rem_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed 16-bit integers in "a" by packed elements in "b", and store the remainders as packed 16-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := 16*j
        	dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rem_epi32
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_rem_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rem_epi64
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_rem_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed 64-bit integers in "a" by packed elements in "b", and store the remainders as packed 64-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := 64*j
        	dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rem_epu8
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_rem_epu8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 8-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 31
        	i := 8*j
        	dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rem_epu16
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_rem_epu16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 16-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := 16*j
        	dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rem_epu32
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_rem_epu32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rem_epu64
^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_rem_epu64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 64-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := 64*j
        	dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_udiv_epi32
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_udiv_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
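
A single bit pattern shows why unsigned division is a separate operation from signed division. Read as unsigned, ``0xFFFFFFFE`` is 4294967294 and divides by 2 to give 2147483647; read as signed two's complement it is -2 and divides by 2 to give -1. The one-lane helpers below are hypothetical names used only to illustrate that difference, and they leave zero divisors out of scope.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical one-lane helpers; not part of any library. */

    /* Unsigned view: the lane holds a value in [0, 2^32 - 1]. */
    uint32_t udiv_lane(uint32_t a, uint32_t b)
    {
        assert(b != 0); /* zero divisors are outside this sketch */
        return a / b;
    }

    /* Signed view: the same 32 bits read as two's complement. */
    int32_t idiv_lane(int32_t a, int32_t b)
    {
        assert(b != 0);
        assert(!(a == INT32_MIN && b == -1)); /* quotient would not fit */
        return a / b;
    }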
        	

_mm256_udivrem_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i * mem_addr, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 mem_addr, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_udivrem_epi32(__m256i* mem_addr, __m256i a,
                                 __m256i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", store the truncated results in "dst", and store the remainders as packed unsigned 32-bit integers into memory at "mem_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
        	MEM[mem_addr+i+31:mem_addr+i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_urem_epi32
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_urem_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_div_epi8
^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_div_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed signed 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 8*j
        	IF b[i+7:i] == 0
        		#DE
        	FI
        	dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_div_epi16
^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_div_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed signed 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	IF b[i+15:i] == 0
        		#DE
        	FI
        	dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_div_epi32
^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_div_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed signed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	IF b[i+31:i] == 0
        		#DE
        	FI
        	dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_div_epi64
^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m128i _mm_div_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed signed 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	IF b[i+63:i] == 0
        		#DE
        	FI
        	dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_div_epu8
^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_div_epu8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 8*j
        	IF b[i+7:i] == 0
        		#DE
        	FI
        	dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_div_epu16
^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_div_epu16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	IF b[i+15:i] == 0
        		#DE
        	FI
        	dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_div_epu32
^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_div_epu32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	IF b[i+31:i] == 0
        		#DE
        	FI
        	dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_div_epu64
^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_div_epu64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	IF b[i+63:i] == 0
        		#DE
        	FI
        	dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_erf_pd
^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_erf_pd(__m128d a);

.. admonition:: Intel Description

    Compute the error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ERF(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_idiv_epi32
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_idiv_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_idivrem_epi32
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i * mem_addr, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 mem_addr, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_idivrem_epi32(__m128i* mem_addr, __m128i a,
                              __m128i b);

.. admonition:: Intel Description

    Divide packed 32-bit integers in "a" by packed elements in "b", store the truncated results in "dst", and store the remainders as packed 32-bit integers into memory at "mem_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
        	MEM[mem_addr+i+31:mem_addr+i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_irem_epi32
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_irem_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rem_epi8
^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_rem_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed 8-bit integers in "a" by packed elements in "b", and store the remainders as packed 8-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := 8*j
        	dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i])
        ENDFOR
        dst[MAX:128] := 0
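
As the pseudo-code above shows, each remainder is written back into an 8-bit lane of the same width as its inputs. The following is a scalar sketch of that behaviour in portable C; ``rem_epi8_model`` and the array representation are hypothetical names chosen for illustration, not part of SVML.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of a 16-lane signed 8-bit remainder.
     * This model is undefined for b[j] == 0. */
    void rem_epi8_model(int8_t dst[16], const int8_t a[16], const int8_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            /* Operands promote to int, so -128 % -1 is well defined here (0).
             * The result always fits in 8 bits because |a % b| < |b|. */
            dst[j] = (int8_t)(a[j] % b[j]);
        }
    }

For example, a lane holding ``-7`` divided by ``3`` leaves ``-1``, and ``100`` divided by ``-7`` leaves ``2``.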
        	

_mm_rem_epi16
^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_rem_epi16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed 16-bit integers in "a" by packed elements in "b", and store the remainders as packed 16-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := 16*j
        	dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rem_epi32
^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_rem_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rem_epi64
^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_rem_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed 64-bit integers in "a" by packed elements in "b", and store the remainders as packed 64-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := 64*j
        	dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rem_epu8
^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_rem_epu8(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 8-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := 8*j
        	dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rem_epu16
^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_rem_epu16(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 16-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := 16*j
        	dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rem_epu32
^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_rem_epu32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rem_epu64
^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_rem_epu64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 64-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := 64*j
        	dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_udiv_epi32
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_udiv_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_udivrem_epi32
^^^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i * mem_addr, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 mem_addr, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_udivrem_epi32(__m128i* mem_addr, __m128i a,
                              __m128i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", store the truncated results in "dst", and store the remainders as packed unsigned 32-bit integers into memory at "mem_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
        	MEM[mem_addr+i+31:mem_addr+i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
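
The unsigned variant differs from the signed one only in how the same lane bits are interpreted. The sketch below is a portable scalar model of the pseudo-code above; ``udivrem_epi32_model`` and the array parameters are hypothetical, introduced here for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: unsigned quotients go to dst, unsigned
     * remainders to mem. This model is undefined for b[j] == 0. */
    void udivrem_epi32_model(uint32_t mem[4], uint32_t dst[4],
                             const uint32_t a[4], const uint32_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = a[j] / b[j];
            mem[j] = a[j] % b[j];
        }
    }

The bit pattern ``0xFFFFFFF9`` reads as ``-7`` in a signed lane, but here it is ``4294967289``, so dividing it by ``2`` gives ``0x7FFFFFFC`` with remainder ``1`` rather than ``-3`` with remainder ``-1``.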
        	

_mm_urem_epi32
^^^^^^^^^^^^^^
:Tech: SVML
:Category: Arithmetic
:Header: immintrin.h
:Searchable: SVML-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_urem_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

Miscellaneous
-------------
YMM
~~~
_mm256_trunc_pd
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: SVML-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_trunc_pd(__m256d a);

.. admonition:: Intel Description

    Truncate the packed double-precision (64-bit) floating-point elements in "a", and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := TRUNCATE(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
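
TRUNCATE in the pseudo-code above rounds each element toward zero. One way to sketch that per-lane operation without any library call is to clear the fractional mantissa bits directly. The code below assumes ``double`` is IEEE-754 binary64; the names ``trunc_f64_model`` and ``trunc_pd_model`` are hypothetical, and this is not how the intrinsic itself is implemented.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Round one binary64 value toward zero by masking mantissa bits. */
    double trunc_f64_model(double x)
    {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);
        int exp = (int)((bits >> 52) & 0x7FF) - 1023;    /* unbiased exponent */
        if (exp < 0)
            bits &= 0x8000000000000000ULL;               /* |x| < 1: keep sign only */
        else if (exp < 52)
            bits &= ~((1ULL << (52 - exp)) - 1);         /* drop fraction bits */
        /* exp >= 52: already integral, or Inf/NaN; leave unchanged */
        memcpy(&x, &bits, sizeof x);
        return x;
    }

    /* Hypothetical four-lane model of the 256-bit operation. */
    void trunc_pd_model(double dst[4], const double a[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = trunc_f64_model(a[j]);
    }

For instance, ``2.7`` becomes ``2.0`` and ``-2.7`` becomes ``-2.0``; values with magnitude below one collapse to a zero that keeps the input's sign.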
        	

_mm256_trunc_ps
^^^^^^^^^^^^^^^
:Tech: SVML
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: SVML-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_trunc_ps(__m256 a);

.. admonition:: Intel Description

    Truncate the packed single-precision (32-bit) floating-point elements in "a", and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := TRUNCATE(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_trunc_pd
^^^^^^^^^^^^
:Tech: SVML
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: SVML-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_trunc_pd(__m128d a);

.. admonition:: Intel Description

    Truncate the packed double-precision (64-bit) floating-point elements in "a", and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := TRUNCATE(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_trunc_ps
^^^^^^^^^^^^
:Tech: SVML
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: SVML-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_trunc_ps(__m128 a);

.. admonition:: Intel Description

    Truncate the packed single-precision (32-bit) floating-point elements in "a", and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := TRUNCATE(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

AMX
===
Application-Targeted
--------------------
Other
~~~~~
_tile_dpbf16ps
^^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    constexpr int dst, 
    constexpr int a, 
    constexpr int b
:Param ETypes:
     dst, 
     a, 
     b

.. code-block:: C

    void _tile_dpbf16ps(constexpr int dst, constexpr int a,
                        constexpr int b);

.. admonition:: Intel Description

    Compute dot-product of BF16 (16-bit) floating-point pairs in tiles "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "dst", and store the 32-bit result back to tile "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (a.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.fp32[n] += FP32(a.row[m].bf16[2*k+0]) * FP32(b.row[k].bf16[2*n+0])
        			tmp.fp32[n] += FP32(a.row[m].bf16[2*k+1]) * FP32(b.row[k].bf16[2*n+1])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
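
The triple loop above can be sketched in portable C on ordinary arrays. In this model the tiles are plain row-major buffers with dimensions ``M``, ``K`` and ``N`` passed explicitly; those parameters and the names ``bf16_to_fp32`` and ``dpbf16ps_model`` are assumptions for illustration. The sketch does not model the tile-zeroing helpers or any hardware-specific rounding and denormal handling.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* BF16 holds the upper 16 bits of an IEEE-754 binary32 value. */
    float bf16_to_fp32(uint16_t h)
    {
        uint32_t bits = (uint32_t)h << 16;
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    /* Hypothetical model: a is M x 2K and b is K x 2N (BF16 bit patterns),
     * dst is M x N (FP32), all row-major. dst is both input and output. */
    void dpbf16ps_model(float *dst, const uint16_t *a, const uint16_t *b,
                        int M, int K, int N)
    {
        for (int m = 0; m < M; m++)
            for (int k = 0; k < K; k++)
                for (int n = 0; n < N; n++) {
                    dst[m * N + n] += bf16_to_fp32(a[m * 2 * K + 2 * k + 0]) *
                                      bf16_to_fp32(b[k * 2 * N + 2 * n + 0]);
                    dst[m * N + n] += bf16_to_fp32(a[m * 2 * K + 2 * k + 1]) *
                                      bf16_to_fp32(b[k * 2 * N + 2 * n + 1]);
                }
    }

With a single pair ``(1.0, 2.0)`` in ``a``, ``(3.0, 4.0)`` in ``b`` and an accumulator starting at ``0.5``, the model produces ``0.5 + 1*3 + 2*4 = 11.5``.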
        	

__tile_dpbf16ps
^^^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    __tile1024i* dst, 
    __tile1024i src0, 
    __tile1024i src1
:Param ETypes:
     dst, 
     src0, 
     src1

.. code-block:: C

    void __tile_dpbf16ps(__tile1024i* dst, __tile1024i src0,
                         __tile1024i src1);

.. admonition:: Intel Description

    Compute dot-product of BF16 (16-bit) floating-point pairs in tiles "src0" and "src1", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "dst", and store the 32-bit result back to tile "dst". The shape of the tile is specified in the __tile1024i struct, and the tile register is allocated by the compiler.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (src0.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.fp32[n] += FP32(src0.row[m].bf16[2*k+0]) * FP32(src1.row[k].bf16[2*n+0])
        			tmp.fp32[n] += FP32(src0.row[m].bf16[2*k+1]) * FP32(src1.row[k].bf16[2*n+1])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        

_tile_cmmimfp16ps
^^^^^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    constexpr int dst, 
    constexpr int a, 
    constexpr int b
:Param ETypes:
    FP32 dst, 
    FP16 a, 
    FP16 b

.. code-block:: C

    void _tile_cmmimfp16ps(constexpr int dst, constexpr int a,
                           constexpr int b);

.. admonition:: Intel Description

    Perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single-precision tile. Each dword element in input tiles "a" and "b" is interpreted as a complex number with FP16 real part and FP16 imaginary part. This intrinsic calculates the imaginary part of the result. For each possible combination of (row of "a", column of "b"), it performs a set of multiplications and accumulations on all corresponding complex numbers (one from "a" and one from "b"). The imaginary part of the "a" element is multiplied with the real part of the corresponding "b" element, and the real part of the "a" element is multiplied with the imaginary part of the corresponding "b" element. The two accumulated results are added, and then accumulated into the corresponding row and column of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (a.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.fp32[n] += FP32(a.row[m].fp16[2*k+0]) * FP32(b.row[k].fp16[2*n+1])
        			tmp.fp32[n] += FP32(a.row[m].fp16[2*k+1]) * FP32(b.row[k].fp16[2*n+0])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        	

_tile_cmmrlfp16ps
^^^^^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    constexpr int dst, 
    constexpr int a, 
    constexpr int b
:Param ETypes:
    FP32 dst, 
    FP16 a, 
    FP16 b

.. code-block:: C

    void _tile_cmmrlfp16ps(constexpr int dst, constexpr int a,
                           constexpr int b);

.. admonition:: Intel Description

    Perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single-precision tile. Each dword element in input tiles "a" and "b" is interpreted as a complex number with FP16 real part and FP16 imaginary part. This intrinsic calculates the real part of the result. For each possible combination of (row of "a", column of "b"), it performs a set of multiplications and accumulations on all corresponding complex numbers (one from "a" and one from "b"). The real part of the "a" element is multiplied with the real part of the corresponding "b" element, and the negated imaginary part of the "a" element is multiplied with the imaginary part of the corresponding "b" element. The two accumulated results are added, and then accumulated into the corresponding row and column of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (a.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.fp32[n] += FP32(a.row[m].fp16[2*k+0]) * FP32(b.row[k].fp16[2*n+0])
        			tmp.fp32[n] += FP32(-a.row[m].fp16[2*k+1]) * FP32(b.row[k].fp16[2*n+1])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
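
Taken together, the real-part and imaginary-part accumulations amount to an ordinary complex multiply-accumulate: ``re += ar*br - ai*bi`` and ``im += ar*bi + ai*br``. The sketch below models both in portable C. It uses ``float`` in place of FP16 and plain row-major arrays in place of tiles; the function name ``cmm_fp16ps_model`` and the ``M``, ``K``, ``N`` parameters are hypothetical, and the tile-zeroing helpers are not modelled.

.. code-block:: C

    /* Hypothetical model. a is M x 2K and b is K x 2N, each storing
     * interleaved (real, imag) pairs; re and im are M x N accumulators. */
    void cmm_fp16ps_model(float *re, float *im,
                          const float *a, const float *b,
                          int M, int K, int N)
    {
        for (int m = 0; m < M; m++)
            for (int k = 0; k < K; k++)
                for (int n = 0; n < N; n++) {
                    float ar = a[m * 2 * K + 2 * k + 0];
                    float ai = a[m * 2 * K + 2 * k + 1];
                    float br = b[k * 2 * N + 2 * n + 0];
                    float bi = b[k * 2 * N + 2 * n + 1];
                    re[m * N + n] += ar * br;    /* real-part form */
                    re[m * N + n] += -ai * bi;
                    im[m * N + n] += ar * bi;    /* imaginary-part form */
                    im[m * N + n] += ai * br;
                }
    }

For a single element pair, ``(1 + 2i) * (3 + 4i)`` accumulates ``-5`` into the real tile and ``10`` into the imaginary tile.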
        

__tile_cmmimfp16ps
^^^^^^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    __tile1024i* dst, 
    __tile1024i src0, 
    __tile1024i src1
:Param ETypes:
     dst, 
     src0, 
     src1

.. code-block:: C

    void __tile_cmmimfp16ps(__tile1024i* dst, __tile1024i src0,
                            __tile1024i src1);

.. admonition:: Intel Description

    Perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single-precision tile. Each dword element in input tiles "src0" and "src1" is interpreted as a complex number with FP16 real part and FP16 imaginary part. This function calculates the imaginary part of the result.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (src0.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.fp32[n] += FP32(src0.row[m].fp16[2*k+0]) * FP32(src1.row[k].fp16[2*n+1])
        			tmp.fp32[n] += FP32(src0.row[m].fp16[2*k+1]) * FP32(src1.row[k].fp16[2*n+0])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        

__tile_cmmrlfp16ps
^^^^^^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    __tile1024i* dst, 
    __tile1024i src0, 
    __tile1024i src1
:Param ETypes:
     dst, 
     src0, 
     src1

.. code-block:: C

    void __tile_cmmrlfp16ps(__tile1024i* dst, __tile1024i src0,
                            __tile1024i src1);

.. admonition:: Intel Description

    Perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single-precision tile. Each dword element in input tiles "src0" and "src1" is interpreted as a complex number with FP16 real part and FP16 imaginary part. This function calculates the real part of the result.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (src0.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.fp32[n] += FP32(src0.row[m].fp16[2*k+0]) * FP32(src1.row[k].fp16[2*n+0])
        			tmp.fp32[n] += FP32(-src0.row[m].fp16[2*k+1]) * FP32(src1.row[k].fp16[2*n+1])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        

_tile_dpfp16ps
^^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    constexpr int dst, 
    constexpr int a, 
    constexpr int b
:Param ETypes:
    FP32 dst, 
    FP16 a, 
    FP16 b

.. code-block:: C

    void _tile_dpfp16ps(constexpr int dst, constexpr int a,
                        constexpr int b);

.. admonition:: Intel Description

    Compute dot-product of FP16 (16-bit) floating-point pairs in tiles "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "dst", and store the 32-bit result back to tile "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (a.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.fp32[n] += FP32(a.row[m].fp16[2*k+0]) * FP32(b.row[k].fp16[2*n+0])
        			tmp.fp32[n] += FP32(a.row[m].fp16[2*k+1]) * FP32(b.row[k].fp16[2*n+1])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        	

__tile_dpfp16ps
^^^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    __tile1024i* dst, 
    __tile1024i src0, 
    __tile1024i src1
:Param ETypes:
     dst, 
     src0, 
     src1

.. code-block:: C

    void __tile_dpfp16ps(__tile1024i* dst, __tile1024i src0,
                         __tile1024i src1);

.. admonition:: Intel Description

    Compute dot-product of FP16 (16-bit) floating-point pairs in tiles "src0" and "src1", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "dst", and store the 32-bit result back to tile "dst". The shape of the tile is specified in the __tile1024i struct, and the tile register is allocated by the compiler.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (src0.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.fp32[n] += FP32(src0.row[m].fp16[2*k+0]) * FP32(src1.row[k].fp16[2*n+0])
        			tmp.fp32[n] += FP32(src0.row[m].fp16[2*k+1]) * FP32(src1.row[k].fp16[2*n+1])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        

_tile_dpbsud
^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    constexpr int dst, 
    constexpr int a, 
    constexpr int b
:Param ETypes:
     dst, 
     a, 
     b

.. code-block:: C

    void _tile_dpbsud(constexpr int dst, constexpr int a,
                      constexpr int b);

.. admonition:: Intel Description

    Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in "a" with corresponding unsigned 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE DPBD(c, x, y) {
        	tmp1 := SignExtend32(x.byte[0]) * ZeroExtend32(y.byte[0])
        	tmp2 := SignExtend32(x.byte[1]) * ZeroExtend32(y.byte[1])
        	tmp3 := SignExtend32(x.byte[2]) * ZeroExtend32(y.byte[2])
        	tmp4 := SignExtend32(x.byte[3]) * ZeroExtend32(y.byte[3])
        	
        	RETURN c + tmp1 + tmp2 + tmp3 + tmp4
        }
        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (a.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
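
The DPBD helper above treats the bytes of ``x`` as signed and the bytes of ``y`` as unsigned, so the same byte value ``0xFF`` contributes ``-1`` on one side and ``255`` on the other. A portable scalar sketch of that helper follows; ``dpbd_su_model`` is a hypothetical name, byte 0 is taken to be the least significant byte of the dword, and accumulator overflow is not modelled.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of DPBD for the signed-by-unsigned case. */
    int32_t dpbd_su_model(int32_t c, uint32_t x, uint32_t y)
    {
        int32_t sum = c;
        for (int i = 0; i < 4; i++) {
            int32_t xb = (int32_t)((x >> (8 * i)) & 0xFF);
            int32_t yb = (int32_t)((y >> (8 * i)) & 0xFF);
            int32_t xs = xb >= 128 ? xb - 256 : xb;   /* SignExtend32(x.byte[i]) */
            sum += xs * yb;                           /* yb is ZeroExtend32(y.byte[i]) */
        }
        return sum;
    }

With ``c = 1000``, ``x = 0x018002FF`` and ``y = 0x800203FF`` the four products are ``-255``, ``6``, ``-256`` and ``128``, giving ``623``.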
        	

_tile_dpbusd
^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    constexpr int dst, 
    constexpr int a, 
    constexpr int b
:Param ETypes:
     dst, 
     a, 
     b

.. code-block:: C

    void _tile_dpbusd(constexpr int dst, constexpr int a,
                      constexpr int b);

.. admonition:: Intel Description

    Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE DPBD(c, x, y) {
        	tmp1 := ZeroExtend32(x.byte[0]) * SignExtend32(y.byte[0])
        	tmp2 := ZeroExtend32(x.byte[1]) * SignExtend32(y.byte[1])
        	tmp3 := ZeroExtend32(x.byte[2]) * SignExtend32(y.byte[2])
        	tmp4 := ZeroExtend32(x.byte[3]) * SignExtend32(y.byte[3])
        	
        	RETURN c + tmp1 + tmp2 + tmp3 + tmp4
        }
        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (a.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        	

_tile_dpbuud
^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    constexpr int dst, 
    constexpr int a, 
    constexpr int b
:Param ETypes:
     dst, 
     a, 
     b

.. code-block:: C

    void _tile_dpbuud(constexpr int dst, constexpr int a,
                      constexpr int b);

.. admonition:: Intel Description

    Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding unsigned 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE DPBD(c, x, y) {
        	tmp1 := ZeroExtend32(x.byte[0]) * ZeroExtend32(y.byte[0])
        	tmp2 := ZeroExtend32(x.byte[1]) * ZeroExtend32(y.byte[1])
        	tmp3 := ZeroExtend32(x.byte[2]) * ZeroExtend32(y.byte[2])
        	tmp4 := ZeroExtend32(x.byte[3]) * ZeroExtend32(y.byte[3])
        	
        	RETURN c + tmp1 + tmp2 + tmp3 + tmp4
        }
        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (a.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
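
The full tile loop above, including the indexing of dword ``k`` in a row of "a" against dword ``n`` in row ``k`` of "b", can be sketched on plain byte arrays. The model below covers the unsigned-by-unsigned case; ``dpbuud_model`` and the explicit ``M``, ``K``, ``N`` dimensions are assumptions for illustration, and the tile-zeroing helpers are not modelled.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model: a is M x 4K bytes, b is K x 4N bytes, and dst is
     * M x N 32-bit accumulators, all row-major. Unsigned arithmetic wraps
     * modulo 2^32 in C. */
    void dpbuud_model(uint32_t *dst, const uint8_t *a, const uint8_t *b,
                      int M, int K, int N)
    {
        for (int m = 0; m < M; m++)
            for (int k = 0; k < K; k++)
                for (int n = 0; n < N; n++)
                    for (int i = 0; i < 4; i++)   /* the four bytes of one dword */
                        dst[m * N + n] += (uint32_t)a[m * 4 * K + 4 * k + i] *
                                          (uint32_t)b[k * 4 * N + 4 * n + i];
    }

With one row of "a" holding bytes ``1..8``, two rows of "b" holding ``{1,1,1,1}`` and ``{2,2,2,2}``, and an accumulator starting at ``10``, the result is ``10 + 10 + 52 = 72``.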
        	

_tile_dpbssd
^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    constexpr int dst, 
    constexpr int a, 
    constexpr int b
:Param ETypes:
     dst, 
     a, 
     b

.. code-block:: C

    void _tile_dpbssd(constexpr int dst, constexpr int a,
                      constexpr int b);

.. admonition:: Intel Description

    Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE DPBD(c, x, y) {
        	tmp1 := SignExtend32(x.byte[0]) * SignExtend32(y.byte[0])
        	tmp2 := SignExtend32(x.byte[1]) * SignExtend32(y.byte[1])
        	tmp3 := SignExtend32(x.byte[2]) * SignExtend32(y.byte[2])
        	tmp4 := SignExtend32(x.byte[3]) * SignExtend32(y.byte[3])
        	
        	RETURN c + tmp1 + tmp2 + tmp3 + tmp4
        }
        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (a.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        	

__tile_dpbssd
^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    __tile1024i* dst, 
    __tile1024i src0, 
    __tile1024i src1
:Param ETypes:
     dst, 
     src0, 
     src1

.. code-block:: C

    void __tile_dpbssd(__tile1024i* dst, __tile1024i src0,
                       __tile1024i src1);

.. admonition:: Intel Description

    Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in "src0" with corresponding signed 8-bit integers in "src1", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". The shape of the tile is specified in the __tile1024i struct, and the tile register is allocated by the compiler.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE DPBD(c, x, y) {
        	tmp1 := SignExtend32(x.byte[0]) * SignExtend32(y.byte[0])
        	tmp2 := SignExtend32(x.byte[1]) * SignExtend32(y.byte[1])
        	tmp3 := SignExtend32(x.byte[2]) * SignExtend32(y.byte[2])
        	tmp4 := SignExtend32(x.byte[3]) * SignExtend32(y.byte[3])
        	RETURN c + tmp1 + tmp2 + tmp3 + tmp4
        }
        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (src0.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.dword[n] := DPBD(tmp.dword[n], src0.row[m].dword[k], src1.row[k].dword[n])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        

__tile_dpbsud
^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    __tile1024i* dst, 
    __tile1024i src0, 
    __tile1024i src1
:Param ETypes:
     dst, 
     src0, 
     src1

.. code-block:: C

    void __tile_dpbsud(__tile1024i* dst, __tile1024i src0,
                       __tile1024i src1);

.. admonition:: Intel Description

    Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in "src0" with corresponding unsigned 8-bit integers in "src1", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". The shape of the tile is specified in the __tile1024i struct, and the tile register is allocated by the compiler.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE DPBD(c, x, y) {
        	tmp1 := SignExtend32(x.byte[0]) * ZeroExtend32(y.byte[0])
        	tmp2 := SignExtend32(x.byte[1]) * ZeroExtend32(y.byte[1])
        	tmp3 := SignExtend32(x.byte[2]) * ZeroExtend32(y.byte[2])
        	tmp4 := SignExtend32(x.byte[3]) * ZeroExtend32(y.byte[3])
        	RETURN c + tmp1 + tmp2 + tmp3 + tmp4
        }
        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (src0.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.dword[n] := DPBD(tmp.dword[n], src0.row[m].dword[k], src1.row[k].dword[n])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
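
The byte dot-product variants differ only in how each operand byte is widened before the multiply. As a minimal sketch, assuming a hypothetical helper name ``model_dpbd_su`` and operands passed as packed 32-bit values, the signed-by-unsigned ``DPBD`` step above could be written as:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: bytes of x are sign-extended and bytes of y are
     * zero-extended, matching SignExtend32 and ZeroExtend32 in the pseudo-code. */
    static int32_t model_dpbd_su(int32_t c, uint32_t x, uint32_t y)
    {
        uint32_t acc = (uint32_t)c;  /* unsigned so that overflow wraps */
        for (int b = 0; b < 4; b++) {
            int32_t xs = (int8_t)(uint8_t)(x >> (8 * b));  /* signed byte */
            int32_t yu = (uint8_t)(y >> (8 * b));          /* unsigned byte */
            acc += (uint32_t)(xs * yu);
        }
        return (int32_t)acc;
    }

With ``x = 0xFF`` and ``y = 0xFF`` the low bytes are read as -1 and 255, so the product contributes -255 to the accumulator.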
        

__tile_dpbusd
^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    __tile1024i* dst, 
    __tile1024i src0, 
    __tile1024i src1
:Param ETypes:
     dst, 
     src0, 
     src1

.. code-block:: C

    void __tile_dpbusd(__tile1024i* dst, __tile1024i src0,
                       __tile1024i src1);

.. admonition:: Intel Description

    Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "src0" with corresponding signed 8-bit integers in "src1", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". The tile shape is specified in the __tile1024i struct. The tile register is allocated by the compiler.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE DPBD(c, x, y) {
        	tmp1 := ZeroExtend32(x.byte[0]) * SignExtend32(y.byte[0])
        	tmp2 := ZeroExtend32(x.byte[1]) * SignExtend32(y.byte[1])
        	tmp3 := ZeroExtend32(x.byte[2]) * SignExtend32(y.byte[2])
        	tmp4 := ZeroExtend32(x.byte[3]) * SignExtend32(y.byte[3])
        	RETURN c + tmp1 + tmp2 + tmp3 + tmp4
        }
        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (src0.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.dword[n] := DPBD(tmp.dword[n], src0.row[m].dword[k], src1.row[k].dword[n])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        

__tile_dpbuud
^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    __tile1024i* dst, 
    __tile1024i src0, 
    __tile1024i src1
:Param ETypes:
     dst, 
     src0, 
     src1

.. code-block:: C

    void __tile_dpbuud(__tile1024i* dst, __tile1024i src0,
                       __tile1024i src1);

.. admonition:: Intel Description

    Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "src0" with corresponding unsigned 8-bit integers in "src1", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". The tile shape is specified in the __tile1024i struct. The tile register is allocated by the compiler.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE DPBD(c, x, y) {
        	tmp1 := ZeroExtend32(x.byte[0]) * ZeroExtend32(y.byte[0])
        	tmp2 := ZeroExtend32(x.byte[1]) * ZeroExtend32(y.byte[1])
        	tmp3 := ZeroExtend32(x.byte[2]) * ZeroExtend32(y.byte[2])
        	tmp4 := ZeroExtend32(x.byte[3]) * ZeroExtend32(y.byte[3])
        	RETURN c + tmp1 + tmp2 + tmp3 + tmp4
        }
        FOR m := 0 TO dst.rows - 1
        	tmp := dst.row[m]
        	FOR k := 0 TO (src0.colsb / 4) - 1
        		FOR n := 0 TO (dst.colsb / 4) - 1
        			tmp.dword[n] := DPBD(tmp.dword[n], src0.row[m].dword[k], src1.row[k].dword[n])
        		ENDFOR
        	ENDFOR
        	write_row_and_zero(dst, m, tmp, dst.colsb)
        ENDFOR
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        

_tile_loadconfig
^^^^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void

.. code-block:: C

    void _tile_loadconfig(const void * mem_addr);

.. admonition:: Intel Description

    Load tile configuration from a 64-byte memory location specified by "mem_addr". The tile configuration format is specified below, and includes the tile type palette, the number of bytes per row, and the number of rows. If the specified palette_id is zero, that signifies the init state for both the tile config and the tile data, and the tiles are zeroed. Any invalid configuration will result in a #GP fault.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        // Format of the 64-byte memory payload (byte offsets):
        //     0: palette
        //     1: start_row
        //  2-15: reserved, must be zero
        // 16-17: tile0.colsb
        // 18-19: tile1.colsb
        // 20-21: tile2.colsb
        //   ...
        // 30-31: tile7.colsb
        // 32-47: reserved, must be zero
        //    48: tile0.rows
        //    49: tile1.rows
        //    50: tile2.rows
        //   ...
        //    55: tile7.rows
        // 56-63: reserved, must be zero
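
The byte offsets listed above map naturally onto a C struct. The following is a sketch under stated assumptions, not a definition from ``immintrin.h``: the names ``tile_config_model`` and ``make_config`` are hypothetical, and choosing ``palette = 1`` assumes that is the palette a caller wants to configure.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical struct; field order follows the documented byte offsets. */
    typedef struct {
        uint8_t  palette;        /* byte 0 */
        uint8_t  start_row;      /* byte 1 */
        uint8_t  reserved0[14];  /* bytes 2-15, must be zero */
        uint16_t colsb[8];       /* bytes 16-31, bytes per row of tile0..tile7 */
        uint8_t  reserved1[16];  /* bytes 32-47, must be zero */
        uint8_t  rows[8];        /* bytes 48-55, row count of tile0..tile7 */
        uint8_t  reserved2[8];   /* bytes 56-63, must be zero */
    } tile_config_model;

    /* Compile-time checks that the struct matches the 64-byte layout. */
    static_assert(sizeof(tile_config_model) == 64, "payload must be 64 bytes");
    static_assert(offsetof(tile_config_model, colsb) == 16, "colsb at byte 16");
    static_assert(offsetof(tile_config_model, rows) == 48, "rows at byte 48");

    /* Give the first ntiles tiles the same shape; everything else stays zero. */
    static tile_config_model make_config(int ntiles, uint8_t rows, uint16_t colsb)
    {
        tile_config_model cfg = {0};
        assert(ntiles >= 0 && ntiles <= 8);
        cfg.palette = 1;  /* assumption: the caller wants palette 1 */
        for (int i = 0; i < ntiles; i++) {
            cfg.rows[i] = rows;
            cfg.colsb[i] = colsb;
        }
        return cfg;
    }

On a system with AMX enabled, the address of such a struct is the kind of 64-byte block that "mem_addr" is expected to point to; the sketch itself only builds the bytes and does not execute any AMX instruction.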
        	

_tile_storeconfig
^^^^^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void

.. code-block:: C

    void _tile_storeconfig(void * mem_addr);

.. admonition:: Intel Description

    Store the current tile configuration to a 64-byte memory location specified by "mem_addr". The tile configuration format is specified below, and includes the tile type palette, the number of bytes per row, and the number of rows. If tiles are not configured, all zeroes will be stored to memory.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        // Format of the 64-byte memory payload (byte offsets):
        //     0: palette
        //     1: start_row
        //  2-15: reserved, must be zero
        // 16-17: tile0.colsb
        // 18-19: tile1.colsb
        // 20-21: tile2.colsb
        //   ...
        // 30-31: tile7.colsb
        // 32-47: reserved, must be zero
        //    48: tile0.rows
        //    49: tile1.rows
        //    50: tile2.rows
        //   ...
        //    55: tile7.rows
        // 56-63: reserved, must be zero
        	

_tile_loadd
^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    constexpr int dst, 
    const void * base, 
    size_t stride
:Param ETypes:
     dst, 
     base, 
    UI32 stride

.. code-block:: C

    void _tile_loadd(constexpr int dst, const void* base,
                     size_t stride);

.. admonition:: Intel Description

    Load tile rows from memory specified by the "base" address and "stride" into destination tile "dst", using the tile configuration previously set via "_tile_loadconfig".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        start := tileconfig.startRow
        IF start == 0 // not restarting, zero incoming state
        	tilezero(dst)
        FI
        nbytes := dst.colsb
        DO WHILE start < dst.rows
        	memptr := base + start * stride
        	write_row_and_zero(dst, start, read_memory(memptr, nbytes), nbytes)
        	start := start + 1
        OD
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
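
The role of "stride" in the load loop above is easiest to see in a scalar model. The sketch below assumes ``tileconfig.startRow`` is zero, so the whole tile is cleared first; ``tile_model``, its 16-row by 64-byte capacity, and ``model_tile_load`` are hypothetical and are not part of ``immintrin.h``.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical tile model: up to 16 rows of 64 bytes (assumed capacity). */
    enum { MODEL_MAX_ROWS = 16, MODEL_MAX_COLSB = 64 };

    typedef struct {
        int rows;   /* rows in use */
        int colsb;  /* bytes per row in use */
        uint8_t data[MODEL_MAX_ROWS][MODEL_MAX_COLSB];
    } tile_model;

    /* Copy dst->rows rows of dst->colsb bytes each. Consecutive rows start
     * stride bytes apart in memory; unused bytes and rows end up zero. */
    static void model_tile_load(tile_model *dst, const void *base, size_t stride)
    {
        const uint8_t *src = (const uint8_t *)base;
        assert(dst->rows >= 0 && dst->rows <= MODEL_MAX_ROWS);
        assert(dst->colsb >= 0 && dst->colsb <= MODEL_MAX_COLSB);

        memset(dst->data, 0, sizeof dst->data);
        for (int r = 0; r < dst->rows; r++) {
            memcpy(dst->data[r], src + (size_t)r * stride, (size_t)dst->colsb);
        }
    }

A stride larger than ``colsb`` skips the bytes in between, which is how a tile can be loaded from a sub-block of a wider row-major matrix.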
        	

_tile_stream_loadd
^^^^^^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    constexpr int dst, 
    const void * base, 
    size_t stride
:Param ETypes:
     dst, 
     base, 
    UI32 stride

.. code-block:: C

    void _tile_stream_loadd(constexpr int dst, const void* base,
                            size_t stride);

.. admonition:: Intel Description

    Load tile rows from memory specified by the "base" address and "stride" into destination tile "dst", using the tile configuration previously set via "_tile_loadconfig". This intrinsic provides a hint to the implementation that the data will likely not be reused in the near future and the data caching can be optimized accordingly.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        start := tileconfig.startRow
        IF start == 0 // not restarting, zero incoming state
        	tilezero(dst)
        FI
        nbytes := dst.colsb
        DO WHILE start < dst.rows
        	memptr := base + start * stride
        	write_row_and_zero(dst, start, read_memory(memptr, nbytes), nbytes)
        	start := start + 1
        OD
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        	

_tile_release
^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void

.. code-block:: C

    void _tile_release(void);

.. admonition:: Intel Description

    Release the tile configuration to return to the init state, which releases all storage it currently holds.

_tile_stored
^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    constexpr int src, 
    void * base, 
    size_t stride
:Param ETypes:
     src, 
     base, 
    UI32 stride

.. code-block:: C

    void _tile_stored(constexpr int src, void* base,
                      size_t stride);

.. admonition:: Intel Description

    Store the tile specified by "src" to memory specified by the "base" address and "stride", using the tile configuration previously set via "_tile_loadconfig".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        start := tileconfig.startRow
        DO WHILE start < src.rows
        	memptr := base + start * stride
        	write_memory(memptr, src.colsb, src.row[start])
        	start := start + 1
        OD
        zero_tileconfig_start()
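
The store loop above writes only ``src.colsb`` bytes per row and leaves the rest of each stride untouched. A minimal scalar sketch of that behaviour, assuming ``tileconfig.startRow`` is zero and using the hypothetical names ``tile_model`` and ``model_tile_store`` (neither is part of ``immintrin.h``):

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical tile model: up to 16 rows of 64 bytes (assumed capacity). */
    enum { MODEL_MAX_ROWS = 16, MODEL_MAX_COLSB = 64 };

    typedef struct {
        int rows;   /* rows in use */
        int colsb;  /* bytes per row in use */
        uint8_t data[MODEL_MAX_ROWS][MODEL_MAX_COLSB];
    } tile_model;

    /* Write src->rows rows of src->colsb bytes each, stride bytes apart.
     * Bytes between the end of one row and the start of the next are kept. */
    static void model_tile_store(const tile_model *src, void *base, size_t stride)
    {
        uint8_t *out = (uint8_t *)base;
        for (int r = 0; r < src->rows; r++) {
            memcpy(out + (size_t)r * stride, src->data[r], (size_t)src->colsb);
        }
    }

Because the gap bytes are preserved, a tile can be written back into the middle of a larger matrix without disturbing its neighbours.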
        	

_tile_zero
^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void

.. code-block:: C

    void _tile_zero(constexpr int tdest);

.. admonition:: Intel Description

    Zero the tile specified by "tdest".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        nbytes := palette_table[tileconfig.palette_id].bytes_per_row
        FOR i := 0 TO palette_table[tileconfig.palette_id].max_rows-1
        	FOR j := 0 TO nbytes-1
        		tdest.row[i].byte[j] := 0
        	ENDFOR
        ENDFOR
        	

__tile_loadd
^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    __tile1024i* dst, 
    const void* base, 
    size_t stride
:Param ETypes:
     dst, 
     base, 
     stride

.. code-block:: C

    void __tile_loadd(__tile1024i* dst, const void* base,
                      size_t stride);

.. admonition:: Intel Description

    Load tile rows from memory specified by the "base" address and "stride" into destination tile "dst". The tile shape is specified in the __tile1024i struct. The tile register is allocated by the compiler.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        start := tileconfig.startRow
        IF start == 0 // not restarting, zero incoming state
        	tilezero(dst)
        FI
        nbytes := dst.colsb
        DO WHILE start < dst.rows
        	memptr := base + start * stride
        	write_row_and_zero(dst, start, read_memory(memptr, nbytes), nbytes)
        	start := start + 1
        OD
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        

__tile_stored
^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    void* base, 
    size_t stride, 
    __tile1024i src
:Param ETypes:
     base, 
     stride, 
     src

.. code-block:: C

    void __tile_stored(void* base, size_t stride,
                       __tile1024i src);

.. admonition:: Intel Description

    Store the tile specified by "src" to memory specified by the "base" address and "stride". The tile shape is specified in the __tile1024i struct. The tile register is allocated by the compiler.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        start := tileconfig.startRow
        DO WHILE start < src.rows
        	memptr := base + start * stride
        	write_memory(memptr, src.colsb, src.row[start])
        	start := start + 1
        OD
        zero_tileconfig_start()
        

__tile_stream_loadd
^^^^^^^^^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void
:Param Types:
    __tile1024i* dst, 
    const void* base, 
    size_t stride
:Param ETypes:
     dst, 
     base, 
     stride

.. code-block:: C

    void __tile_stream_loadd(__tile1024i* dst, const void* base,
                             size_t stride);

.. admonition:: Intel Description

    Load tile rows from memory specified by the "base" address and "stride" into destination tile "dst". This intrinsic provides a hint to the implementation that the data will likely not be reused in the near future and the data caching can be optimized accordingly. The tile shape is specified in the __tile1024i struct. The tile register is allocated by the compiler.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        start := tileconfig.startRow
        IF start == 0 // not restarting, zero incoming state
        	tilezero(dst)
        FI
        nbytes := dst.colsb
        DO WHILE start < dst.rows
        	memptr := base + start * stride
        	write_row_and_zero(dst, start, read_memory(memptr, nbytes), nbytes)
        	start := start + 1
        OD
        zero_upper_rows(dst, dst.rows)
        zero_tileconfig_start()
        

__tile_zero
^^^^^^^^^^^
:Tech: AMX
:Category: Application-Targeted
:Header: immintrin.h
:Searchable: AMX-Application-Targeted-Other
:Return Type: void

.. code-block:: C

    void __tile_zero(__tile1024i* dst);

.. admonition:: Intel Description

    Zero the tile specified by "dst". The tile shape is specified in the __tile1024i struct. The tile register is allocated by the compiler.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        nbytes := palette_table[tileconfig.palette_id].bytes_per_row
        FOR i := 0 TO palette_table[tileconfig.palette_id].max_rows-1
        	FOR j := 0 TO nbytes-1
        		dst.row[i].byte[j] := 0
        	ENDFOR
        ENDFOR
        

AVX-512
=======
Shift
-----
ZMM
~~~
_mm512_bslli_epi128
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    int imm8
:Param ETypes:
    M128 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_bslli_epi128(__m512i a, int imm8);

.. admonition:: Intel Description

    Shift 128-bit lanes in "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := imm8[7:0]
        IF tmp > 15
        	tmp := 16
        FI
        dst[127:0] := a[127:0] << (tmp*8)
        dst[255:128] := a[255:128] << (tmp*8)
        dst[383:256] := a[383:256] << (tmp*8)
        dst[511:384] := a[511:384] << (tmp*8)
        dst[MAX:512] := 0
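
The lane-local behaviour above is easy to miss: bytes shifted out of one 128-bit lane are discarded, not carried into the next lane. A minimal scalar sketch, assuming the vector is held as 64 bytes with index 0 as the least significant byte (the function name ``model_bslli_epi128`` is hypothetical):

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Shift each 16-byte lane of a 64-byte vector left by imm8 bytes.
     * dst and a must not overlap. */
    static void model_bslli_epi128(uint8_t dst[64], const uint8_t a[64],
                                   unsigned imm8)
    {
        unsigned shift = imm8 & 0xFFu;
        if (shift > 15) {
            shift = 16;  /* every byte of every lane is shifted out */
        }
        memset(dst, 0, 64);
        for (int lane = 0; lane < 4; lane++) {
            /* Byte i of a lane moves to byte i + shift of the same lane. */
            memcpy(dst + 16 * lane + shift, a + 16 * lane, 16 - shift);
        }
    }

For example, a shift of 3 leaves bytes 0 to 2 of every lane zero and moves the lowest source byte of each lane to byte 3 of that lane.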
        	

_mm512_mask_sllv_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_mask_sllv_epi16(__m512i src, __mmask32 k,
                                   __m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
        		ELSE
        			dst[i+15:i] := 0
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
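
The pseudo-code above combines two independent rules: the per-element count is saturating (16 or more clears the element), and unselected elements are merged from "src". A scalar sketch of both rules, using a hypothetical function name ``model_mask_sllv_epi16`` and plain arrays in place of ``__m512i``:

.. code-block:: C

    #include <stdint.h>

    /* Merge-masked per-element left shift over 32 lanes of 16 bits. */
    static void model_mask_sllv_epi16(uint16_t dst[32], const uint16_t src[32],
                                      uint32_t k, const uint16_t a[32],
                                      const uint16_t count[32])
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1u) {
                /* Counts of 16 or more clear the element instead of wrapping. */
                dst[j] = count[j] < 16
                             ? (uint16_t)((uint32_t)a[j] << count[j])
                             : 0;
            } else {
                dst[j] = src[j];  /* mask bit clear: keep the element from src */
            }
        }
    }

Replacing the ``src[j]`` branch with ``0`` would give the zero-masking behaviour of the "maskz" form.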
        	

_mm512_maskz_sllv_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_maskz_sllv_epi16(__mmask32 k, __m512i a,
                                    __m512i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
        		ELSE
        			dst[i+15:i] := 0
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sllv_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_sllv_epi16(__m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF count[i+15:i] < 16
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sll_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_mask_sll_epi16(__m512i src, __mmask32 k,
                                  __m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_slli_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_slli_epi16(__m512i src, __mmask32 k,
                                   __m512i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sll_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_maskz_sll_epi16(__mmask32 k, __m512i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_slli_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_slli_epi16(__mmask32 k, __m512i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sll_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m128i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_sll_epi16(__m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
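
Unlike the per-element "sllv" form, the pseudo-code above applies one shared count and compares all 64 low bits of "count" against 15, so a large count is not reduced modulo 16. A scalar sketch of that rule, with the hypothetical name ``model_sll_epi16`` and the low 64 bits of "count" passed as a plain integer:

.. code-block:: C

    #include <stdint.h>

    /* Shift all 32 lanes left by the same count; counts above 15 give zero. */
    static void model_sll_epi16(uint16_t dst[32], const uint16_t a[32],
                                uint64_t count_lo64)
    {
        for (int j = 0; j < 32; j++) {
            dst[j] = count_lo64 > 15
                         ? 0
                         : (uint16_t)((uint32_t)a[j] << count_lo64);
        }
    }

A count such as ``1 << 32`` therefore clears every element, even though its low 16 bits are zero.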
        	

_mm512_slli_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_slli_epi16(__m512i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_srav_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_mask_srav_epi16(__m512i src, __mmask32 k,
                                   __m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
        		ELSE
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_srav_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    MASK k, 
    SI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_maskz_srav_epi16(__mmask32 k, __m512i a,
                                    __m512i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
        		ELSE
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srav_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i count
:Param ETypes:
    SI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_srav_epi16(__m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF count[i+15:i] < 16
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
        	ELSE
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
        	FI	
        ENDFOR
        dst[MAX:512] := 0
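
For the arithmetic right shift above, a count of 16 or more fills the element with copies of the sign bit, which is the same result as shifting by 15. The scalar sketch below relies on that equivalence; the name ``model_srav_epi16`` is hypothetical and plain arrays stand in for ``__m512i``.

.. code-block:: C

    #include <stdint.h>

    /* Per-element arithmetic right shift over 32 lanes of 16 bits. */
    static void model_srav_epi16(int16_t dst[32], const int16_t a[32],
                                 const uint16_t count[32])
    {
        for (int j = 0; j < 32; j++) {
            /* Clamping to 15 reproduces the sign fill for counts >= 16. */
            unsigned c = count[j] < 16 ? count[j] : 15u;
            int32_t v = a[j];
            /* Shift the complement for negative inputs, so the code never
             * right-shifts a negative value (implementation-defined in C). */
            dst[j] = (int16_t)(v >= 0 ? v >> c : ~((~v) >> c));
        }
    }

Negative inputs therefore saturate to -1 and non-negative inputs to 0 once the count reaches 16.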
        	

_mm512_mask_sra_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_mask_sra_epi16(__m512i src, __mmask32 k,
                                  __m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        		ELSE
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_srai_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_srai_epi16(__m512i src, __mmask32 k,
                                   __m512i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        		ELSE
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sra_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_maskz_sra_epi16(__mmask32 k, __m512i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        		ELSE
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
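
The pseudo-code reads only ``count[63:0]``, so the upper half of the ``__m128i`` argument does not affect the result. A rough scalar sketch of that behaviour follows; ``model_maskz_sra_epi16`` and the ``count_lo`` parameter (standing in for the low 64 bits of "count") are assumptions made for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model (not the intrinsic). count_lo stands in for
     * count[63:0]; k is the 32-bit zeromask. */
    static void model_maskz_sra_epi16(int16_t dst[32], uint32_t k,
                                      const int16_t a[32], uint64_t count_lo)
    {
        /* For an arithmetic shift, any count above 15 gives the same lanes
         * as a shift by 15: 0 for non-negative input, -1 for negative. */
        unsigned int n = count_lo > 15 ? 15u : (unsigned int)count_lo;
        for (int j = 0; j < 32; j++) {
            int32_t v = a[j];
            int16_t shifted = (int16_t)(v < 0 ? ~(~v >> n) : v >> n);
            dst[j] = ((k >> j) & 1u) ? shifted : 0;   /* zeromask clears lane */
        }
    }

Clamping the count to 15 is a modelling shortcut; it produces the same lanes as the explicit ``0xFFFF``/``0x0`` branch in the pseudo-code.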
        	

_mm512_maskz_srai_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    SI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_srai_epi16(__mmask32 k, __m512i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        		ELSE
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sra_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m128i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_sra_epi16(__m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        	ELSE
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srai_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    SI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_srai_epi16(__m512i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        	ELSE
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_bsrli_epi128
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    int imm8
:Param ETypes:
    M128 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_bsrli_epi128(__m512i a, int imm8);

.. admonition:: Intel Description

    Shift 128-bit lanes in "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := imm8[7:0]
        IF tmp > 15
        	tmp := 16
        FI
        dst[127:0] := a[127:0] >> (tmp*8)
        dst[255:128] := a[255:128] >> (tmp*8)
        dst[383:256] := a[383:256] >> (tmp*8)
        dst[511:384] := a[511:384] >> (tmp*8)
        dst[MAX:512] := 0
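
The byte shift above works on each 128-bit lane independently, so bytes never move between lanes. The sketch below models this on a plain byte array; the function name ``model_bsrli_epi128`` and the choice to store the 512-bit value as 64 bytes, least significant byte first, are assumptions for illustration only.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model (not the intrinsic): the 512-bit value is held
     * as 64 bytes, least significant byte first, i.e. four 16-byte lanes.
     * dst and a must not overlap. */
    static void model_bsrli_epi128(uint8_t dst[64], const uint8_t a[64], int imm8)
    {
        unsigned int n = (unsigned int)imm8 & 0xFFu;   /* imm8[7:0] */
        if (n > 15)
            n = 16;                                    /* whole lane shifted out */
        memset(dst, 0, 64);                            /* zeros are shifted in */
        for (int lane = 0; lane < 4; lane++) {
            /* byte b+n of the lane moves down to byte b; nothing crosses lanes */
            memcpy(dst + 16 * lane, a + 16 * lane + n, 16 - n);
        }
    }

With ``imm8 = 3``, for example, byte 3 of each lane ends up in byte 0 of the same lane, and the top three bytes of every lane become zero.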
        	

_mm512_mask_srlv_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_mask_srlv_epi16(__m512i src, __mmask32 k,
                                   __m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
        		ELSE
        			dst[i+15:i] := 0
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_srlv_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_maskz_srlv_epi16(__mmask32 k, __m512i a,
                                    __m512i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
        		ELSE
        			dst[i+15:i] := 0
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srlv_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_srlv_epi16(__m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF count[i+15:i] < 16
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
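
Unlike the ``srl``/``srli`` forms, the variable shift takes a separate count for every lane. A small scalar sketch of the loop above, with the hypothetical name ``model_srlv_epi16`` and plain arrays in place of ``__m512i`` values:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model (not the intrinsic): 32 unsigned 16-bit lanes,
     * each shifted right by the count held in the matching lane of count. */
    static void model_srlv_epi16(uint16_t dst[32], const uint16_t a[32],
                                 const uint16_t count[32])
    {
        for (int j = 0; j < 32; j++) {
            /* a count of 16 or more clears the lane instead of wrapping */
            dst[j] = count[j] < 16 ? (uint16_t)(a[j] >> count[j]) : 0;
        }
    }

Filling "a" with ``0x8000`` and "count" with the lane index walks the set bit down one position per lane until it falls off at lane 16.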
        	

_mm512_mask_srl_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_mask_srl_epi16(__m512i src, __mmask32 k,
                                  __m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_srli_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_srli_epi16(__m512i src, __mmask32 k,
                                   __m512i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_srl_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_maskz_srl_epi16(__mmask32 k, __m512i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_srli_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_srli_epi16(__mmask32 k, __m512i a,
                                    int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srl_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m128i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m512i _mm512_srl_epi16(__m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srli_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_srli_epi16(__m512i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
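
The logical shift differs from the arithmetic ``srai`` forms only in what is shifted in: zeros rather than copies of the sign bit. A brief scalar sketch of the loop above, using the hypothetical name ``model_srli_epi16`` and unsigned lanes:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model (not the intrinsic): 32 unsigned 16-bit lanes
     * shifted right by the same immediate, with zeros shifted in. */
    static void model_srli_epi16(uint16_t dst[32], const uint16_t a[32],
                                 unsigned int imm8)
    {
        unsigned int n = imm8 & 0xFFu;            /* only imm8[7:0] is used */
        for (int j = 0; j < 32; j++)
            dst[j] = n > 15 ? 0 : (uint16_t)(a[j] >> n);
    }

A lane holding ``0xFFF0`` shifted by 4 becomes ``0x0FFF`` here, whereas an arithmetic shift of the same bit pattern would give ``0xFFFF``.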
        	

_mm512_mask_rol_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    const int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_rol_epi32(__m512i src, __mmask16 k,
                                  __m512i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
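
When ``LEFT_ROTATE_DWORDS`` is translated literally into C, a count of 0 produces ``src >> 32``, which is undefined for a 32-bit operand. The sketch below shows one common way around that; ``rotl32`` and ``model_mask_rol_epi32`` are hypothetical helper names, not part of any header.

.. code-block:: C

    #include <stdint.h>

    /* Rotate one 32-bit value left. The count is reduced modulo 32 as in the
     * pseudo-code; masking the complementary shift with 31 avoids a shift by
     * 32 when the reduced count is 0. */
    static uint32_t rotl32(uint32_t x, unsigned int count)
    {
        unsigned int n = count % 32u;
        return (x << n) | (x >> ((32u - n) & 31u));
    }

    /* Hypothetical scalar model of the masked loop above (not the intrinsic). */
    static void model_mask_rol_epi32(uint32_t dst[16], const uint32_t src[16],
                                     uint16_t k, const uint32_t a[16], int imm8)
    {
        for (int j = 0; j < 16; j++)
            dst[j] = ((k >> j) & 1u) ? rotl32(a[j], (unsigned int)imm8 & 0xFFu)
                                     : src[j];   /* mask bit clear: keep src */
    }

Many compilers recognise this rotate idiom and emit a single rotate instruction, though that depends on the compiler and target.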
        	

_mm512_maskz_rol_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_rol_epi32(__mmask16 k, __m512i a,
                                   const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rol_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    const int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_rol_epi32(__m512i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_rol_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    const int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_rol_epi64(__m512i src, __mmask8 k,
                                  __m512i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_rol_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_rol_epi64(__mmask8 k, __m512i a,
                                   const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rol_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    const int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_rol_epi64(__m512i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
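
The count passes through two reductions in the pseudo-code: first ``imm8[7:0]``, then ``% 64``. A short scalar sketch makes the effect visible; ``model_rol_epi64`` is a hypothetical name and the lanes are plain ``uint64_t`` values.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model (not the intrinsic): 8 lanes of 64 bits. */
    static void model_rol_epi64(uint64_t dst[8], const uint64_t a[8], int imm8)
    {
        unsigned int n = ((unsigned int)imm8 & 0xFFu) % 64u;  /* imm8[7:0] % 64 */
        for (int j = 0; j < 8; j++) {
            /* (64 - n) & 63 keeps the right shift in range when n is 0 */
            dst[j] = (a[j] << n) | (a[j] >> ((64u - n) & 63u));
        }
    }

Under this model an immediate of 72 rotates by 8, and an immediate of 64 leaves every lane unchanged.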
        	

_mm512_mask_rolv_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_rolv_epi32(__m512i src, __mmask16 k,
                                   __m512i a, __m512i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_rolv_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_rolv_epi32(__mmask16 k, __m512i a,
                                    __m512i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rolv_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_rolv_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
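
In the variable form each lane of "b" supplies its own rotate count, and because of the ``% 32`` only the low five bits of that count matter. A minimal scalar sketch, assuming the hypothetical name ``model_rolv_epi32`` and array arguments:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model (not the intrinsic): 16 lanes of 32 bits,
     * lane j of a rotated left by lane j of b. */
    static void model_rolv_epi32(uint32_t dst[16], const uint32_t a[16],
                                 const uint32_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            unsigned int n = b[j] % 32u;          /* per-lane count */
            dst[j] = (a[j] << n) | (a[j] >> ((32u - n) & 31u));
        }
    }

Setting every lane of "a" to 1 and lane ``j`` of "b" to ``j`` yields ``1 << j`` in lane ``j``; adding 32 to each count gives the same result.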
        	

_mm512_mask_rolv_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_rolv_epi64(__m512i src, __mmask8 k,
                                   __m512i a, __m512i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_rolv_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_rolv_epi64(__mmask8 k, __m512i a,
                                    __m512i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rolv_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_rolv_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_ror_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_ror_epi32(__m512i src, __mmask16 k,
                                  __m512i a, int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_ror_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_ror_epi32(__mmask16 k, __m512i a,
                                   int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_ror_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_ror_epi32(__m512i a, int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
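
A right rotate by ``n`` produces the same bits as a left rotate by ``(32 - n) % 32``, which is a convenient cross-check for the pseudo-code above. The helpers ``ror32_lane``, ``rol32_lane`` and ``model_ror_epi32`` in this sketch are hypothetical names chosen for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Rotate one 32-bit value right; count reduced modulo 32 as above. */
    static uint32_t ror32_lane(uint32_t x, unsigned int count)
    {
        unsigned int n = count % 32u;
        return (x >> n) | (x << ((32u - n) & 31u));
    }

    /* Rotate one 32-bit value left, used here only for the cross-check. */
    static uint32_t rol32_lane(uint32_t x, unsigned int count)
    {
        unsigned int n = count % 32u;
        return (x << n) | (x >> ((32u - n) & 31u));
    }

    /* Hypothetical scalar model of the loop above (not the intrinsic). */
    static void model_ror_epi32(uint32_t dst[16], const uint32_t a[16], int imm8)
    {
        for (int j = 0; j < 16; j++)
            dst[j] = ror32_lane(a[j], (unsigned int)imm8 & 0xFFu);
    }

For any ``n`` from 0 to 31, ``ror32_lane(x, n)`` and ``rol32_lane(x, (32 - n) % 32)`` return the same value.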
        	

_mm512_mask_ror_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_ror_epi64(__m512i src, __mmask8 k,
                                  __m512i a, int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_ror_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_ror_epi64(__mmask8 k, __m512i a,
                                   int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_ror_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_ror_epi64(__m512i a, int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
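
Because the count is reduced modulo 64, a few identities follow directly from the pseudo-code. The sketch below checks them on a single lane in portable C; ``ror64``, ``rol64`` and ``ror64_identities_hold`` are hypothetical names introduced for illustration and are not part of any Intel header.

.. code-block:: C

    #include <stdint.h>

    /* Right rotate following RIGHT_ROTATE_QWORDS. The count == 0 guard
       avoids a 64-bit shift by 64, which is undefined in C. */
    static uint64_t ror64(uint64_t x, unsigned count)
    {
        count %= 64u;
        return count ? (x >> count) | (x << (64u - count)) : x;
    }

    /* Left rotate written the same way; used only to state the identities. */
    static uint64_t rol64(uint64_t x, unsigned count)
    {
        count %= 64u;
        return count ? (x << count) | (x >> (64u - count)) : x;
    }

    /* Returns 1 when all three identities hold for x and n:
       1. only n mod 64 matters;
       2. rotating right by n equals rotating left by 64 - (n mod 64);
       3. rotating right and then left by n restores x. */
    static int ror64_identities_hold(uint64_t x, unsigned n)
    {
        return ror64(x, n) == ror64(x, n % 64u)
            && ror64(x, n) == rol64(x, 64u - (n % 64u))
            && rol64(ror64(x, n), n) == x;
    }

For example, rotating ``0x0123456789ABCDEF`` right by 8 or by 72 moves the low byte ``0xEF`` to the top in both cases, while a count of 64 leaves the value unchanged.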
        	

_mm512_mask_rorv_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_rorv_epi32(__m512i src, __mmask16 k,
                                   __m512i a, __m512i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
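
The per-element count and the writemask can be seen together in a small scalar model. This is a sketch under stated assumptions: ``ror32_model`` and ``mask_rorv_epi32_model`` are hypothetical names, and the sixteen lanes are plain ``uint32_t`` arrays standing in for ``__m512i`` operands.

.. code-block:: C

    #include <stdint.h>

    /* Rotate one 32-bit value right by count mod 32, as in
       RIGHT_ROTATE_DWORDS. count == 0 is returned unchanged to avoid an
       undefined 32-bit shift by 32. */
    static uint32_t ror32_model(uint32_t x, uint32_t count)
    {
        count %= 32u;
        return count ? (x >> count) | (x << (32u - count)) : x;
    }

    /* Hypothetical scalar model: lane j of a is rotated by lane j of b when
       bit j of k is set; otherwise lane j of src is passed through. */
    static void mask_rorv_epi32_model(uint32_t dst[16], const uint32_t src[16],
                                      uint16_t k, const uint32_t a[16],
                                      const uint32_t b[16])
    {
        for (int j = 0; j < 16; ++j) {
            dst[j] = ((k >> j) & 1u) ? ror32_model(a[j], b[j]) : src[j];
        }
    }

Each lane uses its own count, so a count of 33 in one lane behaves like a count of 1 in that lane only.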
        	

_mm512_rorv_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_rorv_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_rorv_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_rorv_epi64(__m512i src, __mmask8 k,
                                   __m512i a, __m512i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_rorv_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_rorv_epi64(__mmask8 k, __m512i a,
                                    __m512i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rorv_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_rorv_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sll_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_mask_sll_epi32(__m512i src, __mmask16 k,
                                  __m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
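
Two details of the pseudo-code are easy to miss: a single count is shared by all lanes, and the comparison uses the full low 64 bits of ``count``. A minimal scalar sketch in portable C follows; ``mask_sll_epi32_model`` is a hypothetical name, and its ``count64`` parameter is assumed to stand for ``count[63:0]`` of the ``__m128i`` operand.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the write-masked uniform left shift.
       count64 plays the role of count[63:0]. */
    static void mask_sll_epi32_model(uint32_t dst[16], const uint32_t src[16],
                                     uint16_t k, const uint32_t a[16],
                                     uint64_t count64)
    {
        for (int j = 0; j < 16; ++j) {
            if (!((k >> j) & 1u)) {
                dst[j] = src[j];           /* mask bit clear: keep src */
            } else if (count64 > 31u) {
                dst[j] = 0;                /* every bit shifted out */
            } else {
                dst[j] = a[j] << count64;  /* count is 0..31 here */
            }
        }
    }

A count such as ``0x100000001`` therefore zeroes the active lanes: its low 32 bits equal 1, but the 64-bit value is greater than 31.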
        	

_mm512_maskz_sll_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_maskz_sll_epi32(__mmask16 k, __m512i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_slli_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_slli_epi32(__mmask16 k, __m512i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sll_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m128i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_sll_epi32(__m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sll_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_mask_sll_epi64(__m512i src, __mmask8 k,
                                  __m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_slli_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_slli_epi64(__m512i src, __mmask8 k,
                                   __m512i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sll_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_maskz_sll_epi64(__mmask8 k, __m512i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_slli_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_slli_epi64(__mmask8 k, __m512i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sll_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m128i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_sll_epi64(__m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF count[63:0] > 63
        		dst[i+63:i] := 0
        	ELSE
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_slli_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_slli_epi64(__m512i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF imm8[7:0] > 63
        		dst[i+63:i] := 0
        	ELSE
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
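
One way to read the pseudo-code is that each lane is multiplied by 2 raised to ``imm8``, modulo 2 to the power 64, which is why counts above 63 give zero. The sketch below compares a single-lane model against repeated doubling; ``slli64_model`` and ``mul_pow2_mod64`` are hypothetical names used only for this illustration.

.. code-block:: C

    #include <stdint.h>

    /* Single-lane model of the immediate left shift. Only imm8[7:0] is
       used, and counts above 63 produce zero instead of an undefined
       C shift. */
    static uint64_t slli64_model(uint64_t x, unsigned imm8)
    {
        imm8 &= 0xFFu;
        return imm8 > 63u ? 0 : x << imm8;
    }

    /* Multiply by 2^n with unsigned wrap-around, by repeated doubling. */
    static uint64_t mul_pow2_mod64(uint64_t x, unsigned n)
    {
        while (n--) {
            x *= 2u;
        }
        return x;
    }

For any ``n`` from 0 to 255 the two functions should agree, since 64 or more doublings of a 64-bit value leave no bits set.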
        	

_mm512_maskz_sllv_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_maskz_sllv_epi32(__mmask16 k, __m512i a,
                                    __m512i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
        		ELSE
        			dst[i+31:i] := 0
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
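
Unlike the uniform-count shifts, the variable form reads a separate count for every lane. A minimal scalar sketch in portable C is shown here; ``maskz_sllv_epi32_model`` is a hypothetical name and the lanes are plain ``uint32_t`` arrays, not ``__m512i`` values.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: an active lane with count < 32 is shifted
       left by its own count; an active lane with a larger count, or any
       lane whose mask bit is clear, becomes zero. */
    static void maskz_sllv_epi32_model(uint32_t dst[16], uint16_t k,
                                       const uint32_t a[16],
                                       const uint32_t count[16])
    {
        for (int j = 0; j < 16; ++j) {
            int active = (k >> j) & 1;
            dst[j] = (active && count[j] < 32u) ? a[j] << count[j] : 0;
        }
    }

With zero-masking, a zero in ``dst`` does not tell whether the lane was masked off or its count was 32 or more; both paths store the same value.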
        	

_mm512_mask_sllv_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_mask_sllv_epi64(__m512i src, __mmask8 k,
                                   __m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
        		ELSE
        			dst[i+63:i] := 0
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sllv_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_maskz_sllv_epi64(__mmask8 k, __m512i a,
                                    __m512i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
        		ELSE
        			dst[i+63:i] := 0
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sllv_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_sllv_epi64(__m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF count[i+63:i] < 64
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sra_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_mask_sra_epi32(__m512i src, __mmask16 k,
                                  __m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        		ELSE
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
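
The sign-fill behaviour can be reproduced in portable C without relying on ``>>`` applied to a negative signed value, which the C standard leaves implementation-defined. The following is a sketch only: ``sra32_model`` and ``mask_sra_epi32_model`` are hypothetical names, lanes are kept as raw ``uint32_t`` bit patterns, and ``count64`` is assumed to stand for ``count[63:0]``.

.. code-block:: C

    #include <stdint.h>

    /* Arithmetic right shift of one 32-bit lane held as a bit pattern.
       The vacated high bits are filled with copies of the sign bit. */
    static uint32_t sra32_model(uint32_t x, uint64_t count)
    {
        uint32_t sign_fill = (x & 0x80000000u) ? 0xFFFFFFFFu : 0u;
        if (count > 31u) {
            return sign_fill;   /* every bit becomes the sign bit */
        }
        if (count == 0u) {
            return x;           /* avoids an undefined shift by 32 below */
        }
        return (x >> count) | (sign_fill << (32u - count));
    }

    /* Hypothetical scalar model of the write-masked form. */
    static void mask_sra_epi32_model(uint32_t dst[16], const uint32_t src[16],
                                     uint16_t k, const uint32_t a[16],
                                     uint64_t count64)
    {
        for (int j = 0; j < 16; ++j) {
            dst[j] = ((k >> j) & 1u) ? sra32_model(a[j], count64) : src[j];
        }
    }

A negative lane shifted by 32 or more therefore ends up as ``0xFFFFFFFF`` (that is, -1), while a non-negative lane ends up as 0.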
        	

_mm512_maskz_sra_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_maskz_sra_epi32(__mmask16 k, __m512i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        		ELSE
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_srai_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    SI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_srai_epi32(__mmask16 k, __m512i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        		ELSE
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sra_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m128i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_sra_epi32(__m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        	ELSE
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sra_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_mask_sra_epi64(__m512i src, __mmask8 k,
                                  __m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        		ELSE
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_srai_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_srai_epi64(__m512i src, __mmask8 k,
                                   __m512i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        		ELSE
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sra_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_maskz_sra_epi64(__mmask8 k, __m512i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        		ELSE
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_srai_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    SI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_srai_epi64(__mmask8 k, __m512i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        		ELSE
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sra_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m128i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_sra_epi64(__m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF count[63:0] > 63
        		dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        	ELSE
        		dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srai_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    SI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_srai_epi64(__m512i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF imm8[7:0] > 63
        		dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        	ELSE
        		dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
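
For counts up to 62, the arithmetic shift of a lane matches division by a power of two rounded toward negative infinity, which differs from C's truncating ``/`` for negative inputs. The sketch below illustrates this on one lane; ``srai64_model`` and ``floor_div_pow2`` are hypothetical names, and the lane is passed as a raw ``uint64_t`` bit pattern to stay within well-defined C.

.. code-block:: C

    #include <stdint.h>

    /* Single-lane model of the immediate arithmetic right shift. Only
       imm8[7:0] is used; counts above 63 return the sign fill. */
    static uint64_t srai64_model(uint64_t bits, unsigned imm8)
    {
        uint64_t fill = (bits >> 63) ? UINT64_MAX : 0u;
        imm8 &= 0xFFu;
        if (imm8 > 63u) {
            return fill;
        }
        if (imm8 == 0u) {
            return bits;        /* avoids an undefined shift by 64 below */
        }
        return (bits >> imm8) | (fill << (64u - imm8));
    }

    /* Division by 2^n (0 <= n <= 62) rounded toward negative infinity. */
    static int64_t floor_div_pow2(int64_t x, unsigned n)
    {
        int64_t d = (int64_t)1 << n;
        int64_t q = x / d;      /* C division truncates toward zero */
        if ((x % d) != 0 && x < 0) {
            q -= 1;
        }
        return q;
    }

For instance, shifting the bit pattern of -7 right by 1 yields the bit pattern of -4, whereas ``-7 / 2`` in C evaluates to -3.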
        	

_mm512_maskz_srav_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    MASK k, 
    SI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_maskz_srav_epi32(__mmask16 k, __m512i a,
                                    __m512i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
        		ELSE
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	
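A minimal scalar sketch of the zero-masked, per-lane form described above, assuming lanes are held as raw 32-bit patterns. The function name ``maskz_srav_epi32_model`` is hypothetical and exists only for illustration; it is not an Intel API.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model: each lane j is shifted by its own count[j]; lanes
       whose mask bit k[j] is clear are written as zero. */
    void maskz_srav_epi32_model(uint32_t dst[16], uint16_t k,
                                const uint32_t a[16], const uint32_t count[16])
    {
        for (int j = 0; j < 16; ++j) {
            uint32_t fill = (a[j] >> 31) ? UINT32_MAX : 0; /* replicated sign bit */
            uint32_t n = count[j];
            uint32_t r;

            if (n >= 32)
                r = fill;                        /* count out of range */
            else if (n == 0)
                r = a[j];                        /* avoids a shift by 32 below */
            else
                r = (a[j] >> n) | (fill << (32 - n));

            dst[j] = ((k >> j) & 1u) ? r : 0;    /* zeromask */
        }
    }

Note that the count is compared as a full unsigned 32-bit value, so a count such as 40 saturates the lane instead of wrapping around to 8.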

_mm512_mask_srav_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_mask_srav_epi64(__m512i src, __mmask8 k,
                                   __m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
        		ELSE
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	
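The ``SignExtend64`` step and the writemask merge can be illustrated with a small scalar sketch. It uses the xor-subtract identity for sign extension, which is a technique chosen here for portability and not something the Intel description prescribes. Both function names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: SignExtend64(a >> n) for 0 <= n <= 63.
       After the logical shift the old sign bit sits at position 63 - n;
       xor-then-subtract with that bit propagates it upward. */
    uint64_t sign_extend_shift64(uint64_t a, unsigned n)
    {
        uint64_t m = UINT64_C(1) << (63 - n);
        return ((a >> n) ^ m) - m;   /* unsigned wraparound is intended */
    }

    /* Hypothetical model of the writemask form: lanes whose mask bit is
       clear are copied from src unchanged. */
    void mask_srav_epi64_model(uint64_t dst[8], const uint64_t src[8], uint8_t k,
                               const uint64_t a[8], const uint64_t count[8])
    {
        for (int j = 0; j < 8; ++j) {
            if (!((k >> j) & 1u))
                dst[j] = src[j];
            else if (count[j] < 64)
                dst[j] = sign_extend_shift64(a[j], (unsigned)count[j]);
            else
                dst[j] = (a[j] >> 63) ? ~UINT64_C(0) : 0;
        }
    }
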

_mm512_maskz_srav_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    MASK k, 
    SI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_maskz_srav_epi64(__mmask8 k, __m512i a,
                                    __m512i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
        		ELSE
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srav_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i count
:Param ETypes:
    SI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_srav_epi64(__m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF count[i+63:i] < 64
        		dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
        	ELSE
        		dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_srl_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_mask_srl_epi32(__m512i src, __mmask16 k,
                                  __m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
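In this form a single shift amount, taken from the low 64 bits of the 128-bit "count" operand, applies to every lane. The scalar sketch below models that behaviour; the parameter ``count_lo`` (standing for ``count[63:0]``) and the function name are assumptions made for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model: one uniform logical right shift for all 16 lanes,
       merged with src under writemask k. */
    void mask_srl_epi32_model(uint32_t dst[16], const uint32_t src[16], uint16_t k,
                              const uint32_t a[16], uint64_t count_lo)
    {
        for (int j = 0; j < 16; ++j) {
            if ((k >> j) & 1u)
                /* The whole 64-bit count is compared, not just its low bits. */
                dst[j] = (count_lo > 31) ? 0 : (a[j] >> (unsigned)count_lo);
            else
                dst[j] = src[j];      /* writemask: keep the src lane */
        }
    }

Because all 64 bits of the count take part in the comparison, a value such as ``0x100000004`` clears the selected lanes even though its low 32 bits equal 4.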

_mm512_maskz_srl_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_maskz_srl_epi32(__mmask16 k, __m512i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_srli_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_srli_epi32(__mmask16 k, __m512i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srl_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m128i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_srl_epi32(__m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_srl_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_mask_srl_epi64(__m512i src, __mmask8 k,
                                  __m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_srli_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_srli_epi64(__m512i src, __mmask8 k,
                                   __m512i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_srl_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_maskz_srl_epi64(__mmask8 k, __m512i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_srli_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_srli_epi64(__mmask8 k, __m512i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srl_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m128i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_srl_epi64(__m512i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF count[63:0] > 63
        		dst[i+63:i] := 0
        	ELSE
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srli_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_srli_epi64(__m512i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF imm8[7:0] > 63
        		dst[i+63:i] := 0
        	ELSE
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
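As a rough scalar illustration of the logical (zero-filling) immediate shift above, the sketch below models each lane in plain C. One typical use is to isolate the upper 32 bits of every 64-bit lane with a shift of 32. The helper names are hypothetical, and the model follows only the pseudo-code, not any particular compiler's handling of the immediate.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of one lane: logical right shift, zero when the
       immediate exceeds 63. */
    uint64_t srl64_lane(uint64_t a, unsigned imm8)
    {
        imm8 &= 0xFF;                        /* only imm8[7:0] is examined */
        return (imm8 > 63) ? 0 : (a >> imm8);
    }

    /* Apply the lane model to all eight 64-bit lanes. */
    void srli_epi64_model(uint64_t dst[8], const uint64_t a[8], unsigned imm8)
    {
        for (int j = 0; j < 8; ++j)
            dst[j] = srl64_lane(a[j], imm8);
    }
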

_mm512_maskz_srlv_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_maskz_srlv_epi32(__mmask16 k, __m512i a,
                                    __m512i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
        		ELSE
        			dst[i+31:i] := 0
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_srlv_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_mask_srlv_epi64(__m512i src, __mmask8 k,
                                   __m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
        		ELSE
        			dst[i+63:i] := 0
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_srlv_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_maskz_srlv_epi64(__mmask8 k, __m512i a,
                                    __m512i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
        		ELSE
        			dst[i+63:i] := 0
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srlv_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m512i _mm512_srlv_epi64(__m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF count[i+63:i] < 64
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
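A scalar fallback for the per-lane variable shift above needs an explicit range check, because in C a shift by 64 or more on a 64-bit operand is undefined, whereas the pseudo-code defines the result as zero. The sketch below shows one way to write it; ``srlv_epi64_model`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model: lane j is shifted right by count[j], filling with
       zeros; counts of 64 or more produce zero. */
    void srlv_epi64_model(uint64_t dst[8], const uint64_t a[8],
                          const uint64_t count[8])
    {
        for (int j = 0; j < 8; ++j)
            dst[j] = (count[j] < 64) ? (a[j] >> count[j]) : 0;
    }
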

_mm512_mask_slli_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_slli_epi32(__m512i src, __mmask16 k,
                                   __m512i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_slli_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_slli_epi32(__m512i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
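For shift amounts below 32, the left shift described above is equivalent to multiplying each lane by a power of two modulo 2^32. The following scalar sketch makes that concrete; the helper names ``sll32_lane`` and ``slli_epi32_model`` are assumptions for illustration only.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of one lane: bits shifted past bit 31 are discarded,
       and an immediate above 31 clears the lane. */
    uint32_t sll32_lane(uint32_t a, unsigned imm8)
    {
        imm8 &= 0xFF;                        /* only imm8[7:0] is examined */
        return (imm8 > 31) ? 0 : (uint32_t)(a << imm8);
    }

    /* Apply the lane model to all sixteen 32-bit lanes. */
    void slli_epi32_model(uint32_t dst[16], const uint32_t a[16], unsigned imm8)
    {
        for (int j = 0; j < 16; ++j)
            dst[j] = sll32_lane(a[j], imm8);
    }
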

_mm512_mask_sllv_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_mask_sllv_epi32(__m512i src, __mmask16 k,
                                   __m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
        		ELSE
        			dst[i+31:i] := 0
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sllv_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_sllv_epi32(__m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF count[i+31:i] < 32
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
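One illustrative use of a per-lane left shift is building one-hot bit masks: with every lane of "a" set to 1, lane ``j`` becomes ``1 << count[j]``, or zero when the index is out of range. The scalar sketch below models this; ``sllv_epi32_model`` is a hypothetical name and the one-hot use case is an example chosen here, not something the Intel description mentions.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model: lane j is shifted left by count[j], filling with
       zeros; counts of 32 or more produce zero. */
    void sllv_epi32_model(uint32_t dst[16], const uint32_t a[16],
                          const uint32_t count[16])
    {
        for (int j = 0; j < 16; ++j)
            dst[j] = (count[j] < 32) ? (uint32_t)(a[j] << count[j]) : 0;
    }
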

_mm512_mask_srai_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_srai_epi32(__m512i src, __mmask16 k,
                                   __m512i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        		ELSE
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srai_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    SI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_srai_epi32(__m512i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        	ELSE
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
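An arithmetic right shift by ``n`` bits corresponds to division by 2^n rounded toward negative infinity, which differs from C's ``/`` operator (it truncates toward zero) for negative odd values. The sketch below models one lane that way, avoiding signed shifts entirely; ``srai32_lane`` is a hypothetical helper name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of one 32-bit lane, written as a floor division. */
    int32_t srai32_lane(int32_t a, unsigned imm8)
    {
        imm8 &= 0xFF;                        /* only imm8[7:0] is examined */
        if (imm8 > 31)
            return (a < 0) ? -1 : 0;         /* lane filled with the sign bit */

        int64_t d = INT64_C(1) << imm8;      /* 2^imm8, computed in 64 bits */
        int64_t q = a / d;                   /* truncates toward zero */
        if (a < 0 && a % d != 0)
            --q;                             /* adjust to round toward -inf */
        return (int32_t)q;
    }

For example, the model maps -7 shifted by 1 to -4, while ``-7 / 2`` in C evaluates to -3.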

_mm512_mask_srav_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_mask_srav_epi32(__m512i src, __mmask16 k,
                                   __m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
        		ELSE
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srav_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i count
:Param ETypes:
    SI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_srav_epi32(__m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF count[i+31:i] < 32
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
        	ELSE
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_srli_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_srli_epi32(__m512i src, __mmask16 k,
                                   __m512i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srli_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    unsigned int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_srli_epi32(__m512i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_srlv_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_mask_srlv_epi32(__m512i src, __mmask16 k,
                                   __m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
        		ELSE
        			dst[i+31:i] := 0
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_srlv_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m512i _mm512_srlv_epi32(__m512i a, __m512i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF count[i+31:i] < 32
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_shrdv_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m512i _mm512_maskz_shrdv_epi64(__mmask8 k, __m512i a,
                                     __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result, with "b" in the upper half. Shift the result right by the amount specified in the corresponding element of "c" (taken modulo 64), and store the lower 64 bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_shrdv_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask8 k, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m512i _mm512_mask_shrdv_epi64(__m512i a, __mmask8 k,
                                    __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result, with "b" in the upper half. Shift the result right by the amount specified in the corresponding element of "c" (taken modulo 64), and store the lower 64 bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
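
The two masked forms differ only in what an unselected lane receives: the writemask form keeps the element from ``a``, the zeromask form writes zero. The sketch below is a portable scalar model of the pseudo-code, not the intrinsic itself. The helper names ``shrd64_lane``, ``model_mask_shrdv_epi64`` and ``model_maskz_shrdv_epi64`` are hypothetical, and each vector is represented as a plain array of eight ``uint64_t`` lanes.

.. code-block:: C

    #include <stdint.h>

    /* One 64-bit lane: low 64 bits of ((b:a) >> (c & 63)). */
    static uint64_t shrd64_lane(uint64_t a, uint64_t b, uint64_t c)
    {
        unsigned s = (unsigned)(c & 63);
        /* s == 0 is special-cased: shifting a uint64_t by 64 is undefined in C. */
        return s ? (a >> s) | (b << (64 - s)) : a;
    }

    /* Writemask form: lanes whose mask bit is clear keep the value from a. */
    static void model_mask_shrdv_epi64(uint64_t dst[8], const uint64_t a[8],
                                       uint8_t k, const uint64_t b[8],
                                       const uint64_t c[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? shrd64_lane(a[j], b[j], c[j]) : a[j];
    }

    /* Zeromask form: lanes whose mask bit is clear are set to zero. */
    static void model_maskz_shrdv_epi64(uint64_t dst[8], uint8_t k,
                                        const uint64_t a[8], const uint64_t b[8],
                                        const uint64_t c[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? shrd64_lane(a[j], b[j], c[j]) : 0;
    }

With ``k = 0x01`` only lane 0 is computed; the remaining lanes show the difference between the two masking policies.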

_mm512_shrdv_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m512i _mm512_shrdv_epi64(__m512i a, __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
        ENDFOR
        dst[MAX:512] := 0
        	
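
A right funnel shift is a natural fit for reading a 64-bit value that straddles two adjacent words. The sketch below models a single lane in scalar C and applies it to an unaligned bit read; ``shrd64`` and ``read_bits64`` are hypothetical helper names, and the word stream layout (least significant word first) is an assumption made for the illustration.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Scalar model of one lane: low 64 bits of ((hi:lo) >> (count & 63)). */
    static uint64_t shrd64(uint64_t lo, uint64_t hi, uint64_t count)
    {
        unsigned s = (unsigned)(count & 63);
        /* Avoid the undefined 64-bit shift when s == 0. */
        return s ? (lo >> s) | (hi << (64 - s)) : lo;
    }

    /* Read 64 bits starting at an arbitrary bit offset.
       The caller must ensure words[bitpos / 64 + 1] is readable. */
    static uint64_t read_bits64(const uint64_t *words, size_t bitpos)
    {
        size_t w = bitpos / 64;
        return shrd64(words[w], words[w + 1], bitpos % 64);
    }

Here ``words[w]`` plays the role of ``a`` (the low half) and ``words[w + 1]`` the role of ``b`` (the high half).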

_mm512_maskz_shrdv_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m512i _mm512_maskz_shrdv_epi32(__mmask16 k, __m512i a,
                                     __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_shrdv_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask16 k, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI32 a, 
    MASK k, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m512i _mm512_mask_shrdv_epi32(__m512i a, __mmask16 k,
                                    __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shrdv_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI32 a, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m512i _mm512_shrdv_epi32(__m512i a, __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
        ENDFOR
        dst[MAX:512] := 0
        	
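
For 32-bit elements the intermediate 64-bit value fits in a ``uint64_t``, so the pseudo-code can be transcribed almost literally. This is a scalar sketch of one lane; the name ``shrd32`` is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* One 32-bit lane of the right funnel shift. */
    static uint32_t shrd32(uint32_t a, uint32_t b, uint32_t c)
    {
        uint64_t concat = ((uint64_t)b << 32) | a;   /* b is the upper half */
        return (uint32_t)(concat >> (c & 31));       /* keep the lower 32 bits */
    }

Because the count is masked with 31, a count of 32 behaves like a count of 0 and returns ``a`` unchanged.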

_mm512_maskz_shrdv_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m512i _mm512_maskz_shrdv_epi16(__mmask32 k, __m512i a,
                                     __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_shrdv_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask32 k, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI16 a, 
    MASK k, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m512i _mm512_mask_shrdv_epi16(__m512i a, __mmask32 k,
                                    __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shrdv_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI16 a, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m512i _mm512_shrdv_epi16(__m512i a, __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
        ENDFOR
        dst[MAX:512] := 0
        	
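
One consequence of the pseudo-code is that passing the same value as both ``a`` and ``b`` turns the funnel shift into a rotate right. The scalar sketch below checks this for a 16-bit lane; ``shrd16`` and ``rotr16`` are hypothetical helper names.

.. code-block:: C

    #include <stdint.h>

    /* One 16-bit lane of the right funnel shift. */
    static uint16_t shrd16(uint16_t a, uint16_t b, uint16_t c)
    {
        uint32_t concat = ((uint32_t)b << 16) | a;
        return (uint16_t)(concat >> (c & 15));
    }

    /* Reference rotate right, written independently of shrd16. */
    static uint16_t rotr16(uint16_t x, unsigned n)
    {
        uint32_t v = x;
        n &= 15;
        return (uint16_t)((v >> n) | (v << ((16 - n) & 15)));
    }

For example, ``shrd16(0x1234, 0x1234, 4)`` and ``rotr16(0x1234, 4)`` both give ``0x4123``.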

_mm512_maskz_shrdi_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_shrdi_epi64(__mmask8 k, __m512i a,
                                     __m512i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_shrdi_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_shrdi_epi64(__m512i src, __mmask8 k,
                                    __m512i a, __m512i b,
                                    int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shrdi_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_shrdi_epi64(__m512i a, __m512i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
        ENDFOR
        dst[MAX:512] := 0
        	
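
The pseudo-code uses only ``imm8[5:0]``, so the effective count is always in the range 0 to 63 and larger immediates wrap around. The scalar sketch below models one lane to make that visible; ``shrdi64`` is a hypothetical name, and unlike the real intrinsic (which takes an immediate) this model accepts a run-time ``int``.

.. code-block:: C

    #include <stdint.h>

    /* One 64-bit lane of the immediate right funnel shift. */
    static uint64_t shrdi64(uint64_t a, uint64_t b, int imm8)
    {
        unsigned s = (unsigned)imm8 & 63;            /* imm8[5:0] */
        /* s == 0 is special-cased to avoid an undefined 64-bit shift. */
        return s ? (a >> s) | (b << (64 - s)) : a;
    }

Under this model an immediate of 64 returns ``a`` unchanged, and 65 gives the same result as 1.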

_mm512_maskz_shrdi_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_shrdi_epi32(__mmask16 k, __m512i a,
                                     __m512i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_shrdi_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_shrdi_epi32(__m512i src, __mmask16 k,
                                    __m512i a, __m512i b,
                                    int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
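
Unlike the variable-count ``shrdv`` forms, the immediate writemask form takes a separate ``src`` vector that supplies the unselected lanes. The scalar sketch below models all sixteen 32-bit lanes with plain arrays; the function name ``model_mask_shrdi_epi32`` is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model: sixteen 32-bit lanes, merge-masked against src. */
    static void model_mask_shrdi_epi32(uint32_t dst[16], const uint32_t src[16],
                                       uint16_t k, const uint32_t a[16],
                                       const uint32_t b[16], int imm8)
    {
        unsigned s = (unsigned)imm8 & 31;            /* imm8[4:0] */
        for (int j = 0; j < 16; j++) {
            uint64_t concat = ((uint64_t)b[j] << 32) | a[j];
            dst[j] = ((k >> j) & 1) ? (uint32_t)(concat >> s) : src[j];
        }
    }

With ``k = 0x8001`` only lanes 0 and 15 receive the shifted value; every other lane is copied from ``src``.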

_mm512_shrdi_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_shrdi_epi32(__m512i a, __m512i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_shrdi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_shrdi_epi16(__mmask32 k, __m512i a,
                                     __m512i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_shrdi_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_shrdi_epi16(__m512i src, __mmask32 k,
                                    __m512i a, __m512i b,
                                    int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shrdi_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_shrdi_epi16(__m512i a, __m512i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_shldv_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m512i _mm512_maskz_shldv_epi64(__mmask8 k, __m512i a,
                                     __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
        		dst[i+63:i] := tmp[127:64]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_shldv_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask8 k, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m512i _mm512_mask_shldv_epi64(__m512i a, __mmask8 k,
                                    __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
        		dst[i+63:i] := tmp[127:64]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shldv_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m512i _mm512_shldv_epi64(__m512i a, __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
        	dst[i+63:i] := tmp[127:64]
        ENDFOR
        dst[MAX:512] := 0
        	
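
A left funnel shift pulls the bits that fall off the top of the lower word into the bottom of the upper word, which is the building block of a multi-word left shift. The sketch below models one lane in scalar C and uses it to shift a multi-word integer; ``shld64`` and ``bignum_shl`` are hypothetical names, and the least-significant-word-first layout is an assumption of the example.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* One 64-bit lane: upper 64 bits of ((a:b) << (c & 63)). */
    static uint64_t shld64(uint64_t a, uint64_t b, uint64_t c)
    {
        unsigned s = (unsigned)(c & 63);
        /* s == 0 is special-cased to avoid an undefined 64-bit shift. */
        return s ? (a << s) | (b >> (64 - s)) : a;
    }

    /* Shift an n-word integer left by s bits (0 <= s <= 63), in place.
       Words are processed from the top down so w[i - 1] is still unshifted. */
    static void bignum_shl(uint64_t *w, size_t n, unsigned s)
    {
        for (size_t i = n; i-- > 1; )
            w[i] = shld64(w[i], w[i - 1], s);
        if (n > 0)
            w[0] = shld64(w[0], 0, s);
    }

Each output word uses the word itself as ``a`` (the upper half) and its lower neighbour as ``b``.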

_mm512_maskz_shldv_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m512i _mm512_maskz_shldv_epi32(__mmask16 k, __m512i a,
                                     __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
        		dst[i+31:i] := tmp[63:32]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_shldv_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask16 k, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI32 a, 
    MASK k, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m512i _mm512_mask_shldv_epi32(__m512i a, __mmask16 k,
                                    __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
        		dst[i+31:i] := tmp[63:32]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shldv_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI32 a, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m512i _mm512_shldv_epi32(__m512i a, __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
        	dst[i+31:i] := tmp[63:32]
        ENDFOR
        dst[MAX:512] := 0
        	
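
Note that the operand roles are swapped relative to the right-shift forms: here ``a`` is the upper half of the concatenation and the result is taken from the upper 32 bits. A scalar sketch of one lane, transcribed from the pseudo-code (``shld32`` is a hypothetical name):

.. code-block:: C

    #include <stdint.h>

    /* One 32-bit lane of the left funnel shift. */
    static uint32_t shld32(uint32_t a, uint32_t b, uint32_t c)
    {
        uint64_t concat = ((uint64_t)a << 32) | b;   /* a is the upper half */
        uint64_t tmp = concat << (c & 31);           /* bits shifted past 63 are lost */
        return (uint32_t)(tmp >> 32);                /* keep the upper 32 bits */
    }

For instance, shifting ``a = 0x01234567`` and ``b = 0x89ABCDEF`` left by 8 yields ``0x23456789``: the top byte of ``b`` is pulled into the bottom of ``a``.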

_mm512_maskz_shldv_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m512i _mm512_maskz_shldv_epi16(__mmask32 k, __m512i a,
                                     __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_shldv_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask32 k, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI16 a, 
    MASK k, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m512i _mm512_mask_shldv_epi16(__m512i a, __mmask32 k,
                                    __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shldv_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI16 a, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m512i _mm512_shldv_epi16(__m512i a, __m512i b, __m512i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        dst[MAX:512] := 0
        	
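
The left and right funnel shifts are two views of the same double-width value. Working from the pseudo-code, a left shift by ``s`` with operands ``(a, b)`` should equal a right shift by ``16 - s`` with the operands swapped, for ``s`` from 1 to 15 (the identity does not hold for ``s = 0``, where the count wraps). The scalar sketch below models both 16-bit lanes so the relation can be checked; ``shld16`` and ``shrd16`` are hypothetical helper names.

.. code-block:: C

    #include <stdint.h>

    /* One 16-bit lane of the left funnel shift: upper half of ((a:b) << s). */
    static uint16_t shld16(uint16_t a, uint16_t b, uint16_t c)
    {
        uint32_t tmp = (((uint32_t)a << 16) | b) << (c & 15);
        return (uint16_t)(tmp >> 16);
    }

    /* One 16-bit lane of the right funnel shift: lower half of ((b:a) >> s). */
    static uint16_t shrd16(uint16_t a, uint16_t b, uint16_t c)
    {
        uint32_t concat = ((uint32_t)b << 16) | a;
        return (uint16_t)(concat >> (c & 15));
    }

As a worked instance, ``shld16(0x1234, 0xABCD, 4)`` and ``shrd16(0xABCD, 0x1234, 12)`` both evaluate to ``0x234A``.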

_mm512_maskz_shldi_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_shldi_epi64(__mmask8 k, __m512i a,
                                     __m512i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
        		dst[i+63:i] := tmp[127:64]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_shldi_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_shldi_epi64(__m512i src, __mmask8 k,
                                    __m512i a, __m512i b,
                                    int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
        		dst[i+63:i] := tmp[127:64]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shldi_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_shldi_epi64(__m512i a, __m512i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
        	dst[i+63:i] := tmp[127:64]
        ENDFOR
        dst[MAX:512] := 0
        	
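
A possible way to call the intrinsic from ordinary array code is sketched below. It assumes a compiler that defines ``__AVX512F__`` and ``__AVX512VBMI2__`` when the corresponding instruction sets are enabled (as GCC and Clang do with ``-mavx512f -mavx512vbmi2``); otherwise it falls back to a scalar loop that follows the pseudo-code for ``imm8 = 8``. The wrapper name ``shld8_u64x8`` and the fixed shift count are choices made for this example only.

.. code-block:: C

    #include <stdint.h>
    #if defined(__AVX512F__) && defined(__AVX512VBMI2__)
    #include <immintrin.h>
    #endif

    /* Funnel-shift eight 64-bit lanes left by 8 bits: dst = upper64((a:b) << 8). */
    static void shld8_u64x8(uint64_t dst[8], const uint64_t a[8], const uint64_t b[8])
    {
    #if defined(__AVX512F__) && defined(__AVX512VBMI2__)
        /* Unaligned loads and stores, so the arrays need no special alignment. */
        __m512i va = _mm512_loadu_si512(a);
        __m512i vb = _mm512_loadu_si512(b);
        _mm512_storeu_si512(dst, _mm512_shldi_epi64(va, vb, 8));
    #else
        /* Portable fallback for targets without AVX512_VBMI2. */
        for (int j = 0; j < 8; j++)
            dst[j] = (a[j] << 8) | (b[j] >> 56);
    #endif
    }

Both paths are intended to produce identical results, so the scalar branch can double as a reference when testing on hardware that supports the instruction.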

_mm512_maskz_shldi_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_shldi_epi32(__mmask16 k, __m512i a,
                                     __m512i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
        		dst[i+31:i] := tmp[63:32]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_shldi_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_shldi_epi32(__m512i src, __mmask16 k,
                                    __m512i a, __m512i b,
                                    int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
        		dst[i+31:i] := tmp[63:32]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shldi_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_shldi_epi32(__m512i a, __m512i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
        	dst[i+31:i] := tmp[63:32]
        ENDFOR
        dst[MAX:512] := 0
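
The per-lane operation in the pseudo-code above can be modelled in portable scalar C. The following is only a sketch: ``shld32_lane`` is a hypothetical helper name, not part of ``immintrin.h``, and it models a single one of the 16 lanes instead of calling the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of one 32-bit lane of _mm512_shldi_epi32.
       Not an Intel API; it only mirrors the pseudo-code above. */
    static uint32_t shld32_lane(uint32_t a, uint32_t b, unsigned imm8)
    {
        /* Concatenate a (upper half) and b (lower half) into 64 bits. */
        uint64_t tmp = ((uint64_t)a << 32) | b;
        /* Only imm8[4:0] is used, so the shift count is always 0..31. */
        tmp <<= (imm8 & 31u);
        /* The result is the upper 32 bits of the shifted value. */
        return (uint32_t)(tmp >> 32);
    }

Under this model, ``a = 0x12345678``, ``b = 0x9ABCDEF0`` and ``imm8 = 8`` give ``0x3456789A``: the top byte of "b" is shifted into the bottom of "a". Because only ``imm8[4:0]`` is read, ``imm8 = 40`` gives the same result.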
        	

_mm512_maskz_shldi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_shldi_epi16(__mmask32 k, __m512i a,
                                     __m512i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
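
To make the zeromask behaviour concrete, the pseudo-code above can be written as a scalar loop. This is a sketch under stated assumptions: ``maskz_shld16_model`` is a hypothetical name, plain arrays of 32 ``uint16_t`` stand in for the 512-bit vectors, and a ``uint32_t`` stands in for ``__mmask32``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_maskz_shldi_epi16 over 32 lanes.
       Arrays stand in for the 512-bit vectors; bit j of k selects lane j. */
    static void maskz_shld16_model(uint16_t dst[32], uint32_t k,
                                   const uint16_t a[32],
                                   const uint16_t b[32], unsigned imm8)
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1u) {
                /* a is the upper half, b the lower half of a 32-bit value. */
                uint32_t tmp = ((uint32_t)a[j] << 16) | b[j];
                tmp <<= (imm8 & 15u);           /* only imm8[3:0] is used */
                dst[j] = (uint16_t)(tmp >> 16); /* keep the upper 16 bits */
            } else {
                dst[j] = 0;                     /* zeromask: lane is cleared */
            }
        }
    }

With every lane of "a" set to ``0x1234``, every lane of "b" set to ``0xABCD``, ``imm8 = 4`` and ``k = 1``, lane 0 of the model holds ``0x234A`` and all other lanes are zero.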
        	

_mm512_mask_shldi_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_shldi_epi16(__m512i src, __mmask32 k,
                                    __m512i a, __m512i b,
                                    int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shldi_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_shldi_epi16(__m512i a, __m512i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        dst[MAX:512] := 0
        	

YMM
~~~
_mm256_mask_sllv_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_mask_sllv_epi16(__m256i src, __mmask16 k,
                                   __m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
        		ELSE
        			dst[i+15:i] := 0
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sllv_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_maskz_sllv_epi16(__mmask16 k, __m256i a,
                                    __m256i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
        		ELSE
        			dst[i+15:i] := 0
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sllv_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_sllv_epi16(__m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF count[i+15:i] < 16
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
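
The per-element count check above is easy to miss: counts are compared against 16, not reduced modulo 16. A minimal scalar sketch of one lane follows; ``sllv16_lane`` is a hypothetical helper name used only for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of one 16-bit lane of _mm256_sllv_epi16. */
    static uint16_t sllv16_lane(uint16_t a, uint16_t count)
    {
        if (count < 16) {
            /* Widen before shifting, then truncate back to 16 bits. */
            return (uint16_t)((uint32_t)a << count);
        }
        /* Counts of 16 or more shift every bit out of the lane. */
        return 0;
    }

In this model ``sllv16_lane(0x00FF, 8)`` is ``0xFF00``, while any count of 16 or more, such as ``sllv16_lane(0xFFFF, 16)``, is ``0``.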
        	

_mm256_mask_sll_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_mask_sll_epi16(__m256i src, __mmask16 k,
                                  __m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
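
Unlike the ``sllv`` forms, this intrinsic applies one shared count, taken from the low 64 bits of "count", to every lane. The sketch below models that together with the writemask. It rests on assumptions: ``mask_sll16_model`` is a hypothetical name, arrays of 16 ``uint16_t`` stand in for the 256-bit vectors, and ``count_lo`` stands in for ``count[63:0]``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_mask_sll_epi16.
       count_lo stands in for count[63:0], the low 64 bits of the __m128i. */
    static void mask_sll16_model(uint16_t dst[16], const uint16_t src[16],
                                 uint16_t k, const uint16_t a[16],
                                 uint64_t count_lo)
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                /* One shared count for all lanes; above 15 gives 0. */
                dst[j] = (count_lo > 15)
                             ? 0
                             : (uint16_t)((uint32_t)a[j] << count_lo);
            } else {
                dst[j] = src[j];   /* writemask: lane is copied from src */
            }
        }
    }

Because the whole 64-bit count is compared against 15, a count such as ``0x100000000`` zeroes every selected lane in this model even though its low bits are all zero.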
        	

_mm256_mask_slli_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_slli_epi16(__m256i src, __mmask16 k,
                                   __m256i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sll_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_maskz_sll_epi16(__mmask16 k, __m256i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_slli_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_slli_epi16(__mmask16 k, __m256i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_srav_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_mask_srav_epi16(__m256i src, __mmask16 k,
                                   __m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
        		ELSE
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srav_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    MASK k, 
    SI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_maskz_srav_epi16(__mmask16 k, __m256i a,
                                    __m256i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
        		ELSE
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srav_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i count
:Param ETypes:
    SI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_srav_epi16(__m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF count[i+15:i] < 16
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
        	ELSE
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
        	FI	
        ENDFOR
        dst[MAX:256] := 0
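
The sign-fill rule in the pseudo-code above can be sketched for one lane in scalar C. ``srav16_lane`` is a hypothetical helper name. The model works on unsigned values and ORs the sign bits in explicitly, so it does not depend on how a particular compiler implements ``>>`` on negative integers.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of one 16-bit lane of _mm256_srav_epi16. */
    static uint16_t srav16_lane(uint16_t a, uint16_t count)
    {
        /* All ones if the sign bit a[15] is set, otherwise all zeros. */
        uint16_t fill = (a & 0x8000u) ? 0xFFFFu : 0u;
        if (count >= 16) {
            return fill;            /* the lane is filled with sign bits */
        }
        if (count == 0) {
            return a;               /* nothing to shift */
        }
        /* Logical shift, then OR sign bits into the vacated top bits. */
        return (uint16_t)((a >> count) | (uint16_t)(fill << (16 - count)));
    }

In this model ``srav16_lane(0x8000, 4)`` is ``0xF800`` and ``srav16_lane(0x7FFF, 4)`` is ``0x07FF``. A count of 16 or more returns ``0xFFFF`` for a negative lane and ``0`` otherwise.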
        	

_mm256_mask_sra_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_mask_sra_epi16(__m256i src, __mmask16 k,
                                  __m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        		ELSE
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_srai_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_srai_epi16(__m256i src, __mmask16 k,
                                   __m256i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        		ELSE
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sra_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_maskz_sra_epi16(__mmask16 k, __m256i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        		ELSE
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srai_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    SI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_srai_epi16(__mmask16 k, __m256i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        		ELSE
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_srlv_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_mask_srlv_epi16(__m256i src, __mmask16 k,
                                   __m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
        		ELSE
        			dst[i+15:i] := 0
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srlv_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_maskz_srlv_epi16(__mmask16 k, __m256i a,
                                    __m256i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
        		ELSE
        			dst[i+15:i] := 0
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srlv_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_srlv_epi16(__m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF count[i+15:i] < 16
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
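
For comparison with the arithmetic ``srav`` forms, a one-lane scalar sketch of the logical right shift is given here. ``srlv16_lane`` is a hypothetical helper name and is not part of any Intel header.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of one 16-bit lane of _mm256_srlv_epi16. */
    static uint16_t srlv16_lane(uint16_t a, uint16_t count)
    {
        /* Logical shift: vacated high bits are zero, whatever a[15] is. */
        return (count < 16) ? (uint16_t)(a >> count) : 0;
    }

In this model ``srlv16_lane(0x8000, 4)`` is ``0x0800``. Applying the ``srav`` pseudo-code to the same inputs gives ``0xF800``, because that form shifts in copies of the sign bit.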
        	

_mm256_mask_srl_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_mask_srl_epi16(__m256i src, __mmask16 k,
                                  __m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_srli_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_srli_epi16(__m256i src, __mmask16 k,
                                   __m256i a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srl_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_maskz_srl_epi16(__mmask16 k, __m256i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srli_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_srli_epi16(__mmask16 k, __m256i a,
                                    int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_rol_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    const int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_rol_epi32(__m256i src, __mmask8 k,
                                  __m256i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_rol_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_rol_epi32(__mmask8 k, __m256i a,
                                   const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rol_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_rol_epi32(__m256i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:256] := 0
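
``LEFT_ROTATE_DWORDS`` translates almost directly to scalar C, with one caveat: when ``count`` is 0 the pseudo-code shifts right by 32, which is undefined for a 32-bit operand in C. The sketch below handles that case separately. ``rol32_lane`` is a hypothetical helper name that models a single lane.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of one 32-bit lane of _mm256_rol_epi32. */
    static uint32_t rol32_lane(uint32_t x, unsigned imm8)
    {
        unsigned count = imm8 % 32u;    /* rotate counts wrap modulo 32 */
        if (count == 0) {
            /* Shifting a 32-bit value by 32 is undefined in C, so the
               zero-rotate case is returned directly. */
            return x;
        }
        return (x << count) | (x >> (32u - count));
    }

In this model ``rol32_lane(0x12345678, 8)`` is ``0x34567812``, and rotating by 32 returns the input unchanged.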
        	

_mm256_mask_rol_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    const int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_rol_epi64(__m256i src, __mmask8 k,
                                  __m256i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
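
The masked 64-bit rotate can be sketched as a scalar loop over the four lanes. The names here are assumptions made for illustration: ``mask_rol64_model`` is hypothetical, arrays of four ``uint64_t`` stand in for the 256-bit vectors, and a ``uint8_t`` stands in for ``__mmask8``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_mask_rol_epi64 over 4 lanes. */
    static void mask_rol64_model(uint64_t dst[4], const uint64_t src[4],
                                 uint8_t k, const uint64_t a[4],
                                 unsigned imm8)
    {
        unsigned count = imm8 % 64u;    /* rotate counts wrap modulo 64 */
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                /* Guard count == 0: a 64-bit shift by 64 is undefined. */
                dst[j] = count ? (a[j] << count) | (a[j] >> (64u - count))
                               : a[j];
            } else {
                dst[j] = src[j];    /* writemask: lane copied from src */
            }
        }
    }

Only bits 0 to 3 of "k" are consulted, since a 256-bit vector holds four 64-bit lanes. With ``k = 0x3`` and ``imm8 = 1``, lanes 0 and 1 of the model are rotated and lanes 2 and 3 are copied from "src".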
        	

_mm256_maskz_rol_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_rol_epi64(__mmask8 k, __m256i a,
                                   const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rol_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_rol_epi64(__m256i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_rolv_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_rolv_epi32(__m256i src, __mmask8 k,
                                   __m256i a, __m256i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
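
For the variable-count form, each lane takes its rotate amount from the matching lane of "b", reduced modulo 32. The following scalar sketch illustrates this under the assumption that the eight 32-bit lanes are held in plain arrays; ``rotl32`` and ``model_mask_rolv_epi32`` are hypothetical helper names, not part of any header.

.. code-block:: C

    #include <stdint.h>

    /* Rotate one 32-bit lane left; the count is reduced modulo 32. */
    static inline uint32_t rotl32(uint32_t x, uint32_t count)
    {
        count %= 32;
        return count ? (x << count) | (x >> (32 - count)) : x;
    }

    /* Hypothetical scalar model: lane j is rotated by b[j] when bit j
       of k is set, otherwise it is copied from src. */
    static inline void model_mask_rolv_epi32(uint32_t dst[8],
                                             const uint32_t src[8],
                                             uint8_t k,
                                             const uint32_t a[8],
                                             const uint32_t b[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? rotl32(a[j], b[j]) : src[j];
    }

Because of the modulo, a count of 33 in one lane produces the same result as a count of 1.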
        	

_mm256_maskz_rolv_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_rolv_epi32(__mmask8 k, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rolv_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_rolv_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_rolv_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_rolv_epi64(__m256i src, __mmask8 k,
                                   __m256i a, __m256i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_rolv_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_rolv_epi64(__mmask8 k, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rolv_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_rolv_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_ror_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    const int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_ror_epi32(__m256i src, __mmask8 k,
                                  __m256i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
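
A right rotate by ``n`` bits is equivalent to a left rotate by ``(32 - n) % 32`` bits. The scalar sketch below illustrates that relationship together with the writemask selection. It is a model under stated assumptions, not the intrinsic: ``rotr32``, ``rotl32`` and ``model_mask_ror_epi32`` are hypothetical names and the lanes are plain array elements.

.. code-block:: C

    #include <stdint.h>

    /* Rotate one 32-bit lane right; the count is reduced modulo 32. */
    static inline uint32_t rotr32(uint32_t x, unsigned count)
    {
        count %= 32;
        return count ? (x >> count) | (x << (32 - count)) : x;
    }

    /* Left rotate, used only to illustrate the equivalence. */
    static inline uint32_t rotl32(uint32_t x, unsigned count)
    {
        count %= 32;
        return count ? (x << count) | (x >> (32 - count)) : x;
    }

    /* Hypothetical scalar model of the writemask form. */
    static inline void model_mask_ror_epi32(uint32_t dst[8],
                                            const uint32_t src[8],
                                            uint8_t k,
                                            const uint32_t a[8],
                                            unsigned imm8)
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? rotr32(a[j], imm8 & 0xFF) : src[j];
    }

For example, ``rotr32(0x12345678, 8)`` moves the low byte to the top and yields ``0x78123456``.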
        	

_mm256_maskz_ror_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_ror_epi32(__mmask8 k, __m256i a,
                                   const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_ror_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_ror_epi32(__m256i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_ror_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    const int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_ror_epi64(__m256i src, __mmask8 k,
                                  __m256i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_ror_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_ror_epi64(__mmask8 k, __m256i a,
                                   const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_ror_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_ror_epi64(__m256i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_rorv_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_rorv_epi32(__m256i src, __mmask8 k,
                                   __m256i a, __m256i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_rorv_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_rorv_epi32(__mmask8 k, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rorv_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_rorv_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_rorv_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_rorv_epi64(__m256i src, __mmask8 k,
                                   __m256i a, __m256i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_rorv_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_rorv_epi64(__mmask8 k, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rorv_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_rorv_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
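
The per-lane counts in "b" are full 64-bit values, but the pseudo-code reduces each one modulo 64 before rotating. A minimal scalar sketch of the unmasked form follows, assuming the four lanes are stored in arrays; ``rotr64`` and ``model_rorv_epi64`` are hypothetical helper names.

.. code-block:: C

    #include <stdint.h>

    /* Rotate one 64-bit lane right; the count is reduced modulo 64. */
    static inline uint64_t rotr64(uint64_t x, uint64_t count)
    {
        count %= 64;
        return count ? (x >> count) | (x << (64 - count)) : x;
    }

    /* Hypothetical scalar model: every lane of a is rotated right by
       the count held in the same lane of b. */
    static inline void model_rorv_epi64(uint64_t dst[4],
                                        const uint64_t a[4],
                                        const uint64_t b[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = rotr64(a[j], b[j]);
    }

In this model a lane count of 68 rotates by 4 bits, and a count of 64 leaves the lane unchanged.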
        	

_mm256_mask_sll_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_mask_sll_epi32(__m256i src, __mmask8 k,
                                  __m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
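
Unlike the rotates, this shift uses one count for every lane, and the pseudo-code compares the full low 64 bits of "count" against 31. A scalar sketch of that behaviour is given below. It assumes the low 64 bits of the ``__m128i`` argument have already been extracted into a ``uint64_t``; ``model_mask_sll_epi32`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: count_lo64 stands for count[63:0].
       Any value above 31 clears the selected lanes. */
    static inline void model_mask_sll_epi32(uint32_t dst[8],
                                            const uint32_t src[8],
                                            uint8_t k,
                                            const uint32_t a[8],
                                            uint64_t count_lo64)
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1)
                dst[j] = (count_lo64 > 31) ? 0 : a[j] << (unsigned)count_lo64;
            else
                dst[j] = src[j];
        }
    }

Note that a count such as ``0x100000004`` zeroes the selected lanes in this model even though its low 32 bits equal 4, because the comparison is made on all 64 bits.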
        	

_mm256_mask_slli_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_slli_epi32(__m256i src, __mmask8 k,
                                   __m256i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sll_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_maskz_sll_epi32(__mmask8 k, __m256i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_slli_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_slli_epi32(__mmask8 k, __m256i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_sll_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_mask_sll_epi64(__m256i src, __mmask8 k,
                                  __m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_slli_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_slli_epi64(__m256i src, __mmask8 k,
                                   __m256i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sll_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_maskz_sll_epi64(__mmask8 k, __m256i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_slli_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_slli_epi64(__mmask8 k, __m256i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
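
With a zeromask, lanes whose mask bit is clear are set to zero rather than copied from a source operand. The scalar sketch below models that for the immediate 64-bit shift. It follows the pseudo-code in reading only ``imm8[7:0]``; ``model_maskz_slli_epi64`` is a hypothetical name and the lanes are plain array elements.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zeromask form: selected lanes are
       shifted left (or cleared when the count exceeds 63), unselected
       lanes are zeroed. */
    static inline void model_maskz_slli_epi64(uint64_t dst[4],
                                              uint8_t k,
                                              const uint64_t a[4],
                                              unsigned imm8)
    {
        unsigned count = imm8 & 0xFF;   /* imm8[7:0] */
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1)
                dst[j] = (count > 63) ? 0 : a[j] << count;
            else
                dst[j] = 0;
        }
    }

The explicit ``count > 63`` test matters in C as well, since shifting a 64-bit value by 64 or more is undefined.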
        	

_mm256_mask_sllv_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_mask_sllv_epi32(__m256i src, __mmask8 k,
                                   __m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
        		ELSE
        			dst[i+31:i] := 0
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
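
In the variable shift each lane has its own count, and a count of 32 or more clears the lane instead of wrapping around as a rotate count would. A minimal scalar sketch under the assumption of array-backed lanes follows; ``model_mask_sllv_epi32`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: lane j is shifted left by count[j] when
       bit j of k is set (yielding 0 for counts of 32 or more), otherwise
       it is copied from src. */
    static inline void model_mask_sllv_epi32(uint32_t dst[8],
                                             const uint32_t src[8],
                                             uint8_t k,
                                             const uint32_t a[8],
                                             const uint32_t count[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1)
                dst[j] = (count[j] < 32) ? a[j] << count[j] : 0;
            else
                dst[j] = src[j];
        }
    }

Counts of 32, 33 and ``0xFFFFFFFF`` all produce zero in the selected lanes of this model.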
        	

_mm256_maskz_sllv_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_maskz_sllv_epi32(__mmask8 k, __m256i a,
                                    __m256i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
        		ELSE
        			dst[i+31:i] := 0
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_sllv_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_mask_sllv_epi64(__m256i src, __mmask8 k,
                                   __m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
        		ELSE
        			dst[i+63:i] := 0
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sllv_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_maskz_sllv_epi64(__mmask8 k, __m256i a,
                                    __m256i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
        		ELSE
        			dst[i+63:i] := 0
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
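
As a hedged illustration of the zeromask variant, the sketch below models the pseudo-code above in scalar C. Only mask bits 0 to 3 are consulted because the loop covers four 64-bit lanes. The name ``maskz_sllv_epi64_model`` is hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Scalar model: 4 lanes of 64 bits, zeroing lanes whose mask bit is clear.
       Names are illustrative and not part of any API. */
    static void maskz_sllv_epi64_model(uint64_t dst[4], uint8_t k,
                                       const uint64_t a[4],
                                       const uint64_t count[4])
    {
        for (int j = 0; j < 4; j++) {
            /* Counts of 64 or more produce zero. */
            uint64_t shifted = (count[j] < 64) ? (a[j] << count[j]) : 0;
            dst[j] = ((k >> j) & 1u) ? shifted : 0;
        }
    }

    int main(void)
    {
        const uint64_t a[4]     = {1, 1, 0xFF, 1};
        const uint64_t count[4] = {63, 64, 8, 1};
        uint64_t dst[4];

        /* 0xF7: bits 0..2 set, bit 3 clear; bits 4..7 are never read. */
        maskz_sllv_epi64_model(dst, 0xF7, a, count);
        for (int j = 0; j < 4; j++)
            printf("dst[%d] = 0x%016" PRIX64 "\n", j, dst[j]);

        assert(dst[0] == 0x8000000000000000ULL);
        assert(dst[1] == 0);      /* count of 64 */
        assert(dst[2] == 0xFF00);
        assert(dst[3] == 0);      /* masked off */
        return 0;
    }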
        	

_mm256_mask_sra_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_mask_sra_epi32(__m256i src, __mmask8 k,
                                  __m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        		ELSE
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
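
The arithmetic shift with a single shared count can be sketched in scalar C as follows. This is an illustrative model of the pseudo-code above, assuming the low 64 bits of ``count`` are passed as a plain integer; ``sra32`` and ``mask_sra_epi32_model`` are hypothetical helper names. The shift is built from unsigned operations because right-shifting a negative signed value is implementation-defined in C.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Arithmetic right shift on a 32-bit pattern; n must be in 0..31. */
    static uint32_t sra32(uint32_t x, unsigned n)
    {
        uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
        return (x >> n) | fill;
    }

    /* Scalar model; count64 stands for count[63:0]. Names are illustrative. */
    static void mask_sra_epi32_model(uint32_t dst[8], const uint32_t src[8],
                                     uint8_t k, const uint32_t a[8],
                                     uint64_t count64)
    {
        for (int j = 0; j < 8; j++) {
            if (!((k >> j) & 1u))
                dst[j] = src[j];                       /* keep src lane */
            else if (count64 > 31)
                dst[j] = (a[j] & 0x80000000u) ? 0xFFFFFFFFu : 0u;
            else
                dst[j] = sra32(a[j], (unsigned)count64);
        }
    }

    int main(void)
    {
        const uint32_t src[8] = {0x11111111u, 0x11111111u, 0x11111111u, 0x11111111u,
                                 0x11111111u, 0x11111111u, 0x11111111u, 0x11111111u};
        const uint32_t a[8]   = {0x80000000u, 0x7FFFFFFFu, 0xFFFFFFF0u, 16,
                                 0x80000000u, 0, 0, 0};
        uint32_t dst[8];

        mask_sra_epi32_model(dst, src, 0x1F, a, 4);
        for (int j = 0; j < 8; j++)
            printf("dst[%d] = 0x%08X\n", j, (unsigned)dst[j]);
        assert(dst[0] == 0xF8000000u);
        assert(dst[2] == 0xFFFFFFFFu); /* -16 >> 4 == -1 */
        assert(dst[5] == 0x11111111u); /* masked off, src kept */

        mask_sra_epi32_model(dst, src, 0x1F, a, 32);
        assert(dst[0] == 0xFFFFFFFFu); /* negative lane fills with ones */
        assert(dst[1] == 0u);          /* non-negative lane becomes zero */
        return 0;
    }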
        	

_mm256_mask_srai_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_srai_epi32(__m256i src, __mmask8 k,
                                   __m256i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        		ELSE
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sra_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_maskz_sra_epi32(__mmask8 k, __m256i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        		ELSE
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srai_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    SI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_srai_epi32(__mmask8 k, __m256i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        		ELSE
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_sra_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_mask_sra_epi64(__m256i src, __mmask8 k,
                                  __m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        		ELSE
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_srai_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_srai_epi64(__m256i src, __mmask8 k,
                                   __m256i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        		ELSE
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sra_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_maskz_sra_epi64(__mmask8 k, __m256i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        		ELSE
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srai_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    SI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_srai_epi64(__mmask8 k, __m256i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        		ELSE
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sra_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_sra_epi64(__m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF count[63:0] > 63
        		dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        	ELSE
        		dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srai_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    SI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_srai_epi64(__m256i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF imm8[7:0] > 63
        		dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        	ELSE
        		dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
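
The immediate form reads only ``imm8[7:0]``, and any value above 63 replicates the sign bit across the lane. The scalar sketch below models that behaviour under the assumption that the pseudo-code above is followed literally; ``sra64`` and ``srai_epi64_model`` are hypothetical names, and a real compiler may additionally require the immediate to be a compile-time constant.

.. code-block:: C

    #include <assert.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Arithmetic right shift on a 64-bit pattern; n must be in 0..63. */
    static uint64_t sra64(uint64_t x, unsigned n)
    {
        uint64_t fill = (x >> 63) ? ~(UINT64_MAX >> n) : 0;
        return (x >> n) | fill;
    }

    /* Scalar model of the 4-lane immediate shift. Names are illustrative. */
    static void srai_epi64_model(uint64_t dst[4], const uint64_t a[4],
                                 unsigned imm8)
    {
        unsigned n = imm8 & 0xFFu; /* only imm8[7:0] takes part */
        for (int j = 0; j < 4; j++) {
            if (n > 63)
                dst[j] = (a[j] >> 63) ? UINT64_MAX : 0;
            else
                dst[j] = sra64(a[j], n);
        }
    }

    int main(void)
    {
        const uint64_t a[4] = {0xFFFFFFFFFFFFFF00ULL, /* -256 */
                               256,
                               0x8000000000000000ULL,
                               0x7FFFFFFFFFFFFFFFULL};
        uint64_t dst[4];

        srai_epi64_model(dst, a, 8);
        for (int j = 0; j < 4; j++)
            printf("dst[%d] = 0x%016" PRIX64 "\n", j, dst[j]);
        assert(dst[0] == UINT64_MAX);              /* -256 >> 8 == -1 */
        assert(dst[1] == 1);
        assert(dst[2] == 0xFF80000000000000ULL);

        srai_epi64_model(dst, a, 64);
        assert(dst[0] == UINT64_MAX && dst[1] == 0);
        return 0;
    }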
        	

_mm256_mask_srav_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_mask_srav_epi32(__m256i src, __mmask8 k,
                                   __m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
        		ELSE
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srav_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    MASK k, 
    SI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_maskz_srav_epi32(__mmask8 k, __m256i a,
                                    __m256i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
        		ELSE
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_srav_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_mask_srav_epi64(__m256i src, __mmask8 k,
                                   __m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
        		ELSE
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srav_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    MASK k, 
    SI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_maskz_srav_epi64(__mmask8 k, __m256i a,
                                    __m256i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
        		ELSE
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srav_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i count
:Param ETypes:
    SI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_srav_epi64(__m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF count[i+63:i] < 64
        		dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
        	ELSE
        		dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
        	FI
        ENDFOR
        dst[MAX:256] := 0
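
In the per-element form each lane has its own count, and the test ``count < 64`` is an unsigned comparison. A count whose bit pattern would read as a negative number is therefore out of range and yields a sign fill. The scalar sketch below illustrates this reading of the pseudo-code above; ``sra64`` and ``srav_epi64_model`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Arithmetic right shift on a 64-bit pattern; n must be in 0..63. */
    static uint64_t sra64(uint64_t x, unsigned n)
    {
        uint64_t fill = (x >> 63) ? ~(UINT64_MAX >> n) : 0;
        return (x >> n) | fill;
    }

    /* Scalar model with one count per lane. Names are illustrative. */
    static void srav_epi64_model(uint64_t dst[4], const uint64_t a[4],
                                 const uint64_t count[4])
    {
        for (int j = 0; j < 4; j++) {
            if (count[j] < 64)                 /* unsigned comparison */
                dst[j] = sra64(a[j], (unsigned)count[j]);
            else
                dst[j] = (a[j] >> 63) ? UINT64_MAX : 0;
        }
    }

    int main(void)
    {
        const uint64_t a[4]     = {0xFFFFFFFFFFFFFFE0ULL, /* -32 */
                                   0xFFFFFFFFFFFFFFF0ULL, /* -16 */
                                   1024,
                                   0x7FFFFFFFFFFFFFFFULL};
        const uint64_t count[4] = {4, UINT64_MAX, 10, 64};
        uint64_t dst[4];

        srav_epi64_model(dst, a, count);
        for (int j = 0; j < 4; j++)
            printf("dst[%d] = 0x%016" PRIX64 "\n", j, dst[j]);

        assert(dst[0] == 0xFFFFFFFFFFFFFFFEULL); /* -32 >> 4 == -2 */
        assert(dst[1] == UINT64_MAX);            /* huge count, negative lane */
        assert(dst[2] == 1);
        assert(dst[3] == 0);                     /* count 64, positive lane */
        return 0;
    }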
        	

_mm256_mask_srl_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_mask_srl_epi32(__m256i src, __mmask8 k,
                                  __m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
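
Because the comparison uses the full ``count[63:0]``, a count whose low 32 bits are zero but whose upper bits are set is still out of range and clears the selected lanes. The scalar sketch below illustrates this under the assumption that the low 64 bits of ``count`` are passed as an integer; ``mask_srl_epi32_model`` is a hypothetical name.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Scalar model; count64 stands for count[63:0]. Names are illustrative. */
    static void mask_srl_epi32_model(uint32_t dst[8], const uint32_t src[8],
                                     uint8_t k, const uint32_t a[8],
                                     uint64_t count64)
    {
        for (int j = 0; j < 8; j++) {
            if (!((k >> j) & 1u))
                dst[j] = src[j];               /* keep src lane */
            else
                dst[j] = (count64 > 31) ? 0u : (a[j] >> count64);
        }
    }

    int main(void)
    {
        uint32_t src[8], a[8], dst[8];
        for (int j = 0; j < 8; j++) {
            src[j] = 0xDEADBEEFu;
            a[j]   = 0x80000000u;
        }

        /* 0xAA selects the odd lanes. */
        mask_srl_epi32_model(dst, src, 0xAA, a, 31);
        for (int j = 0; j < 8; j++)
            printf("dst[%d] = 0x%08X\n", j, (unsigned)dst[j]);
        assert(dst[0] == 0xDEADBEEFu);
        assert(dst[1] == 1u);

        /* Low 32 bits of the count are zero, yet the 64-bit value exceeds 31. */
        mask_srl_epi32_model(dst, src, 0xAA, a, (uint64_t)1 << 32);
        assert(dst[1] == 0u);
        assert(dst[2] == 0xDEADBEEFu);
        return 0;
    }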
        	

_mm256_mask_srli_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_srli_epi32(__m256i src, __mmask8 k,
                                   __m256i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srl_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_maskz_srl_epi32(__mmask8 k, __m256i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srli_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_srli_epi32(__mmask8 k, __m256i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_srl_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_mask_srl_epi64(__m256i src, __mmask8 k,
                                  __m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_srli_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_srli_epi64(__m256i src, __mmask8 k,
                                   __m256i a,
                                   unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srl_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_maskz_srl_epi64(__mmask8 k, __m256i a,
                                   __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srli_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_srli_epi64(__mmask8 k, __m256i a,
                                    unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_srlv_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_mask_srlv_epi32(__m256i src, __mmask8 k,
                                   __m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
        		ELSE
        			dst[i+31:i] := 0
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srlv_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_maskz_srlv_epi32(__mmask8 k, __m256i a,
                                    __m256i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
        		ELSE
        			dst[i+31:i] := 0
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
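
A possible way to call the intrinsic from array-based code is sketched below. The wrapper ``maskz_srlv_u32x8`` is a hypothetical helper, not part of any library. It is written on the assumption that the intrinsic path is taken only when the compiler defines ``__AVX512F__`` and ``__AVX512VL__`` (for example with ``-mavx512f -mavx512vl`` on GCC or Clang) and the CPU supports those extensions; otherwise a scalar loop that follows the pseudo-code above is used.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #if defined(__AVX512F__) && defined(__AVX512VL__)
    #include <immintrin.h>
    #endif

    /* Hypothetical wrapper: per-lane logical right shift with a zeromask. */
    static void maskz_srlv_u32x8(uint32_t dst[8], uint8_t k,
                                 const uint32_t a[8], const uint32_t count[8])
    {
    #if defined(__AVX512F__) && defined(__AVX512VL__)
        /* Unaligned loads and stores, so the arrays need no special alignment. */
        __m256i va = _mm256_loadu_si256((const __m256i *)a);
        __m256i vc = _mm256_loadu_si256((const __m256i *)count);
        __m256i r  = _mm256_maskz_srlv_epi32((__mmask8)k, va, vc);
        _mm256_storeu_si256((__m256i *)dst, r);
    #else
        /* Scalar fallback that mirrors the pseudo-code. */
        for (int j = 0; j < 8; j++) {
            uint32_t shifted = (count[j] < 32) ? (a[j] >> count[j]) : 0u;
            dst[j] = ((k >> j) & 1u) ? shifted : 0u;
        }
    #endif
    }

    int main(void)
    {
        const uint32_t a[8]     = {0xF0000000u, 0xF0000000u, 0xF0000000u, 0xF0000000u,
                                   0xF0000000u, 0xF0000000u, 0xF0000000u, 0xF0000000u};
        const uint32_t count[8] = {0, 4, 28, 31, 32, 33, 4, 4};
        uint32_t dst[8];

        maskz_srlv_u32x8(dst, 0x7F, a, count);
        for (int j = 0; j < 8; j++)
            printf("dst[%d] = 0x%08X\n", j, (unsigned)dst[j]);

        assert(dst[2] == 0xFu); /* 0xF0000000 >> 28 */
        assert(dst[4] == 0u);   /* count of 32 clears the lane */
        assert(dst[7] == 0u);   /* mask bit 7 is clear */
        return 0;
    }

Both paths are intended to give the same results, so the scalar branch can double as a reference when testing the vector branch.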
        	

_mm256_mask_srlv_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_mask_srlv_epi64(__m256i src, __mmask8 k,
                                   __m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
        		ELSE
        			dst[i+63:i] := 0
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_srlv_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_maskz_srlv_epi64(__mmask8 k, __m256i a,
                                    __m256i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
        		ELSE
        			dst[i+63:i] := 0
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_shrdv_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m256i _mm256_maskz_shrdv_epi64(__mmask8 k, __m256i a,
                                     __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shrdv_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask8 k, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m256i _mm256_mask_shrdv_epi64(__m256i a, __mmask8 k,
                                    __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shrdv_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m256i _mm256_shrdv_epi64(__m256i a, __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
        ENDFOR
        dst[MAX:256] := 0
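
The 128-bit intermediate in the pseudo-code above has no direct standard C type, but the per-lane result can be modelled with two 64-bit shifts. The sketch below is a hypothetical scalar model (``shrd64_lane`` is not part of any header) and assumes only ``<stdint.h>``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of one 64-bit lane: low 64 bits of (b:a) >> (c & 63). */
    static uint64_t shrd64_lane(uint64_t a, uint64_t b, uint64_t c)
    {
        unsigned s = (unsigned)(c & 63);    /* only the low 6 bits of c are used */
        if (s == 0)
            return a;                       /* avoids the undefined shift b << 64 */
        return (a >> s) | (b << (64 - s));  /* low bits of b enter from the top */
    }

Because the count is masked with 63, a count of 64 behaves like a count of 0 and returns ``a`` unchanged, unlike the plain variable shifts, which return 0 for out-of-range counts.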
        	

_mm256_maskz_shrdv_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m256i _mm256_maskz_shrdv_epi32(__mmask8 k, __m256i a,
                                     __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shrdv_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask8 k, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI32 a, 
    MASK k, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m256i _mm256_mask_shrdv_epi32(__m256i a, __mmask8 k,
                                    __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
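
In this masked form there is no separate ``src`` operand: ``a`` is both the low half of the concatenation and the fallback value for unselected lanes. A scalar sketch of that behaviour follows; ``shrdv32_mask_model`` is a hypothetical name and arrays stand in for the eight 32-bit lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the write-masked 32-bit funnel shift right. */
    static void shrdv32_mask_model(const uint32_t a[8], uint8_t k,
                                   const uint32_t b[8], const uint32_t c[8],
                                   uint32_t dst[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                /* Build the 64-bit intermediate b:a, shift, keep the low half. */
                uint64_t wide = ((uint64_t)b[j] << 32) | a[j];
                dst[j] = (uint32_t)(wide >> (c[j] & 31));
            } else {
                dst[j] = a[j];  /* unselected lanes pass a through unchanged */
            }
        }
    }

A 64-bit integer is wide enough to hold the intermediate here, so no special case for a zero count is needed.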
        	

_mm256_shrdv_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI32 a, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m256i _mm256_shrdv_epi32(__m256i a, __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_shrdv_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m256i _mm256_maskz_shrdv_epi16(__mmask16 k, __m256i a,
                                     __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shrdv_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask16 k, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI16 a, 
    MASK k, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m256i _mm256_mask_shrdv_epi16(__m256i a, __mmask16 k,
                                    __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shrdv_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI16 a, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m256i _mm256_shrdv_epi16(__m256i a, __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
        ENDFOR
        dst[MAX:256] := 0
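
One consequence of the pseudo-code above is that passing the same value as both ``a`` and ``b`` turns the funnel shift into a 16-bit rotate right. The scalar sketch below illustrates this under that reading; ``shrd16_lane`` and ``rotr16`` are hypothetical helper names, not intrinsics.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of one 16-bit lane: low 16 bits of (b:a) >> (c & 15). */
    static uint16_t shrd16_lane(uint16_t a, uint16_t b, uint16_t c)
    {
        uint32_t wide = ((uint32_t)b << 16) | a;
        return (uint16_t)(wide >> (c & 15));
    }

    /* Reference 16-bit rotate right, written independently for comparison. */
    static uint16_t rotr16(uint16_t x, unsigned n)
    {
        n &= 15;
        return (uint16_t)((x >> n) | (x << ((16 - n) & 15)));
    }

For every count from 0 to 15, ``shrd16_lane(x, x, n)`` and ``rotr16(x, n)`` agree. A count of 16 is masked to 0, so the lane is returned unchanged.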
        	

_mm256_maskz_shrdi_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_shrdi_epi64(__mmask8 k, __m256i a,
                                     __m256i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shrdi_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_shrdi_epi64(__m256i src, __mmask8 k,
                                    __m256i a, __m256i b,
                                    int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shrdi_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_shrdi_epi64(__m256i a, __m256i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
        ENDFOR
        dst[MAX:256] := 0
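
A typical use of a funnel shift is shifting a multi-word integer, where each word receives bits from its higher neighbour. The sketch below models that idea in scalar C for a 256-bit value stored as four 64-bit limbs (least significant first). It is an illustration under assumptions: ``shr256_model`` is a hypothetical name, and in vector code the ``b`` operand would have to be a lane-shifted copy of ``a`` prepared separately.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model: shift a 4-limb integer right by imm8[5:0] bits,
       applying the per-lane funnel shift from the pseudo-code to each limb. */
    static void shr256_model(const uint64_t in[4], unsigned imm8, uint64_t out[4])
    {
        unsigned s = imm8 & 63;                    /* imm8[5:0] */
        for (int j = 0; j < 4; j++) {
            uint64_t a = in[j];                    /* current limb */
            uint64_t b = (j < 3) ? in[j + 1] : 0;  /* next-higher limb, 0 above the top */
            out[j] = s ? (a >> s) | (b << (64 - s)) : a;
        }
    }

Shifting the value 2\ :sup:`64` (limbs ``{0, 1, 0, 0}``) right by one bit yields 2\ :sup:`63`, that is, ``{0x8000000000000000, 0, 0, 0}``.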
        	

_mm256_maskz_shrdi_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_shrdi_epi32(__mmask8 k, __m256i a,
                                     __m256i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shrdi_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_shrdi_epi32(__m256i src, __mmask8 k,
                                    __m256i a, __m256i b,
                                    int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shrdi_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_shrdi_epi32(__m256i a, __m256i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_shrdi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_shrdi_epi16(__mmask16 k, __m256i a,
                                     __m256i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
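
With 16-bit elements a 256-bit vector has sixteen lanes, so the mask is 16 bits wide and only ``imm8[3:0]`` takes part in the shift. The scalar sketch below models the pseudo-code above; ``shrdi16_maskz_model`` is a hypothetical name and arrays stand in for the lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zero-masked 16-bit funnel shift right
       by an immediate. */
    static void shrdi16_maskz_model(uint16_t k, const uint16_t a[16],
                                    const uint16_t b[16], int imm8,
                                    uint16_t dst[16])
    {
        unsigned s = (unsigned)imm8 & 15;  /* imm8[3:0] */
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1) {
                uint32_t wide = ((uint32_t)b[j] << 16) | a[j];
                dst[j] = (uint16_t)(wide >> s);
            } else {
                dst[j] = 0;  /* zeromask: unselected lanes are cleared */
            }
        }
    }

An immediate of 20 gives the same result as 4, since only the low four bits are kept.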
        	

_mm256_mask_shrdi_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_shrdi_epi16(__m256i src, __mmask16 k,
                                    __m256i a, __m256i b,
                                    int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shrdi_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_shrdi_epi16(__m256i a, __m256i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_shldv_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m256i _mm256_maskz_shldv_epi64(__mmask8 k, __m256i a,
                                     __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
        		dst[i+63:i] := tmp[127:64]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shldv_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask8 k, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m256i _mm256_mask_shldv_epi64(__m256i a, __mmask8 k,
                                    __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
        		dst[i+63:i] := tmp[127:64]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shldv_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m256i _mm256_shldv_epi64(__m256i a, __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
        	dst[i+63:i] := tmp[127:64]
        ENDFOR
        dst[MAX:256] := 0
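
For the left funnel shift the roles are mirrored: ``a`` is the high half of the concatenation and the result is the upper 64 bits, so the top bits of ``b`` enter from the bottom. A scalar sketch of one lane, assuming only ``<stdint.h>`` and using the hypothetical name ``shld64_lane``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of one 64-bit lane: upper 64 bits of (a:b) << (c & 63). */
    static uint64_t shld64_lane(uint64_t a, uint64_t b, uint64_t c)
    {
        unsigned s = (unsigned)(c & 63);    /* only the low 6 bits of c are used */
        if (s == 0)
            return a;                       /* avoids the undefined shift b >> 64 */
        return (a << s) | (b >> (64 - s));  /* high bits of b enter from the bottom */
    }

For example, with ``a = 1``, ``b = 0x8000000000000000`` and a count of 1, the lane result is 3: ``a`` moves up one bit and the top bit of ``b`` fills bit 0.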
        	

_mm256_maskz_shldv_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m256i _mm256_maskz_shldv_epi32(__mmask8 k, __m256i a,
                                     __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
        		dst[i+31:i] := tmp[63:32]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shldv_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask8 k, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI32 a, 
    MASK k, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m256i _mm256_mask_shldv_epi32(__m256i a, __mmask8 k,
                                    __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
        		dst[i+31:i] := tmp[63:32]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shldv_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI32 a, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m256i _mm256_shldv_epi32(__m256i a, __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
        	dst[i+31:i] := tmp[63:32]
        ENDFOR
        dst[MAX:256] := 0
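
Left and right funnel shifts are two views of the same concatenation. Reading the pseudo-code for both directions, a left shift by ``s`` should match a right shift by ``32 - s`` with the operands swapped, for ``s`` from 1 to 31. The scalar sketch below checks that reading; both helper names are hypothetical and neither is an intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of one lane: upper 32 bits of (a:b) << (c & 31). */
    static uint32_t shld32_lane(uint32_t a, uint32_t b, uint32_t c)
    {
        uint64_t tmp = (((uint64_t)a << 32) | b) << (c & 31);
        return (uint32_t)(tmp >> 32);
    }

    /* Hypothetical model of one lane: lower 32 bits of (b:a) >> (c & 31). */
    static uint32_t shrd32_lane(uint32_t a, uint32_t b, uint32_t c)
    {
        uint64_t wide = ((uint64_t)b << 32) | a;
        return (uint32_t)(wide >> (c & 31));
    }

Under these models ``shld32_lane(a, b, s)`` equals ``shrd32_lane(b, a, 32 - s)`` for ``s`` in 1..31. The identity does not extend to ``s = 0``, because a count of 32 is masked to 0 in the right-shift form.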
        	

_mm256_maskz_shldv_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m256i _mm256_maskz_shldv_epi16(__mmask16 k, __m256i a,
                                     __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shldv_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask16 k, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI16 a, 
    MASK k, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m256i _mm256_mask_shldv_epi16(__m256i a, __mmask16 k,
                                    __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shldv_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI16 a, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m256i _mm256_shldv_epi16(__m256i a, __m256i b, __m256i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_shldi_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_shldi_epi64(__mmask8 k, __m256i a,
                                     __m256i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
        		dst[i+63:i] := tmp[127:64]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shldi_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_shldi_epi64(__m256i src, __mmask8 k,
                                    __m256i a, __m256i b,
                                    int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
        		dst[i+63:i] := tmp[127:64]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shldi_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_shldi_epi64(__m256i a, __m256i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
        	dst[i+63:i] := tmp[127:64]
        ENDFOR
        dst[MAX:256] := 0
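
The concatenate-and-shift step can be modeled in portable scalar C without a 128-bit type. The following is a minimal sketch, not Intel's implementation: the helper names ``shldi64_lane`` and ``shldi64_x4`` are hypothetical, and the code only mirrors the pseudo-code above for the unmasked 64-bit form.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of one 64-bit lane: the upper 64 bits of
     * ((a:b) << imm8[5:0]), where a:b is the 128-bit concatenation. */
    uint64_t shldi64_lane(uint64_t a, uint64_t b, unsigned imm8)
    {
        unsigned s = imm8 & 63u;      /* the pseudo-code reads imm8[5:0] */
        if (s == 0)
            return a;                 /* avoids the undefined C shift b >> 64 */
        /* High bits of a are shifted out; the top s bits of b are shifted in. */
        return (a << s) | (b >> (64u - s));
    }

    /* Apply the lane model to four lanes, as in "FOR j := 0 to 3". */
    void shldi64_x4(uint64_t dst[4], const uint64_t a[4],
                    const uint64_t b[4], unsigned imm8)
    {
        for (int j = 0; j < 4; ++j)
            dst[j] = shldi64_lane(a[j], b[j], imm8);
    }

Under this model, ``shldi64_lane(1, 0x8000000000000000ULL, 1)`` evaluates to ``3``, and a shift count of 64 behaves like a count of 0 because only the low six bits of "imm8" are used.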
        	

_mm256_maskz_shldi_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_shldi_epi32(__mmask8 k, __m256i a,
                                     __m256i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
        		dst[i+31:i] := tmp[63:32]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shldi_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_shldi_epi32(__m256i src, __mmask8 k,
                                    __m256i a, __m256i b,
                                    int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
        		dst[i+31:i] := tmp[63:32]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
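
The writemask behavior can be illustrated with a scalar model of the pseudo-code above. This is only a sketch under stated assumptions: the function name ``mask_shldi32_x8`` is hypothetical, the eight 32-bit lanes are passed as plain arrays, and the mask is a ``uint8_t`` standing in for ``__mmask8``.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of the masked 32-bit concatenate-and-shift:
     * lane j takes the shifted result when bit j of k is set,
     * otherwise it keeps the value from src. */
    void mask_shldi32_x8(uint32_t dst[8], const uint32_t src[8], uint8_t k,
                         const uint32_t a[8], const uint32_t b[8],
                         unsigned imm8)
    {
        unsigned s = imm8 & 31u;      /* the pseudo-code reads imm8[4:0] */
        for (int j = 0; j < 8; ++j) {
            if ((k >> j) & 1u) {
                /* tmp[63:0] := ((a << 32) | b) << s; keep tmp[63:32] */
                uint64_t tmp = (((uint64_t)a[j] << 32) | b[j]) << s;
                dst[j] = (uint32_t)(tmp >> 32);
            } else {
                dst[j] = src[j];      /* writemask: copy from src */
            }
        }
    }

Passing an all-zero ``src`` array to this model reproduces the zeromask form, since the two pseudo-code listings differ only in the value written to unselected lanes.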
        	

_mm256_shldi_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_shldi_epi32(__m256i a, __m256i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
        	dst[i+31:i] := tmp[63:32]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_shldi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_shldi_epi16(__mmask16 k, __m256i a,
                                     __m256i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shldi_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_shldi_epi16(__m256i src, __mmask16 k,
                                    __m256i a, __m256i b,
                                    int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shldi_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_shldi_epi16(__m256i a, __m256i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_mask_sllv_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_mask_sllv_epi16(__m128i src, __mmask8 k,
                                __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
        		ELSE
        			dst[i+15:i] := 0
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sllv_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_maskz_sllv_epi16(__mmask8 k, __m128i a,
                                 __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
        		ELSE
        			dst[i+15:i] := 0
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_sllv_epi16
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_sllv_epi16(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF count[i+15:i] < 16
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
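
The per-element shift rule above can be checked against a small scalar model. This is a sketch rather than a definitive implementation: ``sllv16_lane`` and ``sllv16_x8`` are hypothetical helper names, and the eight 16-bit lanes are represented as plain arrays.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of one 16-bit lane: counts of 16 or more produce 0,
     * smaller counts shift left and drop bits above bit 15. */
    uint16_t sllv16_lane(uint16_t a, uint16_t count)
    {
        if (count >= 16)
            return 0;
        return (uint16_t)((uint32_t)a << count);
    }

    /* Each lane uses its own count, as in "FOR j := 0 to 7". */
    void sllv16_x8(uint16_t dst[8], const uint16_t a[8],
                   const uint16_t count[8])
    {
        for (int j = 0; j < 8; ++j)
            dst[j] = sllv16_lane(a[j], count[j]);
    }

In this model ``sllv16_lane(0x00FF, 4)`` gives ``0x0FF0``, while ``sllv16_lane(1, 16)`` gives ``0`` because the count is out of range for a 16-bit element.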
        	

_mm_mask_sll_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_mask_sll_epi16(__m128i src, __mmask8 k,
                               __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_slli_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_slli_epi16(__m128i src, __mmask8 k,
                                __m128i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sll_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_maskz_sll_epi16(__mmask8 k, __m128i a,
                                __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
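
Unlike the per-element ``sllv`` form, this operation applies one shared count taken from the low 64 bits of "count". A scalar sketch of the pseudo-code above follows; the name ``maskz_sll16_x8`` is hypothetical, ``count_lo`` stands in for ``count[63:0]``, and a ``uint8_t`` stands in for ``__mmask8``.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of the zeromasked 16-bit left shift with a shared count.
     * A lane is zeroed when its mask bit is clear or when the full 64-bit
     * count exceeds 15. */
    void maskz_sll16_x8(uint16_t dst[8], uint8_t k, const uint16_t a[8],
                        uint64_t count_lo)
    {
        for (int j = 0; j < 8; ++j) {
            if (((k >> j) & 1u) == 0 || count_lo > 15)
                dst[j] = 0;
            else
                dst[j] = (uint16_t)((uint32_t)a[j] << count_lo);
        }
    }

Because the whole 64-bit value is compared against 15, a count such as ``0x100000000`` zeroes every selected lane even though its low 16 bits are 0.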
        	

_mm_maskz_slli_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_slli_epi16(__mmask8 k, __m128i a,
                                 unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_srav_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_mask_srav_epi16(__m128i src, __mmask8 k,
                                __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
        		ELSE
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srav_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    SI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_maskz_srav_epi16(__mmask8 k, __m128i a,
                                 __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
        		ELSE
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_srav_epi16
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    SI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_srav_epi16(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF count[i+15:i] < 16
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
        	ELSE
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
        	FI	
        ENDFOR
        dst[MAX:128] := 0
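
The arithmetic right shift can be modeled in scalar C without relying on the implementation-defined behavior of ``>>`` on negative signed values. The sketch below follows the pseudo-code above; ``srav16_lane`` is a hypothetical helper name, and each lane is passed as its raw 16-bit pattern.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of one 16-bit lane of an arithmetic right shift.
     * Counts of 16 or more fill the lane with the sign bit. */
    uint16_t srav16_lane(uint16_t a, uint16_t count)
    {
        uint16_t sign_fill = (a & 0x8000u) ? 0xFFFFu : 0x0000u;
        if (count >= 16)
            return sign_fill;
        /* Logical shift first, then set the vacated high bits to the sign. */
        uint16_t shifted = (uint16_t)(a >> count);
        uint16_t high_mask = (uint16_t)~(0xFFFFu >> count);
        return (uint16_t)(shifted | (sign_fill & high_mask));
    }

For instance, ``srav16_lane(0x8000, 4)`` gives ``0xF800`` and ``srav16_lane(0x8000, 16)`` gives ``0xFFFF``, while a non-negative input with an out-of-range count gives ``0``.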
        	

_mm_mask_sra_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_mask_sra_epi16(__m128i src, __mmask8 k,
                               __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        		ELSE
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_srai_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_srai_epi16(__m128i src, __mmask8 k,
                                __m128i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        		ELSE
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sra_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    SI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_maskz_sra_epi16(__mmask8 k, __m128i a,
                                __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        		ELSE
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srai_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    SI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_srai_epi16(__mmask8 k, __m128i a,
                                 unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        		ELSE
        			dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_srlv_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_mask_srlv_epi16(__m128i src, __mmask8 k,
                                __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
        		ELSE
        			dst[i+15:i] := 0
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srlv_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_maskz_srlv_epi16(__mmask8 k, __m128i a,
                                 __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF count[i+15:i] < 16
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
        		ELSE
        			dst[i+15:i] := 0
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_srlv_epi16
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_srlv_epi16(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF count[i+15:i] < 16
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_srl_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_mask_srl_epi16(__m128i src, __mmask8 k,
                               __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_srli_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_srli_epi16(__m128i src, __mmask8 k,
                                __m128i a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srl_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 count

.. code-block:: C

    __m128i _mm_maskz_srl_epi16(__mmask8 k, __m128i a,
                                __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF count[63:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srli_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_srli_epi16(__mmask8 k, __m128i a,
                                 int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		IF imm8[7:0] > 15
        			dst[i+15:i] := 0
        		ELSE
        			dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_rol_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    const int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_rol_epi32(__m128i src, __mmask8 k,
                               __m128i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_rol_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_rol_epi32(__mmask8 k, __m128i a,
                                const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
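
To illustrate how the zeromask differs from a writemask, here is a scalar sketch of the pseudo-code above. The function names are hypothetical, and the model only demonstrates the lane selection and the modulo-32 count; it is not the intrinsic.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Rotate left by count modulo 32 without shifting by the full width. */
    uint32_t rotl32(uint32_t x, unsigned count)
    {
        count &= 31;
        return (x << count) | (x >> ((32 - count) & 31));
    }

    /* Hypothetical scalar model: unselected lanes are set to zero.
       Only bits 0..3 of k are consulted, one per 32-bit lane. */
    void maskz_rol_epi32_model(uint32_t dst[4], uint8_t k,
                               const uint32_t a[4], unsigned imm8)
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? rotl32(a[j], imm8) : 0;
    }

    /* Usage: in k = 0xF2 only bit 1 of the low four bits is set,
       and imm8 = 33 behaves like a rotate by 1. */
    void example_maskz_rol_epi32(void)
    {
        const uint32_t a[4] = {1, 2, 3, 4};
        uint32_t dst[4];

        maskz_rol_epi32_model(dst, 0xF2, a, 33);
        assert(dst[0] == 0);
        assert(dst[1] == 4);
        assert(dst[2] == 0);
        assert(dst[3] == 0);
    }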

_mm_rol_epi32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_rol_epi32(__m128i a, int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_rol_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    const int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_rol_epi64(__m128i src, __mmask8 k,
                               __m128i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_rol_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_rol_epi64(__mmask8 k, __m128i a,
                                const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rol_epi64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    const int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_rol_epi64(__m128i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:128] := 0
        	
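
For the 64-bit lanes the count wraps at 64 rather than 32. A scalar sketch of the pseudo-code above might look as follows; ``rotl64`` and ``rol_epi64_model`` are hypothetical helper names chosen for this illustration.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Rotate left by count modulo 64. The "& 63" on the right shift keeps
       the shift amount in range when count is 0. */
    uint64_t rotl64(uint64_t x, unsigned count)
    {
        count &= 63;
        return (x << count) | (x >> ((64 - count) & 63));
    }

    /* Hypothetical scalar model: both 64-bit lanes use the same immediate. */
    void rol_epi64_model(uint64_t dst[2], const uint64_t a[2], unsigned imm8)
    {
        for (int j = 0; j < 2; j++)
            dst[j] = rotl64(a[j], imm8);
    }

    void example_rol_epi64(void)
    {
        const uint64_t a[2] = {0x8000000000000001ull, 0x0123456789ABCDEFull};
        uint64_t dst[2];

        rol_epi64_model(dst, a, 8);
        assert(dst[0] == 0x0000000000000180ull);
        assert(dst[1] == 0x23456789ABCDEF01ull);

        rol_epi64_model(dst, a, 64);  /* 64 % 64 == 0, so lanes are unchanged */
        assert(dst[0] == a[0]);
        assert(dst[1] == a[1]);
    }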

_mm_mask_rolv_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_rolv_epi32(__m128i src, __mmask8 k,
                                __m128i a, __m128i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_rolv_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_rolv_epi32(__mmask8 k, __m128i a,
                                 __m128i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rolv_epi32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_rolv_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src << count) OR (src >> (32 - count))
        }
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	
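
In the variable form each lane takes its own count from the matching lane of "b". The scalar sketch below follows the pseudo-code above under that reading; the function names are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Rotate left by count modulo 32 without shifting by the full width. */
    uint32_t rotl32(uint32_t x, uint32_t count)
    {
        count &= 31;
        return (x << count) | (x >> ((32 - count) & 31));
    }

    /* Hypothetical scalar model: lane j of a is rotated by lane j of b. */
    void rolv_epi32_model(uint32_t dst[4], const uint32_t a[4],
                          const uint32_t b[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = rotl32(a[j], b[j]);
    }

    void example_rolv_epi32(void)
    {
        const uint32_t a[4] = {1, 1, 1, 0x80000000u};
        const uint32_t b[4] = {0, 1, 32, 33};  /* 32 acts as 0, 33 acts as 1 */
        uint32_t dst[4];

        rolv_epi32_model(dst, a, b);
        assert(dst[0] == 1);
        assert(dst[1] == 2);
        assert(dst[2] == 1);
        assert(dst[3] == 1);
    }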

_mm_mask_rolv_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_rolv_epi64(__m128i src, __mmask8 k,
                                __m128i a, __m128i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_rolv_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_rolv_epi64(__mmask8 k, __m128i a,
                                 __m128i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rolv_epi64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_rolv_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src << count) OR (src >> (64 - count))
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_ror_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    const int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_ror_epi32(__m128i src, __mmask8 k,
                               __m128i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_ror_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_ror_epi32(__mmask8 k, __m128i a,
                                const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_ror_epi32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    const int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_ror_epi32(__m128i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:128] := 0
        	
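
The right rotate mirrors the left rotate: bits shifted out at the low end re-enter at the high end. A scalar sketch of the pseudo-code above, with hypothetical names ``rotr32`` and ``ror_epi32_model``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Rotate right by count modulo 32. Masking the left-shift amount avoids
       a shift by 32, which is undefined behaviour in C when count is 0. */
    uint32_t rotr32(uint32_t x, unsigned count)
    {
        count &= 31;
        return (x >> count) | (x << ((32 - count) & 31));
    }

    /* Hypothetical scalar model: all four lanes use the same immediate. */
    void ror_epi32_model(uint32_t dst[4], const uint32_t a[4], unsigned imm8)
    {
        for (int j = 0; j < 4; j++)
            dst[j] = rotr32(a[j], imm8);
    }

    void example_ror_epi32(void)
    {
        const uint32_t a[4] = {0x00000001u, 0x12345678u, 0x0000000Fu, 0u};
        uint32_t dst[4];

        ror_epi32_model(dst, a, 4);
        assert(dst[0] == 0x10000000u);
        assert(dst[1] == 0x81234567u);
        assert(dst[2] == 0xF0000000u);
        assert(dst[3] == 0u);
    }

Because the count is taken modulo 32, a right rotate by ``n`` gives the same result as a left rotate by ``32 - n``.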

_mm_mask_ror_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    const int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_ror_epi64(__m128i src, __mmask8 k,
                               __m128i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_ror_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_ror_epi64(__mmask8 k, __m128i a,
                                const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_ror_epi64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    const int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_ror_epi64(__m128i a, const int imm8);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_rorv_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_rorv_epi32(__m128i src, __mmask8 k,
                                __m128i a, __m128i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_rorv_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_rorv_epi32(__mmask8 k, __m128i a,
                                 __m128i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rorv_epi32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_rorv_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_rorv_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_rorv_epi64(__m128i src, __mmask8 k,
                                __m128i a, __m128i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_rorv_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_rorv_epi64(__mmask8 k, __m128i a,
                                 __m128i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rorv_epi64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_rorv_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
        	count := count_src % 64
        	RETURN (src >> count) OR (src << (64 - count))
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	
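
The variable right rotate on 64-bit lanes reads a full 64-bit count per lane and reduces it modulo 64. The sketch below models the pseudo-code above in scalar C; ``rotr64`` and ``rorv_epi64_model`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Rotate right by count modulo 64 without shifting by the full width. */
    uint64_t rotr64(uint64_t x, uint64_t count)
    {
        unsigned c = (unsigned)(count & 63);
        return (x >> c) | (x << ((64 - c) & 63));
    }

    /* Hypothetical scalar model: lane j of a is rotated by lane j of b. */
    void rorv_epi64_model(uint64_t dst[2], const uint64_t a[2],
                          const uint64_t b[2])
    {
        for (int j = 0; j < 2; j++)
            dst[j] = rotr64(a[j], b[j]);
    }

    void example_rorv_epi64(void)
    {
        const uint64_t a[2] = {0x0000000000000001ull, 0x0123456789ABCDEFull};
        const uint64_t b[2] = {1, 72};  /* 72 % 64 == 8 */
        uint64_t dst[2];

        rorv_epi64_model(dst, a, b);
        assert(dst[0] == 0x8000000000000000ull);
        assert(dst[1] == 0xEF0123456789ABCDull);
    }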

_mm_mask_sll_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_mask_sll_epi32(__m128i src, __mmask8 k,
                               __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
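
Unlike the rotates, this shift does not wrap its count: a single 64-bit count, taken from the low half of "count", applies to every lane, and any value above 31 clears the selected lanes. The scalar sketch below models the pseudo-code above; ``mask_sll_epi32_model`` and its ``count_lo`` parameter (standing in for ``count[63:0]``) are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the writemask logical left shift.
       count_lo stands in for the low 64 bits of the count vector. */
    void mask_sll_epi32_model(uint32_t dst[4], const uint32_t src[4],
                              uint8_t k, const uint32_t a[4],
                              uint64_t count_lo)
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1)
                dst[j] = (count_lo > 31) ? 0 : a[j] << (unsigned)count_lo;
            else
                dst[j] = src[j];
        }
    }

    void example_mask_sll_epi32(void)
    {
        const uint32_t src[4] = {9, 9, 9, 9};
        const uint32_t a[4]   = {1, 2, 3, 0x80000001u};
        uint32_t dst[4];

        mask_sll_epi32_model(dst, src, 0xB, a, 4);   /* lanes 0, 1 and 3 */
        assert(dst[0] == 16 && dst[1] == 32 && dst[2] == 9 && dst[3] == 0x10u);

        mask_sll_epi32_model(dst, src, 0xB, a, 32);  /* count > 31 clears */
        assert(dst[0] == 0 && dst[1] == 0 && dst[2] == 9 && dst[3] == 0);

        /* The whole 64-bit count is compared, not only its low bits. */
        mask_sll_epi32_model(dst, src, 0xB, a, 0x100000000ull);
        assert(dst[0] == 0 && dst[1] == 0 && dst[2] == 9 && dst[3] == 0);
    }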

_mm_mask_slli_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_slli_epi32(__m128i src, __mmask8 k,
                                __m128i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sll_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_maskz_sll_epi32(__mmask8 k, __m128i a,
                                __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_slli_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_slli_epi32(__mmask8 k, __m128i a,
                                 unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
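
With an immediate count, only ``imm8[7:0]`` is examined, and values above 31 produce zero instead of wrapping. A scalar sketch of the pseudo-code above, using the hypothetical name ``maskz_slli_epi32_model``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the zeromask immediate left shift.
       A lane is nonzero only if its mask bit is set and the count fits. */
    void maskz_slli_epi32_model(uint32_t dst[4], uint8_t k,
                                const uint32_t a[4], unsigned imm8)
    {
        unsigned count = imm8 & 0xFF;  /* imm8[7:0] */

        for (int j = 0; j < 4; j++)
            dst[j] = (((k >> j) & 1) && count <= 31) ? a[j] << count : 0;
    }

    void example_maskz_slli_epi32(void)
    {
        const uint32_t a[4] = {1, 1, 1, 1};
        uint32_t dst[4];

        maskz_slli_epi32_model(dst, 0x3, a, 31);  /* lanes 0 and 1 selected */
        assert(dst[0] == 0x80000000u && dst[1] == 0x80000000u);
        assert(dst[2] == 0 && dst[3] == 0);

        maskz_slli_epi32_model(dst, 0x3, a, 32);  /* count > 31: all zero */
        assert(dst[0] == 0 && dst[1] == 0 && dst[2] == 0 && dst[3] == 0);
    }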

_mm_mask_sll_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_mask_sll_epi64(__m128i src, __mmask8 k,
                               __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
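
For 64-bit lanes the threshold moves to 63 and only mask bits 0 and 1 are used. The following scalar sketch models the pseudo-code above; ``mask_sll_epi64_model`` and ``count_lo`` (standing in for ``count[63:0]``) are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the writemask 64-bit left shift. */
    void mask_sll_epi64_model(uint64_t dst[2], const uint64_t src[2],
                              uint8_t k, const uint64_t a[2],
                              uint64_t count_lo)
    {
        for (int j = 0; j < 2; j++) {
            if ((k >> j) & 1)
                dst[j] = (count_lo > 63) ? 0 : a[j] << count_lo;
            else
                dst[j] = src[j];
        }
    }

    void example_mask_sll_epi64(void)
    {
        const uint64_t src[2] = {7, 7};
        const uint64_t a[2]   = {1, 0xFFFFFFFFFFFFFFFFull};
        uint64_t dst[2];

        mask_sll_epi64_model(dst, src, 0x2, a, 63);  /* only lane 1 selected */
        assert(dst[0] == 7);
        assert(dst[1] == 0x8000000000000000ull);

        mask_sll_epi64_model(dst, src, 0x2, a, 64);  /* count > 63 clears */
        assert(dst[0] == 7);
        assert(dst[1] == 0);
    }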

_mm_mask_slli_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_slli_epi64(__m128i src, __mmask8 k,
                                __m128i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sll_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_maskz_sll_epi64(__mmask8 k, __m128i a,
                                __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_slli_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_slli_epi64(__mmask8 k, __m128i a,
                                 unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_sllv_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_mask_sllv_epi32(__m128i src, __mmask8 k,
                                __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
        		ELSE
        			dst[i+31:i] := 0
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
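
Unlike the ``sll`` forms, the ``sllv`` forms read a separate count for every lane. The sketch below models that per-lane behaviour together with the writemask. ``ref_mask_sllv_epi32`` is a hypothetical name chosen for illustration, and the four 32-bit lanes are passed as plain arrays.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_sllv_epi32 (illustration only). */
    void ref_mask_sllv_epi32(uint32_t dst[4], const uint32_t src[4], uint8_t k,
                             const uint32_t a[4], const uint32_t count[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1)
                /* Lane j uses its own count; 32 or more clears the lane. */
                dst[j] = (count[j] < 32) ? (a[j] << count[j]) : 0;
            else
                dst[j] = src[j];  /* writemask: keep the lane from src */
        }
    }

For ``a = {1, 1, 1, 1}``, ``count = {0, 1, 31, 32}``, ``src = {9, 9, 9, 9}`` and ``k = 0xD`` the model gives ``{1, 9, 0x80000000, 0}``.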
        	

_mm_maskz_sllv_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_maskz_sllv_epi32(__mmask8 k, __m128i a,
                                 __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
        		ELSE
        			dst[i+31:i] := 0
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_sllv_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_mask_sllv_epi64(__m128i src, __mmask8 k,
                                __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
        		ELSE
        			dst[i+63:i] := 0
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sllv_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_maskz_sllv_epi64(__mmask8 k, __m128i a,
                                 __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
        		ELSE
        			dst[i+63:i] := 0
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_sra_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_mask_sra_epi32(__m128i src, __mmask8 k,
                               __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        		ELSE
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
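
The arithmetic shift above fills vacated bits with the sign bit, and a count above 31 leaves a lane that is either all ones or all zeros. A scalar sketch of that rule follows. It works on unsigned bit patterns because right-shifting a negative signed value is implementation-defined in C. The names ``ref_sar32`` and ``ref_mask_sra_epi32`` are hypothetical, and ``count_lo`` stands for ``count[63:0]``.

.. code-block:: C

    #include <stdint.h>

    /* Arithmetic right shift of a 32-bit pattern using unsigned operations. */
    uint32_t ref_sar32(uint32_t x, uint64_t n)
    {
        uint32_t fill = (x >> 31) ? UINT32_MAX : 0;  /* replicated sign bit */
        if (n > 31)
            return fill;
        /* Logical shift, then set the vacated high bits from the sign. */
        return (x >> n) | (fill & ~(UINT32_MAX >> n));
    }

    /* Hypothetical scalar model of _mm_mask_sra_epi32 (illustration only). */
    void ref_mask_sra_epi32(uint32_t dst[4], const uint32_t src[4], uint8_t k,
                            const uint32_t a[4], uint64_t count_lo)
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? ref_sar32(a[j], count_lo) : src[j];
    }

For example, ``ref_sar32(0x80000000, 4)`` gives ``0xF8000000``, while ``ref_sar32(0x7FFFFFFF, 4)`` gives ``0x07FFFFFF``.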
        	

_mm_mask_srai_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_srai_epi32(__m128i src, __mmask8 k,
                                __m128i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        		ELSE
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sra_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_maskz_sra_epi32(__mmask8 k, __m128i a,
                                __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        		ELSE
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srai_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    SI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_srai_epi32(__mmask8 k, __m128i a,
                                 unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        		ELSE
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_sra_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_mask_sra_epi64(__m128i src, __mmask8 k,
                               __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        		ELSE
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_srai_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_srai_epi64(__m128i src, __mmask8 k,
                                __m128i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        		ELSE
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sra_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_maskz_sra_epi64(__mmask8 k, __m128i a,
                                __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        		ELSE
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
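
Comparing this pseudo-code with the writemask form suggests a simple relationship: the zeromask result matches the writemask result computed with an all-zero ``src``. The sketch below expresses that in scalar C. Both function names are hypothetical, and ``count_lo`` stands for ``count[63:0]``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_sra_epi64 (illustration only). */
    void ref_mask_sra_epi64(uint64_t dst[2], const uint64_t src[2], uint8_t k,
                            const uint64_t a[2], uint64_t count_lo)
    {
        for (int j = 0; j < 2; j++) {
            if ((k >> j) & 1) {
                uint64_t fill = (a[j] >> 63) ? UINT64_MAX : 0;
                dst[j] = (count_lo > 63)
                             ? fill
                             : ((a[j] >> count_lo) |
                                (fill & ~(UINT64_MAX >> count_lo)));
            } else {
                dst[j] = src[j];
            }
        }
    }

    /* Zeromask form expressed as the writemask form with src = 0. */
    void ref_maskz_sra_epi64(uint64_t dst[2], uint8_t k,
                             const uint64_t a[2], uint64_t count_lo)
    {
        const uint64_t zero[2] = {0, 0};
        ref_mask_sra_epi64(dst, zero, k, a, count_lo);
    }

This is only a statement about results in the model; it makes no claim about how a compiler encodes either intrinsic.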
        	

_mm_maskz_srai_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    SI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_srai_epi64(__mmask8 k, __m128i a,
                                 unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        		ELSE
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_sra_epi64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_sra_epi64(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF count[63:0] > 63
        		dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        	ELSE
        		dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_srai_epi64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    SI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_srai_epi64(__m128i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF imm8[7:0] > 63
        		dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
        	ELSE
        		dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:128] := 0
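
The pseudo-code reads only ``imm8[7:0]``, so the model below masks the immediate to eight bits before comparing it with 63. This is a sketch that follows the pseudo-code literally; compilers may reject immediates outside 0..255 before that rule ever applies. ``ref_srai_epi64`` is a hypothetical helper name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_srai_epi64 (illustration only). */
    void ref_srai_epi64(uint64_t dst[2], const uint64_t a[2], unsigned int imm8)
    {
        unsigned int n = imm8 & 0xFFu;  /* imm8[7:0] */
        for (int j = 0; j < 2; j++) {
            uint64_t fill = (a[j] >> 63) ? UINT64_MAX : 0;  /* sign fill */
            if (n > 63)
                dst[j] = fill;
            else
                dst[j] = (a[j] >> n) | (fill & ~(UINT64_MAX >> n));
        }
    }

Interpreting the lanes as signed values, ``{-16, 16}`` shifted by 2 becomes ``{-4, 4}``, and any count from 64 to 255 becomes ``{-1, 0}``.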
        	

_mm_mask_srav_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_mask_srav_epi32(__m128i src, __mmask8 k,
                                __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
        		ELSE
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srav_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    SI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_maskz_srav_epi32(__mmask8 k, __m128i a,
                                 __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
        		ELSE
        			dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_srav_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_mask_srav_epi64(__m128i src, __mmask8 k,
                                __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
        		ELSE
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srav_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    SI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_maskz_srav_epi64(__mmask8 k, __m128i a,
                                 __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
        		ELSE
        			dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_srav_epi64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    SI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_srav_epi64(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF count[i+63:i] < 64
        		dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
        	ELSE
        		dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
        	FI
        ENDFOR
        dst[MAX:128] := 0
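
The ``sra`` and ``srav`` forms differ only in where the shift count comes from: ``sra`` applies ``count[63:0]`` to every lane, while ``srav`` uses the corresponding element of ``count``. The scalar sketch below places the two rules side by side. All three function names are hypothetical, and the vectors are passed as two-element arrays.

.. code-block:: C

    #include <stdint.h>

    /* Arithmetic right shift of one 64-bit lane without signed >>. */
    uint64_t ref_sar64(uint64_t x, uint64_t n)
    {
        uint64_t fill = (x >> 63) ? UINT64_MAX : 0;
        return (n > 63) ? fill : ((x >> n) | (fill & ~(UINT64_MAX >> n)));
    }

    /* Hypothetical model of _mm_sra_epi64: every lane uses count[63:0]. */
    void ref_sra_epi64(uint64_t dst[2], const uint64_t a[2],
                       const uint64_t count[2])
    {
        for (int j = 0; j < 2; j++)
            dst[j] = ref_sar64(a[j], count[0]);
    }

    /* Hypothetical model of _mm_srav_epi64: lane j uses count[j]. */
    void ref_srav_epi64(uint64_t dst[2], const uint64_t a[2],
                        const uint64_t count[2])
    {
        for (int j = 0; j < 2; j++)
            dst[j] = ref_sar64(a[j], count[j]);
    }

With both lanes of ``a`` holding -64 and ``count = {1, 3}``, the ``sra`` model gives ``{-32, -32}`` and the ``srav`` model gives ``{-32, -8}``.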
        	

_mm_mask_srl_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_mask_srl_epi32(__m128i src, __mmask8 k,
                               __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
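
One detail of the pseudo-code is easy to miss: the count is ``count[63:0]``, which spans the two lowest 32-bit elements of ``count``. A non-zero second element therefore makes the count exceed 31 and clears every selected lane. The sketch below models this. ``ref_mask_srl_epi32`` is a hypothetical helper, and ``count`` is passed as four 32-bit elements for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_srl_epi32 (illustration only).
       Only count[0] and count[1] contribute; count[2] and count[3] are ignored. */
    void ref_mask_srl_epi32(uint32_t dst[4], const uint32_t src[4], uint8_t k,
                            const uint32_t a[4], const uint32_t count[4])
    {
        uint64_t n = ((uint64_t)count[1] << 32) | count[0];  /* count[63:0] */
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1)
                dst[j] = (n > 31) ? 0 : (a[j] >> n);
            else
                dst[j] = src[j];
        }
    }

With ``a = {0x100, 0x200, 0x300, 0x400}``, ``src = {1, 2, 3, 4}`` and ``k = 0x3``, a count of ``{4, 0, 9, 9}`` gives ``{0x10, 0x20, 3, 4}``, whereas ``{4, 1, 0, 0}`` gives ``{0, 0, 3, 4}``.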
        	

_mm_mask_srli_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_srli_epi32(__m128i src, __mmask8 k,
                                __m128i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srl_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_maskz_srl_epi32(__mmask8 k, __m128i a,
                                __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF count[63:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srli_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_srli_epi32(__mmask8 k, __m128i a,
                                 unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF imm8[7:0] > 31
        			dst[i+31:i] := 0
        		ELSE
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_srl_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_mask_srl_epi64(__m128i src, __mmask8 k,
                               __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_srli_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_srli_epi64(__m128i src, __mmask8 k,
                                __m128i a, unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srl_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_maskz_srl_epi64(__mmask8 k, __m128i a,
                                __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF count[63:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srli_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    unsigned int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_srli_epi64(__mmask8 k, __m128i a,
                                 unsigned int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF imm8[7:0] > 63
        			dst[i+63:i] := 0
        		ELSE
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
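
Although ``k`` is an ``__mmask8``, the loop above runs over only two lanes, so only ``k[0]`` and ``k[1]`` are consulted. The sketch below makes that visible in scalar form. ``ref_maskz_srli_epi64`` is a hypothetical helper, and the immediate is taken as a ``uint8_t`` to match ``imm8[7:0]``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_maskz_srli_epi64 (illustration only). */
    void ref_maskz_srli_epi64(uint64_t dst[2], uint8_t k,
                              const uint64_t a[2], uint8_t imm8)
    {
        for (int j = 0; j < 2; j++) {
            int selected = (k >> j) & 1;  /* bits 2..7 of k are never read */
            /* Unselected lanes and counts above 63 both produce zero. */
            dst[j] = (selected && imm8 <= 63) ? (a[j] >> imm8) : 0;
        }
    }

In this model ``k = 0xFC`` selects no lane, so the result is ``{0, 0}`` whatever ``a`` and ``imm8`` are.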
        	

_mm_mask_srlv_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_mask_srlv_epi32(__m128i src, __mmask8 k,
                                __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
        		ELSE
        			dst[i+31:i] := 0
        		FI
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srlv_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_maskz_srlv_epi32(__mmask8 k, __m128i a,
                                 __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF count[i+31:i] < 32
        			dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
        		ELSE
        			dst[i+31:i] := 0
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_srlv_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_mask_srlv_epi64(__m128i src, __mmask8 k,
                                __m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
        		ELSE
        			dst[i+63:i] := 0
        		FI
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_srlv_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i count
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_maskz_srlv_epi64(__mmask8 k, __m128i a,
                                 __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF count[i+63:i] < 64
        			dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
        		ELSE
        			dst[i+63:i] := 0
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	
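In the zero-masked form, a lane becomes 0 for two separate reasons: its mask bit is clear, or its count is 64 or more. A minimal scalar sketch of that behaviour follows; ``ref_maskz_srlv_epi64`` and its array parameters are hypothetical names used for illustration, not part of any Intel header.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_maskz_srlv_epi64 (two 64-bit lanes).
       Only the low two bits of k are consulted. */
    void ref_maskz_srlv_epi64(uint64_t dst[2], uint8_t k,
                              const uint64_t a[2], const uint64_t count[2])
    {
        for (int j = 0; j < 2; j++) {
            if (((k >> j) & 1u) && count[j] < 64u) {
                dst[j] = a[j] >> count[j];
            } else {
                /* Mask bit clear, or count is 64 or more. */
                dst[j] = 0u;
            }
        }
    }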

_mm_maskz_shrdv_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m128i _mm_maskz_shrdv_epi64(__mmask8 k, __m128i a,
                                  __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_shrdv_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m128i _mm_mask_shrdv_epi64(__m128i a, __mmask8 k,
                                 __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_shrdv_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m128i _mm_shrdv_epi64(__m128i a, __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
        ENDFOR
        dst[MAX:128] := 0
        	
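The 128-bit intermediate in the pseudo-code does not need a 128-bit type: the low 64 bits of ``(b:a) >> s`` are ``(a >> s) | (b << (64 - s))`` for ``s`` from 1 to 63, and simply ``a`` for ``s = 0``. The sketch below models one lane that way. The helper names are assumptions for illustration, not Intel's implementation.

.. code-block:: C

    #include <stdint.h>

    /* Low 64 bits of ((b:a) >> (c mod 64)), without a 128-bit type. */
    uint64_t ref_shrd64_lane(uint64_t a, uint64_t b, uint64_t c)
    {
        unsigned s = (unsigned)(c & 63u);
        if (s == 0) {
            return a;   /* avoids the undefined C shift b << 64 */
        }
        return (a >> s) | (b << (64u - s));
    }

    /* Hypothetical scalar model of _mm_shrdv_epi64 (two 64-bit lanes). */
    void ref_shrdv_epi64(uint64_t dst[2], const uint64_t a[2],
                         const uint64_t b[2], const uint64_t c[2])
    {
        for (int j = 0; j < 2; j++) {
            dst[j] = ref_shrd64_lane(a[j], b[j], c[j]);
        }
    }

Because the count is reduced modulo 64, counts of 4 and 68 produce the same result in this model.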

_mm_maskz_shrdv_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m128i _mm_maskz_shrdv_epi32(__mmask8 k, __m128i a,
                                  __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_shrdv_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI32 a, 
    MASK k, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m128i _mm_mask_shrdv_epi32(__m128i a, __mmask8 k,
                                 __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
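In this write-masked form there is no separate ``src`` operand: ``a`` supplies both the low half of the concatenation and the value kept for lanes whose mask bit is clear. A scalar sketch makes that double role explicit. The function name and array parameters are hypothetical, chosen only for this illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_shrdv_epi32 (four 32-bit lanes). */
    void ref_mask_shrdv_epi32(uint32_t dst[4], const uint32_t a[4],
                              uint8_t k, const uint32_t b[4],
                              const uint32_t c[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                uint64_t cat = ((uint64_t)b[j] << 32) | a[j];   /* b:a */
                /* Count is taken modulo 32; keep the low 32 bits. */
                dst[j] = (uint32_t)(cat >> (c[j] & 31u));
            } else {
                /* Mask bit clear: the lane is copied from a. */
                dst[j] = a[j];
            }
        }
    }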

_mm_shrdv_epi32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI32 a, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m128i _mm_shrdv_epi32(__m128i a, __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_shrdv_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m128i _mm_maskz_shrdv_epi16(__mmask8 k, __m128i a,
                                  __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_shrdv_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI16 a, 
    MASK k, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m128i _mm_mask_shrdv_epi16(__m128i a, __mmask8 k,
                                 __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_shrdv_epi16
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI16 a, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m128i _mm_shrdv_epi16(__m128i a, __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
        ENDFOR
        dst[MAX:128] := 0
        	
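For 16-bit lanes only the low four bits of each count are used, so a count of 16 behaves like 0 and a count of 20 like 4. The following scalar sketch of the pseudo-code shows this. ``ref_shrdv_epi16`` and its array-based signature are assumptions made for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_shrdv_epi16 (eight 16-bit lanes). */
    void ref_shrdv_epi16(uint16_t dst[8], const uint16_t a[8],
                         const uint16_t b[8], const uint16_t c[8])
    {
        for (int j = 0; j < 8; j++) {
            uint32_t cat = ((uint32_t)b[j] << 16) | a[j];   /* b:a */
            /* Shift by c mod 16 and keep the low 16 bits. */
            dst[j] = (uint16_t)(cat >> (c[j] & 15u));
        }
    }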

_mm_maskz_shrdi_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_shrdi_epi64(__mmask8 k, __m128i a,
                                  __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_shrdi_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_shrdi_epi64(__m128i src, __mmask8 k,
                                 __m128i a, __m128i b,
                                 int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_shrdi_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_shrdi_epi64(__m128i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
        ENDFOR
        dst[MAX:128] := 0
        	
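Where the compiler offers a 128-bit integer, the pseudo-code can be transcribed almost literally. The sketch below assumes GCC or Clang, whose ``unsigned __int128`` is a compiler extension rather than ISO C. The name ``ref_shrdi_epi64`` and the array parameters are likewise hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* unsigned __int128 is a GCC/Clang extension, not ISO C. */
    __extension__ typedef unsigned __int128 u128;

    /* Hypothetical scalar model of _mm_shrdi_epi64 (two 64-bit lanes). */
    void ref_shrdi_epi64(uint64_t dst[2], const uint64_t a[2],
                         const uint64_t b[2], int imm8)
    {
        unsigned s = (unsigned)imm8 & 63u;          /* imm8[5:0] */
        for (int j = 0; j < 2; j++) {
            u128 cat = ((u128)b[j] << 64) | a[j];   /* b:a */
            dst[j] = (uint64_t)(cat >> s);          /* low 64 bits */
        }
    }

Only bits 5:0 of ``imm8`` take part, so in this model an immediate of 68 shifts by 4.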

_mm_maskz_shrdi_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_shrdi_epi32(__mmask8 k, __m128i a,
                                  __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_shrdi_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_shrdi_epi32(__m128i src, __mmask8 k,
                                 __m128i a, __m128i b,
                                 int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_shrdi_epi32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_shrdi_epi32(__m128i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
        ENDFOR
        dst[MAX:128] := 0
        	
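One consequence of the pseudo-code: when "a" and "b" hold the same value, the funnel shift reduces to a rotate right of each lane. The sketch below models a single lane and compares it with a plain rotate. Both helper names are assumptions for illustration, not functions from any Intel header.

.. code-block:: C

    #include <stdint.h>

    /* One lane of the _mm_shrdi_epi32 pseudo-code (hypothetical helper). */
    uint32_t ref_shrdi32_lane(uint32_t a, uint32_t b, int imm8)
    {
        uint64_t cat = ((uint64_t)b << 32) | a;             /* b:a */
        return (uint32_t)(cat >> ((unsigned)imm8 & 31u));   /* imm8[4:0] */
    }

    /* Plain rotate right, for comparison. */
    uint32_t rotr32(uint32_t x, unsigned n)
    {
        n &= 31u;
        return n ? (x >> n) | (x << (32u - n)) : x;
    }

For example, ``ref_shrdi32_lane(x, x, 8)`` and ``rotr32(x, 8)`` agree for any ``x`` in this model.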

_mm_maskz_shrdi_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_shrdi_epi16(__mmask8 k, __m128i a,
                                  __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_shrdi_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_shrdi_epi16(__m128i src, __mmask8 k,
                                 __m128i a, __m128i b,
                                 int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
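Unlike the variable-count ``shrdv`` forms, the immediate form takes a separate "src" operand for masked-off lanes, and all eight bits of ``k`` are used because there are eight lanes. A scalar sketch under those rules follows. The name ``ref_mask_shrdi_epi16`` and the array parameters are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_shrdi_epi16 (eight 16-bit lanes). */
    void ref_mask_shrdi_epi16(uint16_t dst[8], const uint16_t src[8],
                              uint8_t k, const uint16_t a[8],
                              const uint16_t b[8], int imm8)
    {
        unsigned s = (unsigned)imm8 & 15u;   /* imm8[3:0] */
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u) {
                uint32_t cat = ((uint32_t)b[j] << 16) | a[j];   /* b:a */
                dst[j] = (uint16_t)(cat >> s);
            } else {
                /* Mask bit clear: the lane is copied from src. */
                dst[j] = src[j];
            }
        }
    }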

_mm_shrdi_epi16
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_shrdi_epi16(__m128i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_shldv_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m128i _mm_maskz_shldv_epi64(__mmask8 k, __m128i a,
                                  __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
        		dst[i+63:i] := tmp[127:64]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_shldv_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m128i _mm_mask_shldv_epi64(__m128i a, __mmask8 k,
                                 __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
        		dst[i+63:i] := tmp[127:64]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_shldv_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m128i _mm_shldv_epi64(__m128i a, __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
        	dst[i+63:i] := tmp[127:64]
        ENDFOR
        dst[MAX:128] := 0
        	
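The left funnel shift mirrors the right one: "a" is now the high half, and the upper 64 bits of ``(a:b) << s`` are ``(a << s) | (b >> (64 - s))`` for ``s`` from 1 to 63, or simply ``a`` for ``s = 0``. The portable sketch below models this without a 128-bit type. The helper names are assumptions for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Upper 64 bits of ((a:b) << (c mod 64)), without a 128-bit type. */
    uint64_t ref_shld64_lane(uint64_t a, uint64_t b, uint64_t c)
    {
        unsigned s = (unsigned)(c & 63u);
        if (s == 0) {
            return a;   /* avoids the undefined C shift b >> 64 */
        }
        return (a << s) | (b >> (64u - s));
    }

    /* Hypothetical scalar model of _mm_shldv_epi64 (two 64-bit lanes). */
    void ref_shldv_epi64(uint64_t dst[2], const uint64_t a[2],
                         const uint64_t b[2], const uint64_t c[2])
    {
        for (int j = 0; j < 2; j++) {
            dst[j] = ref_shld64_lane(a[j], b[j], c[j]);
        }
    }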

_mm_maskz_shldv_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m128i _mm_maskz_shldv_epi32(__mmask8 k, __m128i a,
                                  __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
        		dst[i+31:i] := tmp[63:32]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_shldv_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI32 a, 
    MASK k, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m128i _mm_mask_shldv_epi32(__m128i a, __mmask8 k,
                                 __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
        		dst[i+31:i] := tmp[63:32]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
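The ``tmp`` step in the pseudo-code maps directly onto a 64-bit integer in C: build ``a:b``, shift left by the count modulo 32, and keep the upper half. The sketch below follows that shape, with masked-off lanes copied from "a". The function name and array parameters are hypothetical, used only for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_shldv_epi32 (four 32-bit lanes). */
    void ref_mask_shldv_epi32(uint32_t dst[4], const uint32_t a[4],
                              uint8_t k, const uint32_t b[4],
                              const uint32_t c[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                uint64_t cat = ((uint64_t)a[j] << 32) | b[j];   /* a:b */
                uint64_t tmp = cat << (c[j] & 31u);             /* tmp[63:0] */
                dst[j] = (uint32_t)(tmp >> 32);                 /* tmp[63:32] */
            } else {
                /* Mask bit clear: the lane is copied from a. */
                dst[j] = a[j];
            }
        }
    }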

_mm_shldv_epi32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI32 a, 
    UI32 b, 
    UI32 c

.. code-block:: C

    __m128i _mm_shldv_epi32(__m128i a, __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
        	dst[i+31:i] := tmp[63:32]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_shldv_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m128i _mm_maskz_shldv_epi16(__mmask8 k, __m128i a,
                                  __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_shldv_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI16 a, 
    MASK k, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m128i _mm_mask_shldv_epi16(__m128i a, __mmask8 k,
                                 __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_shldv_epi16
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI16 a, 
    UI16 b, 
    UI16 c

.. code-block:: C

    __m128i _mm_shldv_epi16(__m128i a, __m128i b, __m128i c);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        dst[MAX:128] := 0
        	
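For 16-bit lanes the intermediate fits in a ``uint32_t``. The upper half of that value after the shift is the result, so bits shifted out of "b" appear at the bottom of the lane. A minimal scalar sketch of the pseudo-code follows. ``ref_shldv_epi16`` and its array-based signature are assumptions for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_shldv_epi16 (eight 16-bit lanes). */
    void ref_shldv_epi16(uint16_t dst[8], const uint16_t a[8],
                         const uint16_t b[8], const uint16_t c[8])
    {
        for (int j = 0; j < 8; j++) {
            uint32_t cat = ((uint32_t)a[j] << 16) | b[j];   /* a:b */
            uint32_t tmp = cat << (c[j] & 15u);             /* tmp[31:0] */
            dst[j] = (uint16_t)(tmp >> 16);                 /* tmp[31:16] */
        }
    }

As with the right-shift forms, the count is reduced modulo 16, so counts of 4 and 20 give the same lane value in this model.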

_mm_maskz_shldi_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_shldi_epi64(__mmask8 k, __m128i a,
                                  __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
        		dst[i+63:i] := tmp[127:64]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
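
Standard C has no portable 128-bit integer type, so a scalar model of the 64-bit case has to split the funnel shift into two 64-bit shifts. The sketch below does that and then applies the zeromask. The names "shldi64_lane" and "maskz_shldi_epi64_model" are hypothetical, and the two-element array layout (lane 0 in bits 63:0) is an assumption made for illustration.

.. code-block:: C

    #include <stdint.h>

    /* One 64-bit lane: upper 64 bits of ((a:b) << n), with n = imm8[5:0]. */
    uint64_t shldi64_lane(uint64_t a, uint64_t b, int imm8)
    {
        unsigned n = (unsigned)imm8 & 63;
        if (n == 0)
            return a;                 /* avoids the undefined shift b >> 64 */
        return (a << n) | (b >> (64 - n));
    }

    /* Zero-masked form: lane j is computed when bit j of k is set, else 0. */
    void maskz_shldi_epi64_model(uint64_t dst[2], uint8_t k,
                                 const uint64_t a[2], const uint64_t b[2],
                                 int imm8)
    {
        for (int j = 0; j < 2; j++)
            dst[j] = ((k >> j) & 1) ? shldi64_lane(a[j], b[j], imm8) : 0;
    }

With "a" = 1, "b" = 0x8000000000000000 and a shift of 1, the top bit of "b" moves into bit 0 of the result, giving 3 in every lane whose mask bit is set.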
        	

_mm_mask_shldi_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_shldi_epi64(__m128i src, __mmask8 k,
                                 __m128i a, __m128i b,
                                 int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
        		dst[i+63:i] := tmp[127:64]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_shldi_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_shldi_epi64(__m128i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
        	dst[i+63:i] := tmp[127:64]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_shldi_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_shldi_epi32(__mmask8 k, __m128i a,
                                  __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
        		dst[i+31:i] := tmp[63:32]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_shldi_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_shldi_epi32(__m128i src, __mmask8 k,
                                 __m128i a, __m128i b,
                                 int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
        		dst[i+31:i] := tmp[63:32]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_shldi_epi32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_shldi_epi32(__m128i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
        	dst[i+31:i] := tmp[63:32]
        ENDFOR
        dst[MAX:128] := 0
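
One consequence of the pseudo-code above is that passing the same value as "a" and "b" turns the concatenate-and-shift into a rotate left by imm8[4:0]. The scalar sketch below illustrates this for a single 32-bit lane; "shldi32_lane" and "rotl32_via_shld" are hypothetical helper names, not part of any Intel header.

.. code-block:: C

    #include <stdint.h>

    /* One 32-bit lane: upper 32 bits of ((a:b) << imm8[4:0]). */
    uint32_t shldi32_lane(uint32_t a, uint32_t b, int imm8)
    {
        uint64_t tmp = ((uint64_t)a << 32) | b;
        tmp <<= ((unsigned)imm8 & 31);
        return (uint32_t)(tmp >> 32);
    }

    /* With a == b, the bits shifted out of the top re-enter at the bottom. */
    uint32_t rotl32_via_shld(uint32_t x, int n)
    {
        return shldi32_lane(x, x, n);
    }

For example, rotating 0x12345678 left by 8 through this model yields 0x34567812.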
        	

_mm_maskz_shldi_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_shldi_epi16(__mmask8 k, __m128i a,
                                  __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_shldi_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_shldi_epi16(__m128i src, __mmask8 k,
                                 __m128i a, __m128i b,
                                 int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_shldi_epi16
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Shift
:Header: immintrin.h
:Searchable: AVX-512-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_shldi_epi16(__m128i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        dst[MAX:128] := 0
        	

Move
----
ZMM
~~~
_mm512_mask_mov_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_mask_mov_epi16(__m512i src, __mmask32 k,
                                  __m512i a);

.. admonition:: Intel Description

    Move packed 16-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
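
The writemask selection in the pseudo-code above amounts to a per-lane blend. A minimal scalar sketch follows, assuming the 512-bit vectors are represented as arrays of 32 16-bit lanes and the mask as a 32-bit integer; "mask_mov_epi16_model" is a hypothetical name used only for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Lane j is taken from a when bit j of k is set, otherwise from src. */
    void mask_mov_epi16_model(uint16_t dst[32], const uint16_t src[32],
                              uint32_t k, const uint16_t a[32])
    {
        for (int j = 0; j < 32; j++)
            dst[j] = ((k >> j) & 1u) ? a[j] : src[j];
    }

With k = 0x80000001, only lanes 0 and 31 are taken from "a"; the other 30 lanes keep the values from "src".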
        	

_mm512_maskz_mov_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_maskz_mov_epi16(__mmask32 k, __m512i a);

.. admonition:: Intel Description

    Move packed 16-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_mov_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_mask_mov_epi8(__m512i src, __mmask64 k,
                                 __m512i a);

.. admonition:: Intel Description

    Move packed 8-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mov_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_maskz_mov_epi8(__mmask64 k, __m512i a);

.. admonition:: Intel Description

    Move packed 8-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
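
With 64 byte lanes, the mask needs one bit per lane, which is why the prototype takes a 64-bit mask type. The scalar sketch below models the zeromask behaviour under the assumption that the vector is an array of 64 bytes and the mask is a "uint64_t"; the name "maskz_mov_epi8_model" is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Lane j is copied from a when bit j of k is set, otherwise zeroed.
       k must be a 64-bit type so that bit 63 can be tested. */
    void maskz_mov_epi8_model(uint8_t dst[64], uint64_t k, const uint8_t a[64])
    {
        for (int j = 0; j < 64; j++)
            dst[j] = ((k >> j) & 1u) ? a[j] : 0;
    }

A mask with only bits 0 and 63 set keeps the first and last byte of "a" and zeroes the 62 bytes in between.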
        	

_mm512_maskz_mov_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_maskz_mov_pd(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Move packed double-precision (64-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mov_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_mov_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Move packed single-precision (32-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_movedup_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_movedup_pd(__m512d src, __mmask8 k,
                                   __m512d a);

.. admonition:: Intel Description

    Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[63:0] := a[63:0]
        tmp[127:64] := a[63:0]
        tmp[191:128] := a[191:128]
        tmp[255:192] := a[191:128]
        tmp[319:256] := a[319:256] 
        tmp[383:320] := a[319:256] 
        tmp[447:384] := a[447:384]
        tmp[511:448] := a[447:384]
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
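
The eight "tmp" assignments in the pseudo-code above follow one pattern: lane j reads from the even-indexed lane of its pair. A scalar sketch of that pattern combined with the writemask is shown below; it assumes the vectors are arrays of eight doubles, and "mask_movedup_pd_model" is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    void mask_movedup_pd_model(double dst[8], const double src[8],
                               uint8_t k, const double a[8])
    {
        for (int j = 0; j < 8; j++) {
            /* Clearing bit 0 of j gives source lanes 0,0,2,2,4,4,6,6. */
            double tmp = a[j & ~1];
            dst[j] = ((k >> j) & 1) ? tmp : src[j];
        }
    }

With k = 0x0F, the lower four lanes receive a[0], a[0], a[2], a[2] and the upper four lanes are copied from "src".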
        	

_mm512_maskz_movedup_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_maskz_movedup_pd(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[63:0] := a[63:0]
        tmp[127:64] := a[63:0]
        tmp[191:128] := a[191:128]
        tmp[255:192] := a[191:128]
        tmp[319:256] := a[319:256] 
        tmp[383:320] := a[319:256] 
        tmp[447:384] := a[447:384]
        tmp[511:448] := a[447:384]
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_movedup_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_movedup_pd(__m512d a);

.. admonition:: Intel Description

    Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        dst[127:64] := a[63:0]
        dst[191:128] := a[191:128]
        dst[255:192] := a[191:128]
        dst[319:256] := a[319:256]
        dst[383:320] := a[319:256]
        dst[447:384] := a[447:384]
        dst[511:448] := a[447:384]
        dst[MAX:512] := 0
        	

_mm512_maskz_mov_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_mov_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Move packed 32-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mov_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_maskz_mov_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Move packed 64-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_movehdup_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_movehdup_ps(__m512 src, __mmask16 k,
                                   __m512 a);

.. admonition:: Intel Description

    Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[63:32] 
        tmp[63:32] := a[63:32] 
        tmp[95:64] := a[127:96] 
        tmp[127:96] := a[127:96]
        tmp[159:128] := a[191:160] 
        tmp[191:160] := a[191:160] 
        tmp[223:192] := a[255:224] 
        tmp[255:224] := a[255:224]
        tmp[287:256] := a[319:288] 
        tmp[319:288] := a[319:288] 
        tmp[351:320] := a[383:352] 
        tmp[383:352] := a[383:352] 
        tmp[415:384] := a[447:416] 
        tmp[447:416] := a[447:416] 
        tmp[479:448] := a[511:480]
        tmp[511:480] := a[511:480]
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_movehdup_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_movehdup_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[63:32] 
        tmp[63:32] := a[63:32] 
        tmp[95:64] := a[127:96] 
        tmp[127:96] := a[127:96]
        tmp[159:128] := a[191:160] 
        tmp[191:160] := a[191:160] 
        tmp[223:192] := a[255:224] 
        tmp[255:224] := a[255:224]
        tmp[287:256] := a[319:288] 
        tmp[319:288] := a[319:288] 
        tmp[351:320] := a[383:352] 
        tmp[383:352] := a[383:352] 
        tmp[415:384] := a[447:416] 
        tmp[447:416] := a[447:416] 
        tmp[479:448] := a[511:480]
        tmp[511:480] := a[511:480]
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_movehdup_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_movehdup_ps(__m512 a);

.. admonition:: Intel Description

    Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[63:32] 
        dst[63:32] := a[63:32] 
        dst[95:64] := a[127:96] 
        dst[127:96] := a[127:96]
        dst[159:128] := a[191:160] 
        dst[191:160] := a[191:160] 
        dst[223:192] := a[255:224] 
        dst[255:224] := a[255:224]
        dst[287:256] := a[319:288] 
        dst[319:288] := a[319:288] 
        dst[351:320] := a[383:352] 
        dst[383:352] := a[383:352] 
        dst[415:384] := a[447:416] 
        dst[447:416] := a[447:416] 
        dst[479:448] := a[511:480]
        dst[511:480] := a[511:480]
        dst[MAX:512] := 0
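
The sixteen assignments in the pseudo-code above reduce to a single index rule: each lane reads the odd-indexed lane of its pair. A scalar sketch, assuming the vector is an array of 16 floats and using the hypothetical name "movehdup_ps_model":

.. code-block:: C

    /* Setting bit 0 of j gives source lanes 1,1,3,3,...,15,15. */
    void movehdup_ps_model(float dst[16], const float a[16])
    {
        for (int j = 0; j < 16; j++)
            dst[j] = a[j | 1];
    }

If "a" holds the values 0 through 15 in lane order, the model produces 1, 1, 3, 3, and so on up to 15, 15.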
        	

_mm512_mask_moveldup_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_moveldup_ps(__m512 src, __mmask16 k,
                                   __m512 a);

.. admonition:: Intel Description

    Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[31:0] 
        tmp[63:32] := a[31:0] 
        tmp[95:64] := a[95:64] 
        tmp[127:96] := a[95:64]
        tmp[159:128] := a[159:128] 
        tmp[191:160] := a[159:128] 
        tmp[223:192] := a[223:192] 
        tmp[255:224] := a[223:192]
        tmp[287:256] := a[287:256] 
        tmp[319:288] := a[287:256] 
        tmp[351:320] := a[351:320] 
        tmp[383:352] := a[351:320] 
        tmp[415:384] := a[415:384] 
        tmp[447:416] := a[415:384] 
        tmp[479:448] := a[479:448]
        tmp[511:480] := a[479:448]
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_maskz_moveldup_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_moveldup_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[31:0] 
        tmp[63:32] := a[31:0] 
        tmp[95:64] := a[95:64] 
        tmp[127:96] := a[95:64]
        tmp[159:128] := a[159:128] 
        tmp[191:160] := a[159:128] 
        tmp[223:192] := a[223:192] 
        tmp[255:224] := a[223:192]
        tmp[287:256] := a[287:256] 
        tmp[319:288] := a[287:256] 
        tmp[351:320] := a[351:320] 
        tmp[383:352] := a[351:320] 
        tmp[415:384] := a[415:384] 
        tmp[447:416] := a[415:384] 
        tmp[479:448] := a[479:448]
        tmp[511:480] := a[479:448]
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
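
The pseudo-code above duplicates first and masks second, so a set mask bit on an odd lane still yields the even neighbour's value. The scalar sketch below illustrates this ordering; it assumes an array of 16 floats per vector and a 16-bit integer mask, and "maskz_moveldup_ps_model" is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    void maskz_moveldup_ps_model(float dst[16], uint16_t k, const float a[16])
    {
        for (int j = 0; j < 16; j++) {
            /* Clearing bit 0 of j gives source lanes 0,0,2,2,...,14,14. */
            float tmp = a[j & ~1];
            dst[j] = ((k >> j) & 1) ? tmp : 0.0f;
        }
    }

With k = 0x000C, lanes 2 and 3 both receive a[2] and every other lane is zeroed.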
        	

_mm512_moveldup_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_moveldup_ps(__m512 a);

.. admonition:: Intel Description

    Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] 
        dst[63:32] := a[31:0] 
        dst[95:64] := a[95:64] 
        dst[127:96] := a[95:64]
        dst[159:128] := a[159:128] 
        dst[191:160] := a[159:128] 
        dst[223:192] := a[223:192] 
        dst[255:224] := a[223:192]
        dst[287:256] := a[287:256] 
        dst[319:288] := a[287:256] 
        dst[351:320] := a[351:320] 
        dst[383:352] := a[351:320] 
        dst[415:384] := a[415:384] 
        dst[447:416] := a[415:384] 
        dst[479:448] := a[479:448]
        dst[511:480] := a[479:448]
        dst[MAX:512] := 0
        	

_mm512_mask_mov_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_mov_pd(__m512d src, __mmask8 k,
                               __m512d a);

.. admonition:: Intel Description

    Move packed double-precision (64-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_mov_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_mov_ps(__m512 src, __mmask16 k,
                              __m512 a);

.. admonition:: Intel Description

    Move packed single-precision (32-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_mov_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_mov_epi32(__m512i src, __mmask16 k,
                                  __m512i a);

.. admonition:: Intel Description

    Move packed 32-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_mov_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_mask_mov_epi64(__m512i src, __mmask8 k,
                                  __m512i a);

.. admonition:: Intel Description

    Move packed 64-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

YMM
~~~
_mm256_mask_mov_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_mask_mov_epi16(__m256i src, __mmask16 k,
                                  __m256i a);

.. admonition:: Intel Description

    Move packed 16-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mov_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_maskz_mov_epi16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Move packed 16-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_mov_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_mask_mov_epi8(__m256i src, __mmask32 k,
                                 __m256i a);

.. admonition:: Intel Description

    Move packed 8-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mov_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_maskz_mov_epi8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Move packed 8-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
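With 8-bit elements a 256-bit vector holds 32 lanes, which is why the byte variants take a ``__mmask32``: bit ``j`` of ``k`` controls byte ``j``. The sketch below restates the writemask pseudo-code in scalar C. The helper name ``model_mask_mov_epi8`` is hypothetical and a plain ``uint32_t`` stands in for the mask type.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the 256-bit byte move with a writemask.
       There is one mask bit per byte lane, so the mask needs 32 bits. */
    static void model_mask_mov_epi8(uint8_t dst[32], const uint8_t src[32],
                                    uint32_t k, const uint8_t a[32])
    {
        assert(dst && src && a);
        for (int j = 0; j < 32; ++j)
            dst[j] = ((k >> j) & 1u) ? a[j] : src[j];
    }

For example, ``k = 0x55555555`` takes the even-numbered bytes from ``a`` and the odd-numbered bytes from ``src``.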

_mm256_mask_mov_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_mask_mov_pd(__m256d src, __mmask8 k,
                               __m256d a);

.. admonition:: Intel Description

    Move packed double-precision (64-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mov_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_maskz_mov_pd(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Move packed double-precision (64-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_mov_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_mov_ps(__m256 src, __mmask8 k, __m256 a);

.. admonition:: Intel Description

    Move packed single-precision (32-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mov_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_mov_ps(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Move packed single-precision (32-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
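One way to call the masked single-precision move from C is sketched below. The helper name ``blend8_ps`` is hypothetical. The intrinsic path is compiled only when the compiler defines ``__AVX512F__`` and ``__AVX512VL__`` (for example with ``-mavx512f -mavx512vl`` on GCC or Clang, which is an assumption about the build setup); otherwise a scalar loop that follows the pseudo-code is used.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #if defined(__AVX512F__) && defined(__AVX512VL__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: dst[j] = a[j] where bit j of k is set,
       src[j] otherwise, for eight floats. */
    static void blend8_ps(float *dst, const float *src, uint8_t k,
                          const float *a)
    {
        assert(dst && src && a);
    #if defined(__AVX512F__) && defined(__AVX512VL__)
        /* Unaligned loads and stores, so the arrays need no alignment. */
        __m256 vsrc = _mm256_loadu_ps(src);
        __m256 va = _mm256_loadu_ps(a);
        _mm256_storeu_ps(dst, _mm256_mask_mov_ps(vsrc, (__mmask8)k, va));
    #else
        /* Scalar fallback that mirrors the pseudo-code. */
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1u) ? a[j] : src[j];
    #endif
    }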

_mm256_mask_movedup_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_mask_movedup_pd(__m256d src, __mmask8 k,
                                   __m256d a);

.. admonition:: Intel Description

    Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[63:0] := a[63:0]
        tmp[127:64] := a[63:0]
        tmp[191:128] := a[191:128]
        tmp[255:192] := a[191:128]
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_movedup_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_maskz_movedup_pd(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[63:0] := a[63:0]
        tmp[127:64] := a[63:0]
        tmp[191:128] := a[191:128]
        tmp[255:192] := a[191:128]
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
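The masked ``movedup`` forms work in two steps: first build ``tmp`` by copying each even-indexed element over its odd neighbour, then apply the mask lane by lane. A scalar sketch of both forms, using hypothetical helper names and plain ``double`` arrays in place of ``__m256d``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the writemask form. tmp is built in
       full before any lane of dst is written, as in the pseudo-code. */
    static void model_mask_movedup_pd(double dst[4], const double src[4],
                                      uint8_t k, const double a[4])
    {
        assert(dst && src && a);
        const double tmp[4] = { a[0], a[0], a[2], a[2] };
        for (int j = 0; j < 4; ++j)
            dst[j] = ((k >> j) & 1u) ? tmp[j] : src[j];
    }

    /* Hypothetical scalar model of the zeromask form. */
    static void model_maskz_movedup_pd(double dst[4], uint8_t k,
                                       const double a[4])
    {
        assert(dst && a);
        const double tmp[4] = { a[0], a[0], a[2], a[2] };
        for (int j = 0; j < 4; ++j)
            dst[j] = ((k >> j) & 1u) ? tmp[j] : 0.0;
    }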

_mm256_mask_mov_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_mask_mov_epi32(__m256i src, __mmask8 k,
                                  __m256i a);

.. admonition:: Intel Description

    Move packed 32-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mov_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_maskz_mov_epi32(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Move packed 32-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_mov_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_mask_mov_epi64(__m256i src, __mmask8 k,
                                  __m256i a);

.. admonition:: Intel Description

    Move packed 64-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mov_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_maskz_mov_epi64(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Move packed 64-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
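Because bit ``j`` of ``k`` selects lane ``j``, a mask can be derived from any per-lane predicate. The sketch below builds a mask of the non-negative lanes and feeds it to a scalar model of the zeromask move, which zeroes the negative lanes. Both helper names are hypothetical, and signed ``int64_t`` lanes are used only to make the predicate meaningful; in real AVX-512 code the mask would more likely come from a compare intrinsic such as ``_mm256_cmp_epi64_mask``.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper: bit j of the result is set when v[j] >= 0. */
    static uint8_t nonnegative_mask(const int64_t v[4])
    {
        assert(v);
        uint8_t k = 0;
        for (int j = 0; j < 4; ++j)
            if (v[j] >= 0)
                k |= (uint8_t)(1u << j);
        return k;
    }

    /* Hypothetical scalar model of the zeromask move on four 64-bit lanes. */
    static void model_maskz_mov_epi64(int64_t dst[4], uint8_t k,
                                      const int64_t a[4])
    {
        assert(dst && a);
        for (int j = 0; j < 4; ++j)
            dst[j] = ((k >> j) & 1u) ? a[j] : 0;
    }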

_mm256_mask_movehdup_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_movehdup_ps(__m256 src, __mmask8 k,
                                   __m256 a);

.. admonition:: Intel Description

    Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[63:32] 
        tmp[63:32] := a[63:32] 
        tmp[95:64] := a[127:96] 
        tmp[127:96] := a[127:96]
        tmp[159:128] := a[191:160] 
        tmp[191:160] := a[191:160] 
        tmp[223:192] := a[255:224] 
        tmp[255:224] := a[255:224]
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_movehdup_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_movehdup_ps(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[63:32] 
        tmp[63:32] := a[63:32] 
        tmp[95:64] := a[127:96] 
        tmp[127:96] := a[127:96]
        tmp[159:128] := a[191:160] 
        tmp[191:160] := a[191:160] 
        tmp[223:192] := a[255:224] 
        tmp[255:224] := a[255:224]
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
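The eight ``tmp`` assignments in the pseudo-code follow one rule: lane ``j`` of ``tmp`` receives the odd-indexed element of the pair that contains ``j``, which is element ``j | 1``. A scalar sketch of the writemask form under that reading, with the hypothetical helper name ``model_mask_movehdup_ps``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the 256-bit masked movehdup. */
    static void model_mask_movehdup_ps(float dst[8], const float src[8],
                                       uint8_t k, const float a[8])
    {
        assert(dst && src && a);
        float tmp[8];
        for (int j = 0; j < 8; ++j)
            tmp[j] = a[j | 1];      /* lanes 0,1 -> a[1]; lanes 2,3 -> a[3]; ... */
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1u) ? tmp[j] : src[j];
    }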

_mm256_mask_moveldup_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_moveldup_ps(__m256 src, __mmask8 k,
                                   __m256 a);

.. admonition:: Intel Description

    Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[31:0] 
        tmp[63:32] := a[31:0] 
        tmp[95:64] := a[95:64] 
        tmp[127:96] := a[95:64]
        tmp[159:128] := a[159:128] 
        tmp[191:160] := a[159:128] 
        tmp[223:192] := a[223:192] 
        tmp[255:224] := a[223:192]
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_moveldup_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_moveldup_ps(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[31:0] 
        tmp[63:32] := a[31:0] 
        tmp[95:64] := a[95:64] 
        tmp[127:96] := a[95:64]
        tmp[159:128] := a[159:128] 
        tmp[191:160] := a[159:128] 
        tmp[223:192] := a[223:192] 
        tmp[255:224] := a[223:192]
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
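``moveldup`` mirrors ``movehdup``: lane ``j`` of ``tmp`` receives the even-indexed element of its pair, which is element ``j & ~1``. A scalar sketch of the zeromask form under that reading, with the hypothetical helper name ``model_maskz_moveldup_ps``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the 256-bit zero-masked moveldup. */
    static void model_maskz_moveldup_ps(float dst[8], uint8_t k,
                                        const float a[8])
    {
        assert(dst && a);
        float tmp[8];
        for (int j = 0; j < 8; ++j)
            tmp[j] = a[j & ~1];     /* lanes 0,1 -> a[0]; lanes 2,3 -> a[2]; ... */
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1u) ? tmp[j] : 0.0f;
    }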

XMM
~~~
_mm_mask_mov_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_mask_mov_epi16(__m128i src, __mmask8 k,
                               __m128i a);

.. admonition:: Intel Description

    Move packed 16-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mov_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_maskz_mov_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Move packed 16-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mov_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_mask_mov_epi8(__m128i src, __mmask16 k,
                              __m128i a);

.. admonition:: Intel Description

    Move packed 8-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mov_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_maskz_mov_epi8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Move packed 8-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mov_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_mask_mov_pd(__m128d src, __mmask8 k, __m128d a);

.. admonition:: Intel Description

    Move packed double-precision (64-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mov_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_maskz_mov_pd(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Move packed double-precision (64-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
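A 128-bit vector holds only two double-precision lanes, so the loop in the pseudo-code reads only bits 0 and 1 of the ``__mmask8`` argument and never looks at the upper six bits. The scalar sketch below makes that visible. The helper name ``model_maskz_mov_pd128`` is hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the 128-bit zero-masked move.
       Only k[0] and k[1] are examined; k[7:2] has no effect. */
    static void model_maskz_mov_pd128(double dst[2], uint8_t k,
                                      const double a[2])
    {
        assert(dst && a);
        for (int j = 0; j < 2; ++j)
            dst[j] = ((k >> j) & 1u) ? a[j] : 0.0;
    }

Under this model, ``k = 0xFE`` and ``k = 0x02`` give the same result, because they agree in the two low bits.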

_mm_mask_mov_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_mask_mov_ps(__m128 src, __mmask8 k, __m128 a);

.. admonition:: Intel Description

    Move packed single-precision (32-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mov_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_maskz_mov_ps(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Move packed single-precision (32-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_movedup_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_mask_movedup_pd(__m128d src, __mmask8 k,
                                __m128d a);

.. admonition:: Intel Description

    Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[63:0] := a[63:0]
        tmp[127:64] := a[63:0]
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_movedup_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_maskz_movedup_pd(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[63:0] := a[63:0]
        tmp[127:64] := a[63:0]
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mov_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_mov_epi32(__m128i src, __mmask8 k,
                               __m128i a);

.. admonition:: Intel Description

    Move packed 32-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mov_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_mov_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Move packed 32-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mov_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_mov_epi64(__m128i src, __mmask8 k,
                               __m128i a);

.. admonition:: Intel Description

    Move packed 64-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mov_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_mov_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Move packed 64-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_movehdup_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_mask_movehdup_ps(__m128 src, __mmask8 k,
                                __m128 a);

.. admonition:: Intel Description

    Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[63:32] 
        tmp[63:32] := a[63:32] 
        tmp[95:64] := a[127:96] 
        tmp[127:96] := a[127:96]
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_movehdup_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_maskz_movehdup_ps(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[63:32] 
        tmp[63:32] := a[63:32] 
        tmp[95:64] := a[127:96] 
        tmp[127:96] := a[127:96]
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_moveldup_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_mask_moveldup_ps(__m128 src, __mmask8 k,
                                __m128 a);

.. admonition:: Intel Description

    Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[31:0] 
        tmp[63:32] := a[31:0] 
        tmp[95:64] := a[95:64] 
        tmp[127:96] := a[95:64]
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_moveldup_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_maskz_moveldup_ps(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[31:0] 
        tmp[63:32] := a[31:0] 
        tmp[95:64] := a[95:64] 
        tmp[127:96] := a[95:64]
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_move_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_move_sd(__m128d src, __mmask8 k, __m128d a,
                             __m128d b);

.. admonition:: Intel Description

    Move the lower double-precision (64-bit) floating-point element from "b" to the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := b[63:0]
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
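
Only mask bit 0 takes part in this operation; the remaining bits of "k" are ignored. A minimal scalar sketch of the pseudo-code above, assuming two-element ``double`` arrays (lane 0 first) in place of ``__m128d`` and a hypothetical helper name ``mask_move_sd_ref``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_move_sd. */
    static void mask_move_sd_ref(const double src[2], uint8_t k,
                                 const double a[2], const double b[2],
                                 double dst[2])
    {
        /* Writemask: the low lane comes from b when bit 0 is set, else from src. */
        dst[0] = (k & 1) ? b[0] : src[0];
        /* The upper lane is always copied from a. */
        dst[1] = a[1];
    }

For ``src = {9, 9}``, ``a = {1, 2}`` and ``b = {3, 4}``, the model gives ``{3, 2}`` when ``k = 1`` and ``{9, 2}`` when ``k = 0xFE``.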
        	

_mm_maskz_move_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_move_sd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Move the lower double-precision (64-bit) floating-point element from "b" to the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := b[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_move_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_move_ss(__m128 src, __mmask8 k, __m128 a,
                            __m128 b);

.. admonition:: Intel Description

    Move the lower single-precision (32-bit) floating-point element from "b" to the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := b[31:0]
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_move_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_move_ss(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Move the lower single-precision (32-bit) floating-point element from "b" to the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := b[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
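
The zeromask form differs from the writemask form only in what the low lane receives when bit 0 of "k" is clear. A minimal scalar sketch, assuming four-element ``float`` arrays (lane 0 first) in place of ``__m128``; the name ``maskz_move_ss_ref`` is hypothetical:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_maskz_move_ss. */
    static void maskz_move_ss_ref(uint8_t k, const float a[4], const float b[4],
                                  float dst[4])
    {
        /* Zeromask: the low lane is b[0] when bit 0 is set, otherwise 0. */
        dst[0] = (k & 1) ? b[0] : 0.0f;
        /* The upper three lanes are always copied from a. */
        for (int j = 1; j < 4; j++)
            dst[j] = a[j];
    }
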
        	

_mm_move_sh
^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_move_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Move the lower half-precision (16-bit) floating-point element from "b" to the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := b.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_move_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_move_sh(__m128h src, __mmask8 k, __m128h a,
                             __m128h b);

.. admonition:: Intel Description

    Move the lower half-precision (16-bit) floating-point element from "b" to the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := b.fp16[0]
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
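
Because the operation only moves bits and performs no arithmetic, it can be modelled on raw 16-bit patterns. The sketch below assumes each half-precision lane is held as a ``uint16_t`` bit pattern (lane 0 first), since standard C has no portable half-precision type; ``mask_move_sh_ref`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_move_sh on raw FP16 bit patterns. */
    static void mask_move_sh_ref(const uint16_t src[8], uint8_t k,
                                 const uint16_t a[8], const uint16_t b[8],
                                 uint16_t dst[8])
    {
        /* Writemask: the low lane comes from b when bit 0 is set, else from src. */
        dst[0] = (k & 1) ? b[0] : src[0];
        /* The upper seven lanes (bits 127:16) are always copied from a. */
        for (int j = 1; j < 8; j++)
            dst[j] = a[j];
    }
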
        	

_mm_maskz_move_sh
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Move
:Header: immintrin.h
:Searchable: AVX-512-Move-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_move_sh(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Move the lower half-precision (16-bit) floating-point element from "b" to the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := b.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

Bit Manipulation
----------------
ZMM
~~~
_mm512_lzcnt_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m512i _mm512_lzcnt_epi32(__m512i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	tmp := 31
        	dst[i+31:i] := 0
        	DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        		tmp := tmp - 1
        		dst[i+31:i] := dst[i+31:i] + 1
        	OD
        ENDFOR
        dst[MAX:512] := 0
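
The loop above scans each element from bit 31 downward and stops at the first set bit, so an all-zero element yields 32. A minimal scalar sketch that follows the pseudo-code step by step, assuming a 16-element ``uint32_t`` array (lane 0 first) in place of ``__m512i``; both helper names are hypothetical:

.. code-block:: C

    #include <stdint.h>

    /* Count leading zero bits of one 32-bit value, mirroring the DO WHILE loop. */
    static uint32_t lzcnt32_ref(uint32_t x)
    {
        uint32_t count = 0;
        int tmp = 31;
        while (tmp >= 0 && ((x >> tmp) & 1u) == 0) {
            tmp--;
            count++;
        }
        return count;
    }

    /* Hypothetical scalar model of _mm512_lzcnt_epi32. */
    static void lzcnt_epi32_ref(const uint32_t a[16], uint32_t dst[16])
    {
        for (int j = 0; j < 16; j++)
            dst[j] = lzcnt32_ref(a[j]);
    }

For example, ``lzcnt32_ref(0)`` is 32, ``lzcnt32_ref(1)`` is 31, and ``lzcnt32_ref(0x80000000u)`` is 0.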
        	

_mm512_mask_lzcnt_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_lzcnt_epi32(__m512i src, __mmask16 k,
                                    __m512i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp := 31
        		dst[i+31:i] := 0
        		DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        			tmp := tmp - 1
        			dst[i+31:i] := dst[i+31:i] + 1
        		OD
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_lzcnt_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_lzcnt_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp := 31
        		dst[i+31:i] := 0
        		DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        			tmp := tmp - 1
        			dst[i+31:i] := dst[i+31:i] + 1
        		OD
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_lzcnt_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m512i _mm512_lzcnt_epi64(__m512i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	tmp := 63
        	dst[i+63:i] := 0
        	DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        		tmp := tmp - 1
        		dst[i+63:i] := dst[i+63:i] + 1
        	OD
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_lzcnt_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_mask_lzcnt_epi64(__m512i src, __mmask8 k,
                                    __m512i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp := 63
        		dst[i+63:i] := 0
        		DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        			tmp := tmp - 1
        			dst[i+63:i] := dst[i+63:i] + 1
        		OD
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
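
With 64-bit elements there are eight lanes, so each bit of the 8-bit mask governs one lane. A minimal scalar sketch of the writemask behaviour, assuming eight-element ``uint64_t`` arrays (lane 0 first) in place of ``__m512i``; the helper names are hypothetical:

.. code-block:: C

    #include <stdint.h>

    /* Count leading zero bits of one 64-bit value (64 for an all-zero input). */
    static uint64_t lzcnt64_ref(uint64_t x)
    {
        uint64_t count = 0;
        int tmp = 63;
        while (tmp >= 0 && ((x >> tmp) & 1u) == 0) {
            tmp--;
            count++;
        }
        return count;
    }

    /* Hypothetical scalar model of _mm512_mask_lzcnt_epi64. */
    static void mask_lzcnt_epi64_ref(const uint64_t src[8], uint8_t k,
                                     const uint64_t a[8], uint64_t dst[8])
    {
        for (int j = 0; j < 8; j++) {
            /* Writemask: unselected lanes pass src through unchanged. */
            dst[j] = ((k >> j) & 1) ? lzcnt64_ref(a[j]) : src[j];
        }
    }
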
        	

_mm512_maskz_lzcnt_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_maskz_lzcnt_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp := 63
        		dst[i+63:i] := 0
        		DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        			tmp := tmp - 1
        			dst[i+63:i] := dst[i+63:i] + 1
        		OD
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_popcnt_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m512i _mm512_popcnt_epi32(__m512i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := POPCNT(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
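
The ``POPCNT`` helper in the pseudo-code translates almost directly to C. A minimal scalar sketch, assuming a 16-element ``uint32_t`` array (lane 0 first) in place of ``__m512i``; ``popcnt_ref`` and ``popcnt_epi32_ref`` are hypothetical names:

.. code-block:: C

    #include <stdint.h>

    /* Add up the set bits by testing bit 0 and shifting right until zero. */
    static uint32_t popcnt_ref(uint64_t a)
    {
        uint32_t count = 0;
        while (a > 0) {
            count += (uint32_t)(a & 1u);
            a >>= 1;
        }
        return count;
    }

    /* Hypothetical scalar model of _mm512_popcnt_epi32. */
    static void popcnt_epi32_ref(const uint32_t a[16], uint32_t dst[16])
    {
        for (int j = 0; j < 16; j++)
            dst[j] = popcnt_ref(a[j]);
    }

The helper takes a ``uint64_t`` so that the same function can model the 8-, 16-, 32- and 64-bit element widths.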
        	

_mm512_mask_popcnt_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_popcnt_epi32(__m512i src, __mmask16 k,
                                     __m512i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := POPCNT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_popcnt_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_popcnt_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := POPCNT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_popcnt_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m512i _mm512_popcnt_epi64(__m512i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := POPCNT(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_popcnt_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_mask_popcnt_epi64(__m512i src, __mmask8 k,
                                     __m512i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := POPCNT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_popcnt_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_maskz_popcnt_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := POPCNT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_bitshuffle_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k, 
    __m512i b, 
    __m512i c
:Param ETypes:
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __mmask64 _mm512_mask_bitshuffle_epi64_mask(__mmask64 k,
                                                __m512i b,
                                                __m512i c);

.. admonition:: Intel Description

    Gather 64 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from that element at the 8 bit positions selected by the low 6 bits of each of the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst" using zeromask "k" (bits are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7 //Qword
        	FOR j := 0 to 7 // Byte
        		IF k[i*8+j]
        			m := c.qword[i].byte[j] & 0x3F
        			dst[i*8+j] := b.qword[i].bit[m]
        		ELSE
        			dst[i*8+j] := 0
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:64] := 0
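
Each of the 64 result bits is picked from one quadword of "b" by one control byte of "c". A minimal scalar sketch of the pseudo-code above, assuming "b" is given as eight ``uint64_t`` values and "c" as 64 bytes in little-endian memory order, so that ``c[i*8+j]`` is byte ``j`` of quadword ``i``; the helper name is hypothetical:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_bitshuffle_epi64_mask. */
    static uint64_t mask_bitshuffle_epi64_mask_ref(uint64_t k,
                                                   const uint64_t b[8],
                                                   const uint8_t c[64])
    {
        uint64_t dst = 0;
        for (int i = 0; i < 8; i++) {          /* quadword index */
            for (int j = 0; j < 8; j++) {      /* control byte within the quadword */
                int bit = i * 8 + j;
                if ((k >> bit) & 1) {
                    /* Only the low 6 bits of the control byte select a position. */
                    unsigned m = c[bit] & 0x3F;
                    dst |= ((b[i] >> m) & 1) << bit;
                }
                /* Zeromask: unselected result bits stay 0. */
            }
        }
        return dst;
    }
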
        	

_mm512_bitshuffle_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i b, 
    __m512i c
:Param ETypes:
    UI64 b, 
    UI64 c

.. code-block:: C

    __mmask64 _mm512_bitshuffle_epi64_mask(__m512i b,
                                           __m512i c);

.. admonition:: Intel Description

    Gather 64 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from that element at the 8 bit positions selected by the low 6 bits of each of the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7 //Qword
        	FOR j := 0 to 7 // Byte
        		m := c.qword[i].byte[j] & 0x3F
        		dst[i*8+j] := b.qword[i].bit[m]
        	ENDFOR
        ENDFOR
        dst[MAX:64] := 0
        	

_mm512_popcnt_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m512i _mm512_popcnt_epi16(__m512i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := POPCNT(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_popcnt_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_mask_popcnt_epi16(__m512i src, __mmask32 k,
                                     __m512i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := POPCNT(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_popcnt_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_maskz_popcnt_epi16(__mmask32 k, __m512i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := POPCNT(a[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_popcnt_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m512i _mm512_popcnt_epi8(__m512i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := POPCNT(a[i+7:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_popcnt_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_mask_popcnt_epi8(__m512i src, __mmask64 k,
                                    __m512i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := POPCNT(a[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_popcnt_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_maskz_popcnt_epi8(__mmask64 k, __m512i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := POPCNT(a[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
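
With 8-bit elements the vector holds 64 lanes, so the mask needs all 64 bits of a ``__mmask64``. A minimal scalar sketch of the zeromask behaviour, assuming 64-element ``uint8_t`` arrays (lane 0 first) in place of ``__m512i`` and a ``uint64_t`` in place of the mask; the helper names are hypothetical:

.. code-block:: C

    #include <stdint.h>

    /* Count the set bits of one byte, following the POPCNT pseudo-code. */
    static uint8_t popcnt8_ref(uint8_t a)
    {
        uint8_t count = 0;
        while (a > 0) {
            count = (uint8_t)(count + (a & 1u));
            a >>= 1;
        }
        return count;
    }

    /* Hypothetical scalar model of _mm512_maskz_popcnt_epi8. */
    static void maskz_popcnt_epi8_ref(uint64_t k, const uint8_t a[64],
                                      uint8_t dst[64])
    {
        for (int j = 0; j < 64; j++) {
            /* Zeromask: lanes whose mask bit is clear are written as 0. */
            dst[j] = ((k >> j) & 1) ? popcnt8_ref(a[j]) : 0;
        }
    }
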
        	

_mm512_multishift_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_multishift_epi64_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	q := i * 64
        	FOR j := 0 to 7
        		tmp8 := 0
        		ctrl := a[q+j*8+7:q+j*8] & 63
        		FOR l := 0 to 7
        			tmp8[l] := b[q+((ctrl+l) & 63)]
        		ENDFOR
        		dst[q+j*8+7:q+j*8] := tmp8[7:0]
        	ENDFOR
        ENDFOR
        dst[MAX:512] := 0
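
In the pseudo-code, each control byte of "a" supplies a bit offset (0 to 63) into the matching quadword of "b", and the 8 bits starting at that offset, wrapping around at bit 63, form one output byte. A minimal scalar sketch that follows the loops literally, assuming eight-element ``uint64_t`` arrays (lane 0 first) in place of ``__m512i``; the helper names are hypothetical:

.. code-block:: C

    #include <stdint.h>

    /* Process one 64-bit lane: a holds 8 control bytes, b holds the data. */
    static uint64_t multishift_qword_ref(uint64_t a, uint64_t b)
    {
        uint64_t dst = 0;
        for (int j = 0; j < 8; j++) {
            unsigned ctrl = (unsigned)(a >> (j * 8)) & 63;
            uint8_t tmp8 = 0;
            for (int l = 0; l < 8; l++) {
                /* Source bit index wraps modulo 64 within the quadword. */
                unsigned src_bit = (ctrl + (unsigned)l) & 63;
                tmp8 |= (uint8_t)(((b >> src_bit) & 1) << l);
            }
            dst |= (uint64_t)tmp8 << (j * 8);
        }
        return dst;
    }

    /* Hypothetical scalar model of _mm512_multishift_epi64_epi8. */
    static void multishift_epi64_epi8_ref(const uint64_t a[8], const uint64_t b[8],
                                          uint64_t dst[8])
    {
        for (int i = 0; i < 8; i++)
            dst[i] = multishift_qword_ref(a[i], b[i]);
    }

Control bytes ``0, 8, 16, ..., 56`` reproduce "b" unchanged, while a control byte of 60 combines the top four bits of the quadword with its bottom four bits.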
        	

_mm512_mask_multishift_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_multishift_epi64_epi8(__m512i src,
                                              __mmask64 k,
                                              __m512i a,
                                              __m512i b);

.. admonition:: Intel Description

    For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	q := i * 64
        	FOR j := 0 to 7
        		tmp8 := 0
        		ctrl := a[q+j*8+7:q+j*8] & 63
        		FOR l := 0 to 7
        			tmp8[l] := b[q+((ctrl+l) & 63)]
        		ENDFOR
        		IF k[i*8+j]
        			dst[q+j*8+7:q+j*8] := tmp8[7:0]
        		ELSE
        			dst[q+j*8+7:q+j*8] := src[q+j*8+7:q+j*8]
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:512] := 0
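
Although the data is grouped into 64-bit elements, the mask is applied per output byte: bit ``i*8+j`` of "k" governs byte ``j`` of quadword ``i``. The sketch below models this with a rotate-right formulation, which is a substitute for the bit-by-bit inner loop of the pseudo-code rather than a transcription of it: the low byte of ``b`` rotated right by ``ctrl`` equals the 8 wrapped bits starting at ``ctrl``. The array layout (eight ``uint64_t`` values, lane 0 first) and the helper names are assumptions.

.. code-block:: C

    #include <stdint.h>

    /* Rotate x right by n bits (n taken modulo 64). */
    static uint64_t rotr64(uint64_t x, unsigned n)
    {
        n &= 63;
        return n ? (x >> n) | (x << (64 - n)) : x;
    }

    /* Hypothetical scalar model of _mm512_mask_multishift_epi64_epi8. */
    static void mask_multishift_epi64_epi8_ref(const uint64_t src[8], uint64_t k,
                                               const uint64_t a[8],
                                               const uint64_t b[8],
                                               uint64_t dst[8])
    {
        for (int i = 0; i < 8; i++) {
            uint64_t out = 0;
            for (int j = 0; j < 8; j++) {
                unsigned shift = (unsigned)j * 8;
                unsigned ctrl = (unsigned)(a[i] >> shift) & 63;
                /* Writemask: unselected bytes are taken from src. */
                uint64_t byte = ((k >> (i * 8 + j)) & 1)
                                    ? (rotr64(b[i], ctrl) & 0xFF)
                                    : ((src[i] >> shift) & 0xFF);
                out |= byte << shift;
            }
            dst[i] = out;
        }
    }
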
        	

_mm512_maskz_multishift_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_multishift_epi64_epi8(__mmask64 k,
                                               __m512i a,
                                               __m512i b);

.. admonition:: Intel Description

    For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	q := i * 64
        	FOR j := 0 to 7
        		tmp8 := 0
        		ctrl := a[q+j*8+7:q+j*8] & 63
        		FOR l := 0 to 7
        			tmp8[l] := b[q+((ctrl+l) & 63)]
        		ENDFOR
        		IF k[i*8+j]
        			dst[q+j*8+7:q+j*8] := tmp8[7:0]
        		ELSE
        			dst[q+j*8+7:q+j*8] := 0
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:512] := 0
        	

YMM
~~~
_mm256_lzcnt_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256i _mm256_lzcnt_epi32(__m256i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	tmp := 31
        	dst[i+31:i] := 0
        	DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        		tmp := tmp - 1
        		dst[i+31:i] := dst[i+31:i] + 1
        	OD
        ENDFOR
        dst[MAX:256] := 0
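
The pseudo-code scans one bit at a time, which is easy to read but takes up to 32 steps per element. As an alternative formulation (not the one shown above), the same count can be obtained with a five-step binary search. The sketch assumes an eight-element ``uint32_t`` array (lane 0 first) in place of ``__m256i``; the helper names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Leading-zero count by binary search: halve the candidate range each step. */
    static uint32_t lzcnt32_bsearch(uint32_t x)
    {
        if (x == 0)
            return 32;
        uint32_t n = 0;
        if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
        if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
        if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
        if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
        if ((x & 0x80000000u) == 0) { n += 1; }
        return n;
    }

    /* Hypothetical scalar model of _mm256_lzcnt_epi32 (eight 32-bit lanes). */
    static void lzcnt_epi32_256_ref(const uint32_t a[8], uint32_t dst[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = lzcnt32_bsearch(a[j]);
    }
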
        	

_mm256_mask_lzcnt_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_mask_lzcnt_epi32(__m256i src, __mmask8 k,
                                    __m256i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		tmp := 31
        		dst[i+31:i] := 0
        		DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        			tmp := tmp - 1
        			dst[i+31:i] := dst[i+31:i] + 1
        		OD
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_lzcnt_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_maskz_lzcnt_epi32(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		tmp := 31
        		dst[i+31:i] := 0
        		DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        			tmp := tmp - 1
        			dst[i+31:i] := dst[i+31:i] + 1
        		OD
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_lzcnt_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m256i _mm256_lzcnt_epi64(__m256i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	tmp := 63
        	dst[i+63:i] := 0
        	DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        		tmp := tmp - 1
        		dst[i+63:i] := dst[i+63:i] + 1
        	OD
        ENDFOR
        dst[MAX:256] := 0
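
As a scalar illustration of the 64-bit form, the sketch below models one lane and derives the number of significant bits from it. ``lzcnt64_lane`` and ``bit_width64`` are hypothetical names introduced only for this example.

.. code-block:: C

    #include <stdint.h>

    /* Leading-zero count of one 64-bit lane; returns 64 for x == 0. */
    static uint64_t lzcnt64_lane(uint64_t x)
    {
        uint64_t n = 0;
        for (int bit = 63; bit >= 0 && ((x >> bit) & 1u) == 0; --bit)
            ++n;
        return n;
    }

    /* Number of bits needed to represent x (0 when x == 0). */
    static uint64_t bit_width64(uint64_t x)
    {
        return 64u - lzcnt64_lane(x);
    }

For a nonzero lane, ``bit_width64(x) - 1`` is the index of the highest set bit.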
        	

_mm256_mask_lzcnt_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_mask_lzcnt_epi64(__m256i src, __mmask8 k,
                                    __m256i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		tmp := 63
        		dst[i+63:i] := 0
        		DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        			tmp := tmp - 1
        			dst[i+63:i] := dst[i+63:i] + 1
        		OD
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_lzcnt_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_maskz_lzcnt_epi64(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		tmp := 63
        		dst[i+63:i] := 0
        		DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        			tmp := tmp - 1
        			dst[i+63:i] := dst[i+63:i] + 1
        		OD
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
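
A scalar sketch of the zeromask form, under the same assumptions as the pseudo-code: four 64-bit lanes, with only bits 0 to 3 of the mask consulted. The helper names are hypothetical and ``uint8_t`` stands in for ``__mmask8``.

.. code-block:: C

    #include <stdint.h>

    /* Leading-zero count of one 64-bit lane; returns 64 for x == 0. */
    static uint64_t lzcnt64_lane(uint64_t x)
    {
        uint64_t n = 0;
        for (int bit = 63; bit >= 0 && ((x >> bit) & 1u) == 0; --bit)
            ++n;
        return n;
    }

    /* Zeromask model: lanes whose mask bit is clear are set to 0. */
    static void maskz_lzcnt_epi64_model(uint8_t k, const uint64_t a[4],
                                        uint64_t dst[4])
    {
        for (int j = 0; j < 4; ++j)
            dst[j] = ((k >> j) & 1u) ? lzcnt64_lane(a[j]) : 0;
    }

A result of 0 is ambiguous in this form: it appears both for a masked-off lane and for an active lane whose top bit is set.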
        	

_mm256_maskz_popcnt_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_maskz_popcnt_epi64(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := POPCNT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_popcnt_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_mask_popcnt_epi64(__m256i src, __mmask8 k,
                                     __m256i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := POPCNT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_popcnt_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m256i _mm256_popcnt_epi64(__m256i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := POPCNT(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
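
The ``POPCNT`` helper in the pseudo-code maps almost line for line onto C. The following is a scalar sketch with hypothetical names (``popcnt_lane``, ``popcnt_epi64_model``) and the four lanes held in a plain array.

.. code-block:: C

    #include <stdint.h>

    /* Direct transliteration of the POPCNT helper: add the low bit,
       shift right, repeat until no set bits remain. */
    static uint64_t popcnt_lane(uint64_t a)
    {
        uint64_t count = 0;
        while (a > 0) {
            count += a & 1u;
            a >>= 1;
        }
        return count;
    }

    /* Hypothetical helper: applies the per-lane count to 4 lanes. */
    static void popcnt_epi64_model(const uint64_t a[4], uint64_t dst[4])
    {
        for (int j = 0; j < 4; ++j)
            dst[j] = popcnt_lane(a[j]);
    }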
        	

_mm256_popcnt_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256i _mm256_popcnt_epi32(__m256i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := POPCNT(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
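
For comparison, the sketch below sets the shift-based loop from the pseudo-code next to a different, swapped-in technique that clears the lowest set bit on each iteration. Both function names are hypothetical, and neither is claimed to reflect how the hardware computes the result.

.. code-block:: C

    #include <stdint.h>

    /* Transliteration of the POPCNT helper: one iteration per bit
       position up to the highest set bit. */
    static uint32_t popcnt32_shift(uint32_t a)
    {
        uint32_t count = 0;
        while (a > 0) {
            count += a & 1u;
            a >>= 1;
        }
        return count;
    }

    /* Swapped-in variant: x & (x - 1) clears the lowest set bit,
       so the loop runs once per set bit. */
    static uint32_t popcnt32_clear_lowest(uint32_t x)
    {
        uint32_t count = 0;
        while (x != 0) {
            x &= x - 1u;
            ++count;
        }
        return count;
    }

Both functions return the same value for every input; they differ only in the number of loop iterations.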
        	

_mm256_mask_popcnt_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_mask_popcnt_epi32(__m256i src, __mmask8 k,
                                     __m256i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := POPCNT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_popcnt_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_maskz_popcnt_epi32(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := POPCNT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_bitshuffle_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k, 
    __m256i b, 
    __m256i c
:Param ETypes:
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __mmask32 _mm256_mask_bitshuffle_epi64_mask(__mmask32 k,
                                                __m256i b,
                                                __m256i c);

.. admonition:: Intel Description

    Gather 32 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from the 64-bit element in "b" at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3 //Qword
        	FOR j := 0 to 7 // Byte
        		IF k[i*8+j]
        			m := c.qword[i].byte[j] & 0x3F
        			dst[i*8+j] := b.qword[i].bit[m]
        		ELSE
        			dst[i*8+j] := 0
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:32] := 0
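
The masked bit gather can be modeled in scalar C. In this sketch the function name is hypothetical, ``uint32_t`` stands in for ``__mmask32``, and "b" and "c" are passed as arrays of four 64-bit elements.

.. code-block:: C

    #include <stdint.h>

    /* Result bit i*8+j is bit m of qword i of b, where m is the low
       6 bits of byte j in qword i of c. Bits whose mask bit in k is
       clear stay 0. */
    static uint32_t mask_bitshuffle_epi64_model(uint32_t k,
                                                const uint64_t b[4],
                                                const uint64_t c[4])
    {
        uint32_t dst = 0;
        for (int i = 0; i < 4; ++i) {
            for (int j = 0; j < 8; ++j) {
                if ((k >> (i * 8 + j)) & 1u) {
                    unsigned m = (unsigned)(c[i] >> (j * 8)) & 0x3Fu;
                    dst |= (uint32_t)((b[i] >> m) & 1u) << (i * 8 + j);
                }
            }
        }
        return dst;
    }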
        	

_mm256_bitshuffle_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i b, 
    __m256i c
:Param ETypes:
    UI64 b, 
    UI64 c

.. code-block:: C

    __mmask32 _mm256_bitshuffle_epi64_mask(__m256i b,
                                           __m256i c);

.. admonition:: Intel Description

    Gather 32 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from the 64-bit element in "b" at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3 //Qword
        	FOR j := 0 to 7 // Byte
        		m := c.qword[i].byte[j] & 0x3F
        		dst[i*8+j] := b.qword[i].bit[m]
        	ENDFOR
        ENDFOR
        dst[MAX:32] := 0
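
One way to read the pseudo-code is as a per-qword bit permutation. The sketch below isolates that step in a hypothetical helper, ``bitshuffle_qword``, and then assembles the 32-bit result; the names and the array-based interface are assumptions made for illustration.

.. code-block:: C

    #include <stdint.h>

    /* For one qword: output bit j is bit m of b, where m is the low
       6 bits of control byte j of c. */
    static uint8_t bitshuffle_qword(uint64_t b, uint64_t c)
    {
        uint8_t out = 0;
        for (int j = 0; j < 8; ++j) {
            unsigned m = (unsigned)(c >> (j * 8)) & 0x3Fu;
            out |= (uint8_t)(((b >> m) & 1u) << j);
        }
        return out;
    }

    /* Four qwords produce four result bytes, packed into 32 bits. */
    static uint32_t bitshuffle_epi64_model(const uint64_t b[4],
                                           const uint64_t c[4])
    {
        uint32_t dst = 0;
        for (int i = 0; i < 4; ++i)
            dst |= (uint32_t)bitshuffle_qword(b[i], c[i]) << (i * 8);
        return dst;
    }

With control bytes 7, 6, ..., 0 (the constant ``0x0001020304050607``), each result byte is the bit-reversed low byte of the corresponding qword of "b". Because of the ``& 0x3F``, only the low 6 bits of each control byte matter.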
        	

_mm256_popcnt_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m256i _mm256_popcnt_epi16(__m256i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := POPCNT(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_popcnt_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_mask_popcnt_epi16(__m256i src, __mmask16 k,
                                     __m256i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := POPCNT(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
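
With 16-bit elements the vector holds 16 lanes, so the mask carries one bit per lane in a 16-bit value. A scalar sketch of the writemask form, using hypothetical helper names and ``uint16_t`` in place of ``__mmask16``:

.. code-block:: C

    #include <stdint.h>

    /* Population count of one 16-bit lane, following the POPCNT helper. */
    static uint16_t popcnt16_lane(uint16_t a)
    {
        uint16_t count = 0;
        while (a > 0) {
            count = (uint16_t)(count + (a & 1u));
            a = (uint16_t)(a >> 1);
        }
        return count;
    }

    /* Writemask model over 16 lanes: bit j of k selects between the
       computed count and the pass-through value src[j]. */
    static void mask_popcnt_epi16_model(const uint16_t src[16], uint16_t k,
                                        const uint16_t a[16],
                                        uint16_t dst[16])
    {
        for (int j = 0; j < 16; ++j)
            dst[j] = ((k >> j) & 1u) ? popcnt16_lane(a[j]) : src[j];
    }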
        	

_mm256_maskz_popcnt_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_maskz_popcnt_epi16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := POPCNT(a[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_popcnt_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m256i _mm256_popcnt_epi8(__m256i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := POPCNT(a[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
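
For byte lanes, a small lookup table is a common scalar alternative to the shift loop. The sketch below is a swapped-in technique, not the pseudo-code's method; the table and function names are hypothetical, and the loop version is kept alongside for cross-checking.

.. code-block:: C

    #include <stdint.h>

    /* Number of set bits in each 4-bit value 0..15. */
    static const uint8_t NIBBLE_BITS[16] = {
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
    };

    /* Table-based count: sum the counts of the two nibbles. */
    static uint8_t popcnt8_lut(uint8_t x)
    {
        return (uint8_t)(NIBBLE_BITS[x & 0x0F] + NIBBLE_BITS[x >> 4]);
    }

    /* Loop-based count, following the POPCNT helper. */
    static uint8_t popcnt8_loop(uint8_t a)
    {
        uint8_t count = 0;
        while (a > 0) {
            count = (uint8_t)(count + (a & 1u));
            a = (uint8_t)(a >> 1);
        }
        return count;
    }

    /* Hypothetical helper: applies the count to all 32 byte lanes. */
    static void popcnt_epi8_model(const uint8_t a[32], uint8_t dst[32])
    {
        for (int j = 0; j < 32; ++j)
            dst[j] = popcnt8_lut(a[j]);
    }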
        	

_mm256_mask_popcnt_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_mask_popcnt_epi8(__m256i src, __mmask32 k,
                                    __m256i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := POPCNT(a[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_popcnt_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_maskz_popcnt_epi8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := POPCNT(a[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_multishift_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_multishift_epi64_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	q := i * 64
        	FOR j := 0 to 7
        		tmp8 := 0
        		ctrl := a[q+j*8+7:q+j*8] & 63
        		FOR l := 0 to 7
        			tmp8[l] := b[q+((ctrl+l) & 63)]
        		ENDFOR
        		dst[q+j*8+7:q+j*8] := tmp8[7:0]
        	ENDFOR
        ENDFOR
        dst[MAX:256] := 0
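
The nested loops above amount to reading, for each control byte in "a", an 8-bit window of the matching qword of "b" that starts at the given bit offset and wraps around at bit 63. A scalar sketch follows, with a hypothetical function name and plain arrays in place of ``__m256i``.

.. code-block:: C

    #include <stdint.h>

    /* For each qword i and byte j: ctrl is the low 6 bits of byte j
       of a[i]; the result byte collects bits ctrl..ctrl+7 of b[i],
       with bit indices taken modulo 64. */
    static void multishift_epi64_epi8_model(const uint64_t a[4],
                                            const uint64_t b[4],
                                            uint64_t dst[4])
    {
        for (int i = 0; i < 4; ++i) {
            uint64_t out = 0;
            for (int j = 0; j < 8; ++j) {
                unsigned ctrl = (unsigned)(a[i] >> (j * 8)) & 63u;
                uint64_t tmp8 = 0;
                for (int l = 0; l < 8; ++l)
                    tmp8 |= ((b[i] >> ((ctrl + l) & 63u)) & 1u) << l;
                out |= tmp8 << (j * 8);
            }
            dst[i] = out;
        }
    }

Control bytes 0, 8, 16, ..., 56 reproduce "b" unchanged, while a control value of 60 yields a byte made of bits 60 to 63 followed by bits 0 to 3.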
        	

_mm256_mask_multishift_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_multishift_epi64_epi8(__m256i src,
                                              __mmask32 k,
                                              __m256i a,
                                              __m256i b);

.. admonition:: Intel Description

    For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	q := i * 64
        	FOR j := 0 to 7
        		tmp8 := 0
        		ctrl := a[q+j*8+7:q+j*8] & 63
        		FOR l := 0 to 7
        			tmp8[l] := b[q+((ctrl+l) & 63)]
        		ENDFOR
        		IF k[i*8+j]
        			dst[q+j*8+7:q+j*8] := tmp8[7:0]
        		ELSE
        			dst[q+j*8+7:q+j*8] := src[q+j*8+7:q+j*8]
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_multishift_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_multishift_epi64_epi8(__mmask32 k,
                                               __m256i a,
                                               __m256i b);

.. admonition:: Intel Description

    For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	q := i * 64
        	FOR j := 0 to 7
        		tmp8 := 0
        		ctrl := a[q+j*8+7:q+j*8] & 63
        		FOR l := 0 to 7
        			tmp8[l] := b[q+((ctrl+l) & 63)]
        		ENDFOR
        		IF k[i*8+j]
        			dst[q+j*8+7:q+j*8] := tmp8[7:0]
        		ELSE
        			dst[q+j*8+7:q+j*8] := 0
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_lzcnt_epi32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm_lzcnt_epi32(__m128i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	tmp := 31
        	dst[i+31:i] := 0
        	DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        		tmp := tmp - 1
        		dst[i+31:i] := dst[i+31:i] + 1
        	OD
        ENDFOR
        dst[MAX:128] := 0
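
The bit-by-bit scan in the pseudo-code can also be expressed as a binary search over the word. The sketch below is a swapped-in scalar technique with hypothetical names; it is meant to produce the same per-lane result, not to describe the instruction's implementation.

.. code-block:: C

    #include <stdint.h>

    /* Binary-search leading-zero count: test the top 16, 8, 4, 2, 1
       bits in turn, shifting the value left whenever they are zero. */
    static uint32_t lzcnt32_binary(uint32_t x)
    {
        if (x == 0)
            return 32;
        uint32_t n = 0;
        if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
        if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8; }
        if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4; }
        if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2; }
        if ((x & 0x80000000u) == 0) { n += 1; }
        return n;
    }

    /* Hypothetical helper: applies the count to the 4 lanes of the
       128-bit form. */
    static void lzcnt_epi32_model128(const uint32_t a[4], uint32_t dst[4])
    {
        for (int j = 0; j < 4; ++j)
            dst[j] = lzcnt32_binary(a[j]);
    }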
        	

_mm_mask_lzcnt_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_lzcnt_epi32(__m128i src, __mmask8 k,
                                 __m128i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		tmp := 31
        		dst[i+31:i] := 0
        		DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        			tmp := tmp - 1
        			dst[i+31:i] := dst[i+31:i] + 1
        		OD
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_lzcnt_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_lzcnt_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		tmp := 31
        		dst[i+31:i] := 0
        		DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        			tmp := tmp - 1
        			dst[i+31:i] := dst[i+31:i] + 1
        		OD
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_lzcnt_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_lzcnt_epi64(__m128i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	tmp := 63
        	dst[i+63:i] := 0
        	DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        		tmp := tmp - 1
        		dst[i+63:i] := dst[i+63:i] + 1
        	OD
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_lzcnt_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_lzcnt_epi64(__m128i src, __mmask8 k,
                                 __m128i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		tmp := 63
        		dst[i+63:i] := 0
        		DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        			tmp := tmp - 1
        			dst[i+63:i] := dst[i+63:i] + 1
        		OD
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_lzcnt_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_lzcnt_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		tmp := 63
        		dst[i+63:i] := 0
        		DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
        			tmp := tmp - 1
        			dst[i+63:i] := dst[i+63:i] + 1
        		OD
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_popcnt_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_popcnt_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := POPCNT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
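
In the 128-bit form there are only two 64-bit lanes, so the pseudo-code consults just bits 0 and 1 of the 8-bit mask. A scalar sketch under that reading, with hypothetical helper names and ``uint8_t`` standing in for ``__mmask8``:

.. code-block:: C

    #include <stdint.h>

    /* Population count of one 64-bit lane, following the POPCNT helper. */
    static uint64_t popcnt64_lane(uint64_t a)
    {
        uint64_t count = 0;
        while (a > 0) {
            count += a & 1u;
            a >>= 1;
        }
        return count;
    }

    /* Zeromask model over 2 lanes; mask bits 2..7 are never read. */
    static void maskz_popcnt_epi64_model(uint8_t k, const uint64_t a[2],
                                         uint64_t dst[2])
    {
        for (int j = 0; j < 2; ++j)
            dst[j] = ((k >> j) & 1u) ? popcnt64_lane(a[j]) : 0;
    }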
        	

_mm_mask_popcnt_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_popcnt_epi64(__m128i src, __mmask8 k,
                                  __m128i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := POPCNT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_popcnt_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_popcnt_epi64(__m128i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := POPCNT(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_popcnt_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm_popcnt_epi32(__m128i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := POPCNT(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_popcnt_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_popcnt_epi32(__m128i src, __mmask8 k,
                                  __m128i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := POPCNT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
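
The writemask behaviour in the pseudo-code above can be sketched in scalar C as follows. This is an illustrative model only: the vector is represented as an array of four 32-bit lanes, and ``popcnt_u32`` and ``mask_popcnt_epi32_ref`` are hypothetical names, not functions from ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* POPCNT from the pseudo-code, for one 32-bit lane. */
    unsigned popcnt_u32(uint32_t a)
    {
        unsigned count = 0;
        while (a > 0) {
            count += a & 1u;
            a >>= 1;
        }
        return count;
    }

    /* Scalar model of _mm_mask_popcnt_epi32.
       Bit j of k selects lane j: set means "write the bit count",
       clear means "copy the lane from src". */
    void mask_popcnt_epi32_ref(uint32_t dst[4], const uint32_t src[4],
                               uint8_t k, const uint32_t a[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1u) ? popcnt_u32(a[j]) : src[j];
    }

With ``k = 0x5`` only lanes 0 and 2 receive bit counts; lanes 1 and 3 keep the values from ``src``.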
        	

_mm_maskz_popcnt_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_popcnt_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := POPCNT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
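
A scalar sketch of the zeromask variant, under the same modelling assumptions as a plain array of four 32-bit lanes. The names ``popcnt_u32`` and ``maskz_popcnt_epi32_ref`` are hypothetical. The loop reads only mask bits 0 to 3, matching the pseudo-code, so the upper bits of the ``__mmask8`` argument have no effect in this model.

.. code-block:: C

    #include <stdint.h>

    /* POPCNT from the pseudo-code, for one 32-bit lane. */
    unsigned popcnt_u32(uint32_t a)
    {
        unsigned count = 0;
        while (a > 0) {
            count += a & 1u;
            a >>= 1;
        }
        return count;
    }

    /* Scalar model of _mm_maskz_popcnt_epi32.
       Lanes whose mask bit is clear are set to zero. */
    void maskz_popcnt_epi32_ref(uint32_t dst[4], uint8_t k, const uint32_t a[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1u) ? popcnt_u32(a[j]) : 0u;
    }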
        	

_mm_mask_bitshuffle_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k, 
    __m128i b, 
    __m128i c
:Param ETypes:
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __mmask16 _mm_mask_bitshuffle_epi64_mask(__mmask16 k,
                                             __m128i b,
                                             __m128i c);

.. admonition:: Intel Description

    Gather 16 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from that element at the 8 bit positions given by the low 6 bits of the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8 bits of mask "dst" using zeromask "k" (bits are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 1 //Qword
        	FOR j := 0 to 7 // Byte
        		IF k[i*8+j]
        			m := c.qword[i].byte[j] & 0x3F
        			dst[i*8+j] := b.qword[i].bit[m]
        		ELSE
        			dst[i*8+j] := 0
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:16] := 0
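
The pseudo-code above can be modelled in scalar C as shown below. This is a sketch under stated assumptions: "b" and "c" are represented as arrays of two 64-bit lanes, byte ``j`` of a lane of "c" is taken to be bits ``8*j+7:8*j`` of that lane, and ``mask_bitshuffle_epi64_ref`` is a hypothetical name that does not exist in ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of _mm_mask_bitshuffle_epi64_mask.
       For qword i and byte j, the low 6 bits of byte j of c[i] pick one
       bit of b[i]; that bit lands in result bit i*8+j if k allows it. */
    uint16_t mask_bitshuffle_epi64_ref(uint16_t k, const uint64_t b[2],
                                       const uint64_t c[2])
    {
        uint16_t dst = 0;
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 8; j++) {
                int pos = i * 8 + j;
                if ((k >> pos) & 1u) {
                    unsigned m = (unsigned)((c[i] >> (j * 8)) & 0x3Fu);
                    unsigned bit = (unsigned)((b[i] >> m) & 1u);
                    dst = (uint16_t)(dst | (bit << pos));
                }
                /* Mask bit clear: the result bit stays zero. */
            }
        }
        return dst;
    }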
        	

_mm_bitshuffle_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i b, 
    __m128i c
:Param ETypes:
    UI64 b, 
    UI64 c

.. code-block:: C

    __mmask16 _mm_bitshuffle_epi64_mask(__m128i b, __m128i c);

.. admonition:: Intel Description

    Gather 16 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from that element at the 8 bit positions given by the low 6 bits of the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8 bits of mask "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 1 //Qword
        	FOR j := 0 to 7 // Byte
        		m := c.qword[i].byte[j] & 0x3F
        		dst[i*8+j] := b.qword[i].bit[m]
        	ENDFOR
        ENDFOR
        dst[MAX:16] := 0
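
As a worked illustration of the pseudo-code above, the scalar sketch below models the operation and can be used to collect the most significant bit of every byte of "b". The function name ``bitshuffle_epi64_ref`` and the array representation of the vectors are assumptions made for this example; the control constant is simply the byte sequence 7, 15, 23, ..., 63 packed into one 64-bit lane.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of _mm_bitshuffle_epi64_mask: result bit i*8+j is the bit
       of b[i] selected by the low 6 bits of byte j of c[i]. */
    uint16_t bitshuffle_epi64_ref(const uint64_t b[2], const uint64_t c[2])
    {
        uint16_t dst = 0;
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 8; j++) {
                unsigned m = (unsigned)((c[i] >> (j * 8)) & 0x3Fu);
                unsigned bit = (unsigned)((b[i] >> m) & 1u);
                dst = (uint16_t)(dst | (bit << (i * 8 + j)));
            }
        }
        return dst;
    }

    /* Control lane whose byte j holds 8*j+7, the top bit of byte j. */
    #define SELECT_BYTE_MSBS 0x3F372F271F170F07ULL

Passing ``SELECT_BYTE_MSBS`` in both lanes of "c" yields a 16-bit mask in which bit ``n`` is the top bit of byte ``n`` of "b".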
        	

_mm_popcnt_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128i _mm_popcnt_epi16(__m128i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := POPCNT(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
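
The loop in the pseudo-code above is a specification, not a recommended implementation. The sketch below places it next to a bit-parallel (SWAR) reduction, a different technique that gives the same per-lane count. All names here (``popcnt16_loop``, ``popcnt16_swar``, ``popcnt_epi16_ref``) are hypothetical, and the vector is modelled as an array of eight 16-bit lanes.

.. code-block:: C

    #include <stdint.h>

    /* Direct transcription of POPCNT from the pseudo-code. */
    unsigned popcnt16_loop(uint16_t a)
    {
        unsigned count = 0;
        while (a > 0) {
            count += a & 1u;
            a = (uint16_t)(a >> 1);
        }
        return count;
    }

    /* SWAR alternative: sum bits in pairs, then nibbles, then bytes. */
    unsigned popcnt16_swar(uint16_t a)
    {
        unsigned x = a;
        x = x - ((x >> 1) & 0x5555u);
        x = (x & 0x3333u) + ((x >> 2) & 0x3333u);
        x = (x + (x >> 4)) & 0x0F0Fu;
        x = (x + (x >> 8)) & 0x001Fu;
        return x;
    }

    /* Scalar model of _mm_popcnt_epi16 over eight 16-bit lanes. */
    void popcnt_epi16_ref(uint16_t dst[8], const uint16_t a[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = (uint16_t)popcnt16_swar(a[j]);
    }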
        	

_mm_mask_popcnt_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_mask_popcnt_epi16(__m128i src, __mmask8 k,
                                  __m128i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := POPCNT(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_popcnt_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_maskz_popcnt_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := POPCNT(a[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_popcnt_epi8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m128i _mm_popcnt_epi8(__m128i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := POPCNT(a[i+7:i])
        ENDFOR
        dst[MAX:128] := 0
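
For 8-bit lanes the count in the pseudo-code above can also be obtained from a 16-entry nibble lookup table. The sketch below uses that table-based technique in place of the shift loop; it is an illustrative scalar model, with ``nibble_bits`` and ``popcnt_epi8_ref`` being hypothetical names and the vector represented as an array of sixteen bytes.

.. code-block:: C

    #include <stdint.h>

    /* Number of set bits in each 4-bit value 0..15. */
    static const uint8_t nibble_bits[16] = {
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
    };

    /* Scalar model of _mm_popcnt_epi8: count = bits(low nibble) + bits(high nibble). */
    void popcnt_epi8_ref(uint8_t dst[16], const uint8_t a[16])
    {
        for (int j = 0; j < 16; j++)
            dst[j] = (uint8_t)(nibble_bits[a[j] & 0x0F] + nibble_bits[a[j] >> 4]);
    }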
        	

_mm_mask_popcnt_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_mask_popcnt_epi8(__m128i src, __mmask16 k,
                                 __m128i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := POPCNT(a[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_popcnt_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_maskz_popcnt_epi8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE POPCNT(a) {
        	count := 0
        	DO WHILE a > 0
        		count += a[0]
        		a >>= 1
        	OD
        	RETURN count
        }
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := POPCNT(a[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_multishift_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_multishift_epi64_epi8(__m128i a, __m128i b);

.. admonition:: Intel Description

    For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 1
        	q := i * 64
        	FOR j := 0 to 7
        		tmp8 := 0
        		ctrl := a[q+j*8+7:q+j*8] & 63
        		FOR l := 0 to 7
        			tmp8[l] := b[q+((ctrl+l) & 63)]
        		ENDFOR
        		dst[q+j*8+7:q+j*8] := tmp8[7:0]
        	ENDFOR
        ENDFOR
        dst[MAX:128] := 0
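
The pseudo-code above can be transcribed into scalar C as follows. This is a sketch, not the intrinsic: "a" (the control bytes) and "dst" are modelled as arrays of sixteen bytes, "b" (the data) as two 64-bit lanes, and ``multishift_epi64_epi8_ref`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of _mm_multishift_epi64_epi8.
       For each control byte, read 8 consecutive bits of the matching 64-bit
       lane of b, starting at bit (ctrl & 63) and wrapping around at bit 63. */
    void multishift_epi64_epi8_ref(uint8_t dst[16], const uint8_t a[16],
                                   const uint64_t b[2])
    {
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 8; j++) {
                unsigned ctrl = a[i * 8 + j] & 63u;
                uint8_t tmp8 = 0;
                for (unsigned l = 0; l < 8; l++) {
                    unsigned bit = (unsigned)((b[i] >> ((ctrl + l) & 63u)) & 1u);
                    tmp8 = (uint8_t)(tmp8 | (bit << l));
                }
                dst[i * 8 + j] = tmp8;
            }
        }
    }

Control values that are multiples of 8 pick whole bytes of the lane; other values pick an unaligned 8-bit window, and a window that starts near bit 63 wraps around to bit 0.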
        	

_mm_mask_multishift_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_multishift_epi64_epi8(__m128i src,
                                           __mmask16 k,
                                           __m128i a,
                                           __m128i b);

.. admonition:: Intel Description

    For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 1
        	q := i * 64
        	FOR j := 0 to 7
        		tmp8 := 0
        		ctrl := a[q+j*8+7:q+j*8] & 63
        		FOR l := 0 to 7
        			tmp8[l] := b[q+((ctrl+l) & 63)]
        		ENDFOR
        		IF k[i*8+j]
        			dst[q+j*8+7:q+j*8] := tmp8[7:0]
        		ELSE
        			dst[q+j*8+7:q+j*8] := src[q+j*8+7:q+j*8]
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:128] := 0
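
The inner bit loop of the pseudo-code above is equivalent to rotating the 64-bit lane right by the control value and keeping the low byte. The scalar sketch below uses that rotate formulation together with the writemask; ``rotr64`` and ``mask_multishift_epi64_epi8_ref`` are hypothetical helper names, and the vectors are modelled as plain arrays.

.. code-block:: C

    #include <stdint.h>

    /* Rotate right by n modulo 64, written so that n == 0 is well defined. */
    uint64_t rotr64(uint64_t x, unsigned n)
    {
        n &= 63u;
        return (x >> n) | (x << ((64u - n) & 63u));
    }

    /* Scalar model of _mm_mask_multishift_epi64_epi8.
       Byte idx of dst is the low byte of the rotated lane when mask bit idx
       is set, otherwise it is copied from src. */
    void mask_multishift_epi64_epi8_ref(uint8_t dst[16], const uint8_t src[16],
                                        uint16_t k, const uint8_t a[16],
                                        const uint64_t b[2])
    {
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 8; j++) {
                int idx = i * 8 + j;
                if ((k >> idx) & 1u)
                    dst[idx] = (uint8_t)rotr64(b[i], a[idx]);
                else
                    dst[idx] = src[idx];
            }
        }
    }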
        	

_mm_maskz_multishift_epi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Bit Manipulation
:Header: immintrin.h
:Searchable: AVX-512-Bit Manipulation-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_multishift_epi64_epi8(__mmask16 k,
                                            __m128i a,
                                            __m128i b);

.. admonition:: Intel Description

    For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 1
        	q := i * 64
        	FOR j := 0 to 7
        		tmp8 := 0
        		ctrl := a[q+j*8+7:q+j*8] & 63
        		FOR l := 0 to 7
        			tmp8[l] := b[q+((ctrl+l) & 63)]
        		ENDFOR
        		IF k[i*8+j]
        			dst[q+j*8+7:q+j*8] := tmp8[7:0]
        		ELSE
        			dst[q+j*8+7:q+j*8] := 0
        		FI
        	ENDFOR
        ENDFOR
        dst[MAX:128] := 0
        	

Cast
----
ZMM
~~~
_mm512_castpd128_pd512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_castpd128_pd512(__m128d a);

.. admonition:: Intel Description

    Cast vector of type __m128d to type __m512d; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castpd256_pd512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_castpd256_pd512(__m256d a);

.. admonition:: Intel Description

    Cast vector of type __m256d to type __m512d; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castpd512_pd128
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m128d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm512_castpd512_pd128(__m512d a);

.. admonition:: Intel Description

    Cast vector of type __m512d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castps512_ps128
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m128
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm512_castps512_ps128(__m512 a);

.. admonition:: Intel Description

    Cast vector of type __m512 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castpd512_pd256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m256d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm512_castpd512_pd256(__m512d a);

.. admonition:: Intel Description

    Cast vector of type __m512d to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castps128_ps512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_castps128_ps512(__m128 a);

.. admonition:: Intel Description

    Cast vector of type __m128 to type __m512; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castps256_ps512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_castps256_ps512(__m256 a);

.. admonition:: Intel Description

    Cast vector of type __m256 to type __m512; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castps512_ps256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm512_castps512_ps256(__m512 a);

.. admonition:: Intel Description

    Cast vector of type __m512 to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castsi128_si512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    M512 a

.. code-block:: C

    __m512i _mm512_castsi128_si512(__m128i a);

.. admonition:: Intel Description

    Cast vector of type __m128i to type __m512i; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castsi256_si512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256i a
:Param ETypes:
    M512 a

.. code-block:: C

    __m512i _mm512_castsi256_si512(__m256i a);

.. admonition:: Intel Description

    Cast vector of type __m256i to type __m512i; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castsi512_si128
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m512i a
:Param ETypes:
    M128 a

.. code-block:: C

    __m128i _mm512_castsi512_si128(__m512i a);

.. admonition:: Intel Description

    Cast vector of type __m512i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
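
A narrowing cast keeps the low-order bits of its input. The portable sketch below models that for the 512-bit to 128-bit integer case with byte arrays; ``vec512i``, ``vec128i`` and ``castsi512_si128_ref`` are hypothetical stand-ins introduced here, not the real ``__m512i`` and ``__m128i`` types or an ``immintrin.h`` function.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical byte-array stand-ins for the vector types. */
    typedef struct { uint8_t byte[64]; } vec512i;
    typedef struct { uint8_t byte[16]; } vec128i;

    /* Model of _mm512_castsi512_si128: the result is the low 16 bytes of a. */
    vec128i castsi512_si128_ref(vec512i a)
    {
        vec128i dst;
        memcpy(dst.byte, a.byte, sizeof dst.byte);
        return dst;
    }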

_mm512_castsi512_si256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i a
:Param ETypes:
    M256 a

.. code-block:: C

    __m256i _mm512_castsi512_si256(__m512i a);

.. admonition:: Intel Description

    Cast vector of type __m512i to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_zextpd128_pd512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_zextpd128_pd512(__m128d a);

.. admonition:: Intel Description

    Cast vector of type __m128d to type __m512d; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
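
Unlike the ``cast`` widening intrinsics, whose upper bits are undefined, the ``zext`` form guarantees zeros in the upper lanes. A minimal portable model of that guarantee is sketched below; ``vec128d``, ``vec512d`` and ``zextpd128_pd512_ref`` are hypothetical names used only for illustration.

.. code-block:: C

    /* Hypothetical array stand-ins for __m128d and __m512d. */
    typedef struct { double lane[2]; } vec128d;
    typedef struct { double lane[8]; } vec512d;

    /* Model of _mm512_zextpd128_pd512: copy the two low lanes and
       zero the remaining six (the upper 384 bits). */
    vec512d zextpd128_pd512_ref(vec128d a)
    {
        vec512d dst = {{0}};
        dst.lane[0] = a.lane[0];
        dst.lane[1] = a.lane[1];
        return dst;
    }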

_mm512_zextps128_ps512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_zextps128_ps512(__m128 a);

.. admonition:: Intel Description

    Cast vector of type __m128 to type __m512; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_zextsi128_si512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    M512 a

.. code-block:: C

    __m512i _mm512_zextsi128_si512(__m128i a);

.. admonition:: Intel Description

    Cast vector of type __m128i to type __m512i; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_zextpd256_pd512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_zextpd256_pd512(__m256d a);

.. admonition:: Intel Description

    Cast vector of type __m256d to type __m512d; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_zextps256_ps512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_zextps256_ps512(__m256 a);

.. admonition:: Intel Description

    Cast vector of type __m256 to type __m512; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_zextsi256_si512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256i a
:Param ETypes:
    M512 a

.. code-block:: C

    __m512i _mm512_zextsi256_si512(__m256i a);

.. admonition:: Intel Description

    Cast vector of type __m256i to type __m512i; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castpd_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512 _mm512_castpd_ps(__m512d a);

.. admonition:: Intel Description

    Cast vector of type __m512d to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
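
A same-width cast reinterprets the 64 bytes of the vector without converting any values. The sketch below models this with ``memcpy`` between two array-based stand-in types. The type names ``vec512d`` and ``vec512`` and the function ``castpd_ps_ref`` are hypothetical, and the lane values mentioned afterwards assume a little-endian target with IEEE 754 ``float`` and ``double``, as on x86.

.. code-block:: C

    #include <string.h>

    /* Hypothetical array stand-ins for __m512d and __m512. */
    typedef struct { double lane[8]; } vec512d;
    typedef struct { float lane[16]; } vec512;

    /* Model of _mm512_castpd_ps: same 64 bytes, viewed as 16 floats.
       No numeric conversion takes place. */
    vec512 castpd_ps_ref(vec512d a)
    {
        vec512 dst;
        memcpy(&dst, &a, sizeof dst);
        return dst;
    }

Under those assumptions the double ``1.0`` (bit pattern ``0x3FF0000000000000``) appears as the two floats ``0.0f`` and ``1.875f``, which shows that the cast is a reinterpretation and not a conversion.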

_mm512_castpd_si512
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512i _mm512_castpd_si512(__m512d a);

.. admonition:: Intel Description

    Cast vector of type __m512d to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castps_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512d _mm512_castps_pd(__m512 a);

.. admonition:: Intel Description

    Cast vector of type __m512 to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castps_si512
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512i _mm512_castps_si512(__m512 a);

.. admonition:: Intel Description

    Cast vector of type __m512 to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castsi512_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m512d _mm512_castsi512_pd(__m512i a);

.. admonition:: Intel Description

    Cast vector of type __m512i to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castsi512_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m512 _mm512_castsi512_ps(__m512i a);

.. admonition:: Intel Description

    Cast vector of type __m512i to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castph_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512 _mm512_castph_ps(__m512h a);

.. admonition:: Intel Description

    Cast vector of type "__m512h" to type "__m512". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castph_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512d _mm512_castph_pd(__m512h a);

.. admonition:: Intel Description

    Cast vector of type "__m512h" to type "__m512d". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castph_si512
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_castph_si512(__m512h a);

.. admonition:: Intel Description

    Cast vector of type "__m512h" to type "__m512i". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castps_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512h _mm512_castps_ph(__m512 a);

.. admonition:: Intel Description

    Cast vector of type "__m512" to type "__m512h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castpd_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512h _mm512_castpd_ph(__m512d a);

.. admonition:: Intel Description

    Cast vector of type "__m512d" to type "__m512h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castsi512_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m512h _mm512_castsi512_ph(__m512i a);

.. admonition:: Intel Description

    Cast vector of type "__m512i" to type "__m512h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castph512_ph128
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm512_castph512_ph128(__m512h a);

.. admonition:: Intel Description

    Cast vector of type "__m512h" to type "__m128h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castph512_ph256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm512_castph512_ph256(__m512h a);

.. admonition:: Intel Description

    Cast vector of type "__m512h" to type "__m256h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castph128_ph512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_castph128_ph512(__m128h a);

.. admonition:: Intel Description

    Cast vector of type "__m128h" to type "__m512h"; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_castph256_ph512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_castph256_ph512(__m256h a);

.. admonition:: Intel Description

    Cast vector of type "__m256h" to type "__m512h"; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_zextph128_ph512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_zextph128_ph512(__m128h a);

.. admonition:: Intel Description

    Cast vector of type "__m128h" to type "__m512h"; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm512_zextph256_ph512
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_zextph256_ph512(__m256h a);

.. admonition:: Intel Description

    Cast vector of type "__m256h" to type "__m512h"; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

YMM
~~~
_mm256_castph_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256 _mm256_castph_ps(__m256h a);

.. admonition:: Intel Description

    Cast vector of type "__m256h" to type "__m256". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castph_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256d _mm256_castph_pd(__m256h a);

.. admonition:: Intel Description

    Cast vector of type "__m256h" to type "__m256d". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castph_si256
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_castph_si256(__m256h a);

.. admonition:: Intel Description

    Cast vector of type "__m256h" to type "__m256i". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castps_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256h _mm256_castps_ph(__m256 a);

.. admonition:: Intel Description

    Cast vector of type "__m256" to type "__m256h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castpd_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256h _mm256_castpd_ph(__m256d a);

.. admonition:: Intel Description

    Cast vector of type "__m256d" to type "__m256h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castsi256_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m256h _mm256_castsi256_ph(__m256i a);

.. admonition:: Intel Description

    Cast vector of type "__m256i" to type "__m256h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castph256_ph128
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm256_castph256_ph128(__m256h a);

.. admonition:: Intel Description

    Cast vector of type "__m256h" to type "__m128h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castph128_ph256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_castph128_ph256(__m128h a);

.. admonition:: Intel Description

    Cast vector of type "__m128h" to type "__m256h"; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_zextph128_ph256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_zextph128_ph256(__m128h a);

.. admonition:: Intel Description

    Cast vector of type "__m128h" to type "__m256h"; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
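
The zero-extension described above can be modelled in portable C. This is only a sketch: it treats a vector as an array of raw 16-bit lanes, and the types ``model_m128h`` / ``model_m256h`` and the helper ``model_zextph128_ph256`` are hypothetical names introduced for illustration, not part of ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: a 128-bit vector holds 8 raw FP16 bit
       patterns, a 256-bit vector holds 16 of them. */
    typedef struct { uint16_t lane[8]; } model_m128h;
    typedef struct { uint16_t lane[16]; } model_m256h;

    /* Models _mm256_zextph128_ph256: keep the low 128 bits, zero the rest. */
    static model_m256h model_zextph128_ph256(model_m128h a)
    {
        model_m256h dst;
        memset(&dst, 0, sizeof dst);             /* upper 8 lanes become 0 */
        memcpy(dst.lane, a.lane, sizeof a.lane); /* lower 8 lanes keep their bits */
        return dst;
    }

A plain widening cast such as ``_mm256_castph128_ph256`` gives no such guarantee for the upper lanes, so the ``zext`` form is the safer choice whenever the upper half is read later.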

XMM
~~~
_mm_castph_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128 _mm_castph_ps(__m128h a);

.. admonition:: Intel Description

    Cast vector of type "__m128h" to type "__m128". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm_castph_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128d _mm_castph_pd(__m128h a);

.. admonition:: Intel Description

    Cast vector of type "__m128h" to type "__m128d". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm_castph_si128
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_castph_si128(__m128h a);

.. admonition:: Intel Description

    Cast vector of type "__m128h" to type "__m128i". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm_castps_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128h _mm_castps_ph(__m128 a);

.. admonition:: Intel Description

    Cast vector of type "__m128" to type "__m128h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm_castpd_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128h _mm_castpd_ph(__m128d a);

.. admonition:: Intel Description

    Cast vector of type "__m128d" to type "__m128h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm_castsi128_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Cast
:Header: immintrin.h
:Searchable: AVX-512-Cast-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128h _mm_castsi128_ph(__m128i a);

.. admonition:: Intel Description

    Cast vector of type "__m128i" to type "__m128h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
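
The same-width casts in this category reinterpret bits; they do not convert values. A minimal scalar sketch of that idea follows, assuming a little-endian host with 32-bit IEEE 754 ``float``. The helpers ``model_castps_ph`` and ``model_castph_ps`` are hypothetical stand-ins for ``_mm_castps_ph`` and ``_mm_castph_ps``, written against plain arrays so the example runs without FP16 hardware.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical model of _mm_castps_ph: the 16 bytes of four floats are
       viewed as eight 16-bit lanes. No value conversion takes place. */
    static void model_castps_ph(const float src[4], uint16_t dst[8])
    {
        memcpy(dst, src, 16); /* same bits, different lane view */
    }

    /* The inverse view, modelling _mm_castph_ps. */
    static void model_castph_ps(const uint16_t src[8], float dst[4])
    {
        memcpy(dst, src, 16);
    }

Casting ``1.0f`` (bit pattern ``0x3F800000``) this way yields the two lanes ``0x0000`` and ``0x3F80``, not the FP16 encoding of 1.0 (``0x3C00``). A numeric conversion needs one of the convert intrinsics instead.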

General Support
---------------
ZMM
~~~
_mm512_undefined
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: General Support
:Header: immintrin.h
:Searchable: AVX-512-General Support-ZMM
:Register: ZMM 512 bit
:Return Type: __m512

.. code-block:: C

    __m512 _mm512_undefined(void );

.. admonition:: Intel Description

    Return vector of type __m512 with undefined elements.

_mm512_undefined_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: General Support
:Header: immintrin.h
:Searchable: AVX-512-General Support-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i

.. code-block:: C

    __m512i _mm512_undefined_epi32(void );

.. admonition:: Intel Description

    Return vector of type __m512i with undefined elements.

_mm512_undefined_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: General Support
:Header: immintrin.h
:Searchable: AVX-512-General Support-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d

.. code-block:: C

    __m512d _mm512_undefined_pd(void );

.. admonition:: Intel Description

    Return vector of type __m512d with undefined elements.

_mm512_undefined_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: General Support
:Header: immintrin.h
:Searchable: AVX-512-General Support-ZMM
:Register: ZMM 512 bit
:Return Type: __m512

.. code-block:: C

    __m512 _mm512_undefined_ps(void );

.. admonition:: Intel Description

    Return vector of type __m512 with undefined elements.

_mm512_undefined_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: General Support
:Header: immintrin.h
:Searchable: AVX-512-General Support-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h

.. code-block:: C

    __m512h _mm512_undefined_ph(void );

.. admonition:: Intel Description

    Return vector of type __m512h with undefined elements.

YMM
~~~
_mm256_undefined_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: General Support
:Header: immintrin.h
:Searchable: AVX-512-General Support-YMM
:Register: YMM 256 bit
:Return Type: __m256h

.. code-block:: C

    __m256h _mm256_undefined_ph(void );

.. admonition:: Intel Description

    Return vector of type __m256h with undefined elements.

XMM
~~~
_mm_undefined_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: General Support
:Header: immintrin.h
:Searchable: AVX-512-General Support-XMM
:Register: XMM 128 bit
:Return Type: __m128h

.. code-block:: C

    __m128h _mm_undefined_ph(void );

.. admonition:: Intel Description

    Return vector of type __m128h with undefined elements.

Special Math Functions
----------------------
ZMM
~~~
_mm512_mask_max_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_max_pd(__m512d src, __mmask8 k,
                               __m512d a, __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
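
The writemask loop above translates almost line for line into scalar C. The following is a sketch rather than a reference implementation: ``model_mask_max_pd`` is a hypothetical helper operating on plain arrays, and its ``MAX()`` is assumed to follow the hardware rule of returning ``b`` unless ``a > b``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_max_pd over 8 double lanes. */
    static void model_mask_max_pd(double dst[8], const double src[8], uint8_t k,
                                  const double a[8], const double b[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                /* Assumed MAX(): b is returned unless a > b. */
                dst[j] = (a[j] > b[j]) ? a[j] : b[j];
            } else {
                dst[j] = src[j]; /* mask bit clear: keep the src lane */
            }
        }
    }

With ``k = 0x05`` only lanes 0 and 2 receive a maximum; every other lane is copied from ``src``.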
        	

_mm512_mask_max_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m512d _mm512_mask_max_round_pd(__m512d src, __mmask8 k,
                                     __m512d a, __m512d b,
                                     int sae);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_max_pd(__mmask8 k, __m512d a,
                                __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
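
The zeromask variant and the NaN behaviour of ``MAX()`` can be illustrated together in scalar C. This is a sketch under the assumption that ``MAX()`` returns its second operand whenever the comparison ``a > b`` is false, which includes every comparison involving NaN. ``model_maskz_max_pd`` is a hypothetical helper, not a library function.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_maskz_max_pd over 8 double lanes. */
    static void model_maskz_max_pd(double dst[8], uint8_t k,
                                   const double a[8], const double b[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1)
                dst[j] = (a[j] > b[j]) ? a[j] : b[j]; /* NaN compares false */
            else
                dst[j] = 0.0; /* mask bit clear: lane is zeroed */
        }
    }

Under this model the result is not symmetric in its operands: a NaN in ``a`` is dropped in favour of ``b``, while a NaN in ``b`` is propagated to the result.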
        	

_mm512_maskz_max_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m512d _mm512_maskz_max_round_pd(__mmask8 k, __m512d a,
                                      __m512d b, int sae);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_max_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_max_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

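    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code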

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_max_round_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m512d _mm512_max_round_pd(__m512d a, __m512d b, int sae);

.. admonition:: Intel Description

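    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code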

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
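
A possible way to call this intrinsic over whole arrays is sketched below. The wrapper ``max_f64`` and its requirement that ``n`` be a multiple of 8 are assumptions of this example. The intrinsic path is compiled only when the compiler defines ``__AVX512F__`` (for instance with ``-mavx512f`` on GCC or Clang); otherwise a scalar fallback with the same "return ``b`` unless ``a > b``" rule is used.

.. code-block:: C

    #include <stddef.h>
    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Element-wise maximum of two arrays; n is assumed to be a multiple of 8. */
    static void max_f64(double *dst, const double *a, const double *b, size_t n)
    {
        for (size_t i = 0; i < n; i += 8) {
    #if defined(__AVX512F__)
            __m512d va = _mm512_loadu_pd(a + i);
            __m512d vb = _mm512_loadu_pd(b + i);
            /* sae must be a compile-time constant; _MM_FROUND_NO_EXC
               suppresses floating-point exceptions for this operation. */
            __m512d vr = _mm512_max_round_pd(va, vb, _MM_FROUND_NO_EXC);
            _mm512_storeu_pd(dst + i, vr);
    #else
            /* Scalar fallback: b is returned unless a > b. */
            for (size_t j = 0; j < 8; j++)
                dst[i + j] = (a[i + j] > b[i + j]) ? a[i + j] : b[i + j];
    #endif
        }
    }

Unaligned loads and stores are used so the caller does not need 64-byte aligned buffers.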
        	

_mm512_mask_max_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_max_ps(__m512 src, __mmask16 k, __m512 a,
                              __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_max_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m512 _mm512_mask_max_round_ps(__m512 src, __mmask16 k,
                                    __m512 a, __m512 b,
                                    int sae);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_max_ps(__mmask16 k, __m512 a, __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m512 _mm512_maskz_max_round_ps(__mmask16 k, __m512 a,
                                     __m512 b, int sae);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_max_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_max_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

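    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code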

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_max_round_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m512 _mm512_max_round_ps(__m512 a, __m512 b, int sae);

.. admonition:: Intel Description

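    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code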

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_min_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_min_pd(__m512d src, __mmask8 k,
                               __m512d a, __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_min_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m512d _mm512_mask_min_round_pd(__m512d src, __mmask8 k,
                                     __m512d a, __m512d b,
                                     int sae);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_min_pd(__mmask8 k, __m512d a,
                                __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m512d _mm512_maskz_min_round_pd(__mmask8 k, __m512d a,
                                      __m512d b, int sae);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_min_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_min_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

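    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code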

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
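
How the per-lane ``MIN()`` can differ from an IEEE 754 minimum is easiest to see on single values. The sketch below assumes ``MIN()`` returns ``b`` unless ``a < b``. The helper ``model_min_lane`` is hypothetical and models one 64-bit lane only.

.. code-block:: C

    #include <math.h>

    /* Assumed per-lane MIN(): b is returned unless a < b. */
    static double model_min_lane(double a, double b)
    {
        return (a < b) ? a : b;
    }

Under this model ``model_min_lane(-0.0, 0.0)`` returns ``+0.0`` and ``model_min_lane(1.0, NAN)`` returns NaN, whereas the C library function ``fmin`` would return the non-NaN operand. Swapping the operands changes both results, so operand order matters when inputs may contain NaN or signed zeros.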
        	

_mm512_min_round_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m512d _mm512_min_round_pd(__m512d a, __m512d b, int sae);

.. admonition:: Intel Description

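    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code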

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_min_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_min_ps(__m512 src, __mmask16 k, __m512 a,
                              __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
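
One common use of a 16-bit writemask is to apply an operation to only the first ``n`` lanes of a vector, for example when clamping the tail of an array. The following scalar sketch illustrates that pattern. Both ``model_mask_min_ps`` and ``tail_mask16`` are hypothetical helpers introduced here, and ``MIN()`` is assumed to return ``b`` unless ``a < b``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_min_ps over 16 float lanes. */
    static void model_mask_min_ps(float dst[16], const float src[16], uint16_t k,
                                  const float a[16], const float b[16])
    {
        for (int j = 0; j < 16; j++)
            dst[j] = ((k >> j) & 1) ? ((a[j] < b[j]) ? a[j] : b[j]) : src[j];
    }

    /* Mask with the low n bits set, for 0 <= n <= 16. */
    static uint16_t tail_mask16(unsigned n)
    {
        return (uint16_t)((1u << n) - 1u);
    }

Passing the input vector itself as ``src`` leaves the lanes beyond ``n`` unchanged, so only the selected prefix is clamped against ``b``.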
        	

_mm512_mask_min_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m512 _mm512_mask_min_round_ps(__m512 src, __mmask16 k,
                                    __m512 a, __m512 b,
                                    int sae);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_min_ps(__mmask16 k, __m512 a, __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m512 _mm512_maskz_min_round_ps(__mmask16 k, __m512 a,
                                     __m512 b, int sae);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_min_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_min_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

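    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code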

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
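
The ``MIN`` in the pseudo-code is not a symmetric IEEE 754 minimum. A scalar model of one lane can be sketched in portable C, assuming ``MIN`` follows the documented ``VMINPS`` rule that the second operand is returned when either input is NaN or when both inputs are zeros. The name ``min_ps_lane`` is hypothetical and exists only for this illustration.

.. code-block:: C

    #include <math.h>

    /* Hypothetical one-lane model of MIN(a, b) for _mm512_min_ps.
       The comparison is false for NaN inputs and for +0.0 versus -0.0,
       so the second operand b is returned in those cases. */
    static float min_ps_lane(float a, float b)
    {
        return (a < b) ? a : b;
    }

Under this model ``min_ps_lane(1.0f, NAN)`` yields NaN while ``min_ps_lane(NAN, 1.0f)`` yields ``1.0f``, so operand order matters whenever NaN inputs are possible.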
        	

_mm512_min_round_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m512 _mm512_min_round_ps(__m512 a, __m512 b, int sae);

.. admonition:: Intel Description

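    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code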

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
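
A possible way to call this intrinsic is sketched below. The wrapper name ``min_f32x16_no_exc`` is hypothetical. The intrinsic path is compiled only when the compiler defines ``__AVX512F__`` (for example with ``-mavx512f``). Otherwise a scalar loop stands in, on the assumption that ``MIN`` returns its second operand for NaN or equal-zero inputs.

.. code-block:: C

    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Element-wise minimum of two 16-float arrays.
       With AVX-512F enabled this calls _mm512_min_round_ps and passes
       _MM_FROUND_NO_EXC as the sae argument to suppress exceptions.
       Otherwise a scalar loop applies the same second-operand rule. */
    static void min_f32x16_no_exc(float out[16], const float a[16],
                                  const float b[16])
    {
    #if defined(__AVX512F__)
        __m512 va = _mm512_loadu_ps(a);
        __m512 vb = _mm512_loadu_ps(b);
        _mm512_storeu_ps(out, _mm512_min_round_ps(va, vb, _MM_FROUND_NO_EXC));
    #else
        for (int j = 0; j < 16; j++) {
            out[j] = (a[j] < b[j]) ? a[j] : b[j];
        }
    #endif
    }

The ``sae`` argument has to be a compile-time constant, which is why it is written as a literal macro here and not passed through a function parameter.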
        	

_mm512_abs_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m512i _mm512_abs_epi32(__m512i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ABS(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
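
The description says the results are unsigned, which matters for the most negative input: its magnitude does not fit in a signed 32-bit lane. A minimal scalar sketch of one lane follows. The helper name ``abs_epi32_lane`` is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical one-lane model of ABS for _mm512_abs_epi32.
       Negation is done in unsigned arithmetic so INT32_MIN does not
       overflow; its result is the bit pattern 0x80000000. */
    static uint32_t abs_epi32_lane(int32_t x)
    {
        uint32_t u = (uint32_t)x;
        return (x < 0) ? (0u - u) : u;
    }

Reading the lane for ``INT32_MIN`` back as a signed value would give ``INT32_MIN`` again, so callers that need a true magnitude should treat the result as unsigned.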
        	

_mm512_mask_abs_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m512i _mm512_mask_abs_epi32(__m512i src, __mmask16 k,
                                  __m512i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ABS(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
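
The writemask behaviour in the pseudo-code can be modelled with plain arrays. This is only a scalar sketch: ``mask_abs_epi32_model`` is a hypothetical name, the 16 lanes are represented as C arrays, and bit ``j`` of ``k`` is assumed to control lane ``j`` as in the loop above.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_abs_epi32.
       Lane j takes |a[j]| when bit j of k is set, otherwise it keeps
       the value already present in src[j]. */
    static void mask_abs_epi32_model(uint32_t dst[16], const uint32_t src[16],
                                     uint16_t k, const int32_t a[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                uint32_t u = (uint32_t)a[j];
                dst[j] = (a[j] < 0) ? (0u - u) : u;
            } else {
                dst[j] = src[j];
            }
        }
    }

With ``k = 0x0005`` only lanes 0 and 2 are recomputed, and every other lane is copied from ``src`` unchanged.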
        	

_mm512_maskz_abs_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m512i _mm512_maskz_abs_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ABS(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_abs_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m512i _mm512_abs_epi64(__m512i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ABS(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_abs_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m512i _mm512_mask_abs_epi64(__m512i src, __mmask8 k,
                                  __m512i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ABS(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_abs_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m512i _mm512_maskz_abs_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ABS(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_maskz_max_epi32(__mmask16 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0 
        	FI
        ENDFOR
        dst[MAX:512] := 0
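
Zero-masking differs from write-masking in that inactive lanes are cleared, not preserved. A scalar sketch of the loop above, using the hypothetical name ``maskz_max_epi32_model`` and plain arrays in place of the 512-bit registers:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_maskz_max_epi32.
       Lane j receives the signed maximum when bit j of k is set
       and is cleared to zero otherwise. */
    static void maskz_max_epi32_model(int32_t dst[16], uint16_t k,
                                      const int32_t a[16],
                                      const int32_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                dst[j] = (a[j] > b[j]) ? a[j] : b[j];
            } else {
                dst[j] = 0;
            }
        }
    }

A cleared lane is indistinguishable from a lane whose maximum happens to be zero, so the mask itself has to be kept if that difference matters.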
        	

_mm512_mask_max_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __m512i _mm512_mask_max_epi64(__m512i src, __mmask8 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __m512i _mm512_maskz_max_epi64(__mmask8 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_max_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m512i _mm512_max_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_epu32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_max_epu32(__mmask16 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_max_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_max_epu64(__m512i src, __mmask8 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_epu64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_max_epu64(__mmask8 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_max_epu64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_max_epu64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
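
The pseudo-code for this intrinsic reads the same as for ``_mm512_max_epi64``. The difference lies in how each 64-bit lane is interpreted during the comparison. A one-lane sketch, with hypothetical helper names, makes the contrast concrete:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical one-lane models: the same 64-bit pattern compared
       as signed (_mm512_max_epi64) and as unsigned (_mm512_max_epu64). */
    static int64_t max_epi64_lane(int64_t a, int64_t b)
    {
        return (a > b) ? a : b;
    }

    static uint64_t max_epu64_lane(uint64_t a, uint64_t b)
    {
        return (a > b) ? a : b;
    }

An all-ones lane is ``-1`` under the signed interpretation and loses to ``1``, but it is ``UINT64_MAX`` under the unsigned interpretation and wins.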
        	

_mm512_maskz_min_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_maskz_min_epi32(__mmask16 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_min_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __m512i _mm512_mask_min_epi64(__m512i src, __mmask8 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __m512i _mm512_maskz_min_epi64(__mmask8 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_min_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m512i _mm512_min_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_epu32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_min_epu32(__mmask16 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_min_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_min_epu64(__m512i src, __mmask8 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_epu64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_min_epu64(__mmask8 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_min_epu64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_min_epu64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_max_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_mask_max_epi32(__m512i src, __mmask16 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_max_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_max_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
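
One possible way to apply this intrinsic to ordinary arrays is sketched below. The wrapper name ``max_i32x16`` is hypothetical. The intrinsic path is compiled only when the compiler defines ``__AVX512F__``, and a scalar loop that is assumed to give the same lane-by-lane result is used otherwise.

.. code-block:: C

    #include <stdint.h>
    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Element-wise signed maximum of two 16-element int32_t arrays.
       Uses _mm512_max_epi32 when the compiler targets AVX-512F,
       otherwise falls back to a scalar loop. */
    static void max_i32x16(int32_t out[16], const int32_t a[16],
                           const int32_t b[16])
    {
    #if defined(__AVX512F__)
        __m512i va = _mm512_loadu_si512((const void *)a);
        __m512i vb = _mm512_loadu_si512((const void *)b);
        _mm512_storeu_si512((void *)out, _mm512_max_epi32(va, vb));
    #else
        for (int j = 0; j < 16; j++) {
            out[j] = (a[j] > b[j]) ? a[j] : b[j];
        }
    #endif
    }

The unaligned load and store forms are used so that the arrays do not need 64-byte alignment.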
        	

_mm512_mask_max_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_max_epu32(__m512i src, __mmask16 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_max_epu32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_max_epu32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_min_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_mask_min_epi32(__m512i src, __mmask16 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_min_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_min_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_min_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_min_epu32(__m512i src, __mmask16 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_min_epu32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_min_epu32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_reduce_max_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    int _mm512_mask_reduce_max_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed signed 32-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[31:0] > src[63:32] ? src[31:0] : src[63:32])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := (src[i+31:i] > src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[32*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[i+31:i] := a[i+31:i]
        	ELSE
        		tmp[i+31:i] := Int32(-0x80000000)
        	FI
        ENDFOR
        dst[31:0] := REDUCE_MAX(tmp, 16)
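
The masked reduction can be followed more easily as a scalar model. The sketch below uses the hypothetical name ``mask_reduce_max_epi32_model``, represents the 16 lanes as an array, and replaces the recursion in ``REDUCE_MAX`` with an equivalent halving loop.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_reduce_max_epi32.
       Inactive lanes are replaced by INT32_MIN, the identity for
       signed max, then the lanes are folded in halves. */
    static int32_t mask_reduce_max_epi32_model(uint16_t k,
                                               const int32_t a[16])
    {
        int32_t tmp[16];
        for (int j = 0; j < 16; j++) {
            tmp[j] = ((k >> j) & 1u) ? a[j] : INT32_MIN;
        }
        /* Fold the upper half onto the lower half: 16 -> 8 -> 4 -> 2 -> 1. */
        for (int len = 8; len >= 1; len /= 2) {
            for (int j = 0; j < len; j++) {
                if (tmp[j + len] > tmp[j]) {
                    tmp[j] = tmp[j + len];
                }
            }
        }
        return tmp[0];
    }

With an all-zero mask every lane holds the identity, so the model returns ``INT32_MIN``.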
        	

_mm512_mask_reduce_max_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __int64
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __int64 _mm512_mask_reduce_max_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed signed 64-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[63:0] > src[127:64] ? src[63:0] : src[127:64])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := (src[i+63:i] > src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[64*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[i+63:i] := a[i+63:i]
        	ELSE
        		tmp[i+63:i] := Int64(-0x8000000000000000)
        	FI
        ENDFOR
        dst[63:0] := REDUCE_MAX(tmp, 8)
        	

_mm512_mask_reduce_max_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: unsigned int
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    unsigned int _mm512_mask_reduce_max_epu32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 32-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[31:0] > src[63:32] ? src[31:0] : src[63:32])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := (src[i+31:i] > src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[32*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[i+31:i] := a[i+31:i]
        	ELSE
        		tmp[i+31:i] := 0
        	FI
        ENDFOR
        dst[31:0] := REDUCE_MAX(tmp, 16)
        	
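The masked reduction above can be read as "replace every inactive lane with the identity for max, then reduce". The following is a minimal scalar sketch of that reading, not Intel's implementation; the function name ``model_mask_reduce_max_epu32`` and the array-based lane layout are assumptions made for illustration. Because unsigned max is associative and commutative, a linear scan gives the same result as the recursive halving in ``REDUCE_MAX``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above.
       a[j] stands for lane j of the 512-bit input, k is the 16-bit mask. */
    uint32_t model_mask_reduce_max_epu32(uint16_t k, const uint32_t a[16])
    {
        uint32_t result = 0;  /* 0 is the identity for unsigned max */
        for (int j = 0; j < 16; j++) {
            /* inactive lanes contribute 0, mirroring tmp[i+31:i] := 0 */
            uint32_t lane = ((k >> j) & 1u) ? a[j] : 0u;
            if (lane > result)
                result = lane;
        }
        return result;
    }

Under this model an all-zero mask returns 0, which a caller cannot distinguish from a genuine maximum of 0.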

_mm512_mask_reduce_max_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: unsigned __int64
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    unsigned __int64 _mm512_mask_reduce_max_epu64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 64-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[63:0] > src[127:64] ? src[63:0] : src[127:64])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := (src[i+63:i] > src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[64*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[i+63:i] := a[i+63:i]
        	ELSE
        		tmp[i+63:i] := 0
        	FI
        ENDFOR
        dst[63:0] := REDUCE_MAX(tmp, 8)
        	

_mm512_mask_reduce_max_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: double
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    double _mm512_mask_reduce_max_pd(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Reduce the packed double-precision (64-bit) floating-point elements in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[63:0] > src[127:64] ? src[63:0] : src[127:64])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := (src[i+63:i] > src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[64*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[i+63:i] := a[i+63:i]
        	ELSE
        		tmp[i+63:i] := Cast_FP64(0xFFEFFFFFFFFFFFFF)
        	FI
        ENDFOR
        dst[63:0] := REDUCE_MAX(tmp, 8)
        	
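The fill value ``Cast_FP64(0xFFEFFFFFFFFFFFFF)`` is the most negative finite double, not negative infinity. The sketch below decodes the bit pattern and models the masked reduction in scalar C. It assumes IEEE 754 binary64 ``double`` and NaN-free input; ``cast_fp64``, ``model_fill_is_lowest_finite`` and ``model_mask_reduce_max_pd`` are hypothetical names, not part of any Intel header.

.. code-block:: C

    #include <float.h>
    #include <stdint.h>
    #include <string.h>

    #define MODEL_FILL_BITS 0xFFEFFFFFFFFFFFFFull

    /* Reinterpret a 64-bit pattern as a double, like Cast_FP64 above. */
    double cast_fp64(uint64_t bits)
    {
        double d;
        memcpy(&d, &bits, sizeof d);
        return d;
    }

    /* Returns 1 when the fill pattern decodes to -DBL_MAX on this platform. */
    int model_fill_is_lowest_finite(void)
    {
        return cast_fp64(MODEL_FILL_BITS) == -DBL_MAX;
    }

    /* Hypothetical scalar model: inactive lanes are replaced by the fill. */
    double model_mask_reduce_max_pd(uint8_t k, const double a[8])
    {
        const double fill = cast_fp64(MODEL_FILL_BITS);
        double result = (k & 1u) ? a[0] : fill;
        for (int j = 1; j < 8; j++) {
            double lane = ((k >> j) & 1u) ? a[j] : fill;
            if (lane > result)
                result = lane;
        }
        return result;
    }

Read this way, the pseudo-code implies that an all-zero mask yields ``-DBL_MAX``, and that a partial mask whose active lanes are all negative infinity also yields ``-DBL_MAX`` because the fill compares greater.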

_mm512_mask_reduce_max_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: float
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    float _mm512_mask_reduce_max_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Reduce the packed single-precision (32-bit) floating-point elements in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[31:0] > src[63:32] ? src[31:0] : src[63:32])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := (src[i+31:i] > src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[32*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[i+31:i] := a[i+31:i]
        	ELSE
        		tmp[i+31:i] := Cast_FP32(0xFF7FFFFF)
        	FI
        ENDFOR
        dst[31:0] := REDUCE_MAX(tmp, 16)
        	

_mm512_mask_reduce_min_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    int _mm512_mask_reduce_min_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed signed 32-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[31:0] < src[63:32] ? src[31:0] : src[63:32])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := (src[i+31:i] < src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[32*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[i+31:i] := a[i+31:i]
        	ELSE
        		tmp[i+31:i] := Int32(0x7FFFFFFF)
        	FI
        ENDFOR
        dst[31:0] := REDUCE_MIN(tmp, 16)
        	

_mm512_mask_reduce_min_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __int64
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __int64 _mm512_mask_reduce_min_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed signed 64-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[63:0] < src[127:64] ? src[63:0] : src[127:64])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := (src[i+63:i] < src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[64*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[i+63:i] := a[i+63:i]
        	ELSE
        		tmp[i+63:i] := Int64(0x7FFFFFFFFFFFFFFF)
        	FI
        ENDFOR
        dst[63:0] := REDUCE_MIN(tmp, 8)
        	

_mm512_mask_reduce_min_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: unsigned int
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    unsigned int _mm512_mask_reduce_min_epu32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 32-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[31:0] < src[63:32] ? src[31:0] : src[63:32])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := (src[i+31:i] < src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[32*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[i+31:i] := a[i+31:i]
        	ELSE
        		tmp[i+31:i] := 0xFFFFFFFF
        	FI
        ENDFOR
        dst[31:0] := REDUCE_MIN(tmp, 16)
        	
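The ``epi32`` and ``epu32`` minimum reductions differ only in how lane bits are compared and in the fill used for inactive lanes (``0x7FFFFFFF`` for signed, ``0xFFFFFFFF`` for unsigned). The sketch below runs both readings over the same raw lane bits. It is a scalar illustration, not the intrinsic's implementation, and the names ``lane_as_signed``, ``model_mask_reduce_min_epi32`` and ``model_mask_reduce_min_epu32`` are hypothetical.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Reinterpret 32 raw bits as a signed (two's complement) lane. */
    int32_t lane_as_signed(uint32_t bits)
    {
        int32_t v;
        memcpy(&v, &bits, sizeof v);
        return v;
    }

    /* Signed model: inactive lanes are filled with 0x7FFFFFFF (INT32_MAX). */
    int32_t model_mask_reduce_min_epi32(uint16_t k, const uint32_t lanes[16])
    {
        int32_t result = INT32_MAX;
        for (int j = 0; j < 16; j++) {
            int32_t lane = ((k >> j) & 1u) ? lane_as_signed(lanes[j]) : INT32_MAX;
            if (lane < result)
                result = lane;
        }
        return result;
    }

    /* Unsigned model: inactive lanes are filled with 0xFFFFFFFF (UINT32_MAX). */
    uint32_t model_mask_reduce_min_epu32(uint16_t k, const uint32_t lanes[16])
    {
        uint32_t result = UINT32_MAX;
        for (int j = 0; j < 16; j++) {
            uint32_t lane = ((k >> j) & 1u) ? lanes[j] : UINT32_MAX;
            if (lane < result)
                result = lane;
        }
        return result;
    }

A lane holding ``0xFFFFFFFF`` is the smallest candidate (-1) for the signed model and the largest for the unsigned one, so choosing the wrong variant silently changes the result.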

_mm512_mask_reduce_min_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: unsigned __int64
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    unsigned __int64 _mm512_mask_reduce_min_epu64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 64-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[63:0] < src[127:64] ? src[63:0] : src[127:64])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := (src[i+63:i] < src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[64*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[i+63:i] := a[i+63:i]
        	ELSE
        		tmp[i+63:i] := 0xFFFFFFFFFFFFFFFF
        	FI
        ENDFOR
        dst[63:0] := REDUCE_MIN(tmp, 8)
        	

_mm512_mask_reduce_min_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: double
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    double _mm512_mask_reduce_min_pd(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Reduce the packed double-precision (64-bit) floating-point elements in "a" by minimum using mask "k". Returns the minimum of all active elements in "a". [min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[63:0] < src[127:64] ? src[63:0] : src[127:64])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := (src[i+63:i] < src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[64*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[i+63:i] := a[i+63:i]
        	ELSE
        		tmp[i+63:i] := Cast_FP64(0x7FEFFFFFFFFFFFFF)
        	FI
        ENDFOR
        dst[63:0] := REDUCE_MIN(tmp, 8)
        	

_mm512_mask_reduce_min_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: float
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    float _mm512_mask_reduce_min_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Reduce the packed single-precision (32-bit) floating-point elements in "a" by minimum using mask "k". Returns the minimum of all active elements in "a". [min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[31:0] < src[63:32] ? src[31:0] : src[63:32])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := (src[i+31:i] < src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[32*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[i+31:i] := a[i+31:i]
        	ELSE
        		tmp[i+31:i] := Cast_FP32(0x7F7FFFFF)
        	FI
        ENDFOR
        dst[31:0] := REDUCE_MIN(tmp, 16)
        	

_mm512_reduce_max_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __m512i a
:Param ETypes:
    SI32 a

.. code-block:: C

    int _mm512_reduce_max_epi32(__m512i a);

.. admonition:: Intel Description

    Reduce the packed signed 32-bit integers in "a" by maximum. Returns the maximum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[31:0] > src[63:32] ? src[31:0] : src[63:32])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := (src[i+31:i] > src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[32*len-1:0], len)
        }
        dst[31:0] := REDUCE_MAX(a, 16)
        	
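``REDUCE_MAX`` folds the upper half of the vector onto the lower half and recurses until two lanes remain, so 16 lanes need three folds plus one final compare. A direct scalar transcription might look like the sketch below; ``model_reduce_max`` and ``model_reduce_max_epi32`` are hypothetical names, and the array stands in for the 512-bit register.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical transcription of REDUCE_MAX for signed 32-bit lanes.
       src is modified in place; len must be a power of two, at least 2. */
    int32_t model_reduce_max(int32_t *src, int len)
    {
        if (len == 2)
            return src[0] > src[1] ? src[0] : src[1];
        len /= 2;
        for (int j = 0; j < len; j++) {
            /* fold lane j+len of the upper half onto lane j */
            if (src[j + len] > src[j])
                src[j] = src[j + len];
        }
        return model_reduce_max(src, len);
    }

    /* Copies the input first so the caller's lanes stay untouched. */
    int32_t model_reduce_max_epi32(const int32_t a[16])
    {
        int32_t tmp[16];
        for (int j = 0; j < 16; j++)
            tmp[j] = a[j];
        return model_reduce_max(tmp, 16);
    }

The halving order matters only for the floating-point variants, where comparisons involving NaN are not symmetric; for integers any evaluation order gives the same maximum.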

_mm512_reduce_max_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __int64
:Param Types:
    __m512i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __int64 _mm512_reduce_max_epi64(__m512i a);

.. admonition:: Intel Description

    Reduce the packed signed 64-bit integers in "a" by maximum. Returns the maximum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[63:0] > src[127:64] ? src[63:0] : src[127:64])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := (src[i+63:i] > src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[64*len-1:0], len)
        }
        dst[63:0] := REDUCE_MAX(a, 8)
        	

_mm512_reduce_max_epu32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: unsigned int
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    unsigned int _mm512_reduce_max_epu32(__m512i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 32-bit integers in "a" by maximum. Returns the maximum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[31:0] > src[63:32] ? src[31:0] : src[63:32])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := (src[i+31:i] > src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[32*len-1:0], len)
        }
        dst[31:0] := REDUCE_MAX(a, 16)
        	

_mm512_reduce_max_epu64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: unsigned __int64
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    unsigned __int64 _mm512_reduce_max_epu64(__m512i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 64-bit integers in "a" by maximum. Returns the maximum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[63:0] > src[127:64] ? src[63:0] : src[127:64])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := (src[i+63:i] > src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[64*len-1:0], len)
        }
        dst[63:0] := REDUCE_MAX(a, 8)
        	

_mm512_reduce_max_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: double
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    double _mm512_reduce_max_pd(__m512d a);

.. admonition:: Intel Description

    Reduce the packed double-precision (64-bit) floating-point elements in "a" by maximum. Returns the maximum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[63:0] > src[127:64] ? src[63:0] : src[127:64])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := (src[i+63:i] > src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[64*len-1:0], len)
        }
        dst[63:0] := REDUCE_MAX(a, 8)
        	

_mm512_reduce_max_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: float
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    float _mm512_reduce_max_ps(__m512 a);

.. admonition:: Intel Description

    Reduce the packed single-precision (32-bit) floating-point elements in "a" by maximum. Returns the maximum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[31:0] > src[63:32] ? src[31:0] : src[63:32])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := (src[i+31:i] > src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[32*len-1:0], len)
        }
        dst[31:0] := REDUCE_MAX(a, 16)
        	

_mm512_reduce_min_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __m512i a
:Param ETypes:
    SI32 a

.. code-block:: C

    int _mm512_reduce_min_epi32(__m512i a);

.. admonition:: Intel Description

    Reduce the packed signed 32-bit integers in "a" by minimum. Returns the minimum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[31:0] < src[63:32] ? src[31:0] : src[63:32])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := (src[i+31:i] < src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[32*len-1:0], len)
        }
        dst[31:0] := REDUCE_MIN(a, 16)
        	

_mm512_reduce_min_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __int64
:Param Types:
    __m512i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __int64 _mm512_reduce_min_epi64(__m512i a);

.. admonition:: Intel Description

    Reduce the packed signed 64-bit integers in "a" by minimum. Returns the minimum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[63:0] < src[127:64] ? src[63:0] : src[127:64])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := (src[i+63:i] < src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[64*len-1:0], len)
        }
        dst[63:0] := REDUCE_MIN(a, 8)
        	

_mm512_reduce_min_epu32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: unsigned int
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    unsigned int _mm512_reduce_min_epu32(__m512i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 32-bit integers in "a" by minimum. Returns the minimum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[31:0] < src[63:32] ? src[31:0] : src[63:32])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := (src[i+31:i] < src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[32*len-1:0], len)
        }
        dst[31:0] := REDUCE_MIN(a, 16)
        	

_mm512_reduce_min_epu64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: unsigned __int64
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    unsigned __int64 _mm512_reduce_min_epu64(__m512i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 64-bit integers in "a" by minimum. Returns the minimum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[63:0] < src[127:64] ? src[63:0] : src[127:64])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := (src[i+63:i] < src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[64*len-1:0], len)
        }
        dst[63:0] := REDUCE_MIN(a, 8)
        	

_mm512_reduce_min_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: double
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    double _mm512_reduce_min_pd(__m512d a);

.. admonition:: Intel Description

    Reduce the packed double-precision (64-bit) floating-point elements in "a" by minimum. Returns the minimum of all elements in "a". [min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[63:0] < src[127:64] ? src[63:0] : src[127:64])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := (src[i+63:i] < src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[64*len-1:0], len)
        }
        dst[63:0] := REDUCE_MIN(a, 8)
        	

_mm512_reduce_min_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: float
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    float _mm512_reduce_min_ps(__m512 a);

.. admonition:: Intel Description

    Reduce the packed single-precision (32-bit) floating-point elements in "a" by minimum. Returns the minimum of all elements in "a". [min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[31:0] < src[63:32] ? src[31:0] : src[63:32])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := (src[i+31:i] < src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[32*len-1:0], len)
        }
        dst[31:0] := REDUCE_MIN(a, 16)
        	

_mm512_max_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_max_ph(__m512h a, __m512h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	
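The per-lane rule ``a > b ? a : b`` is not symmetric: whenever the comparison is false, including when either operand is NaN or when both are zeros of opposite sign, the second operand is returned. The sketch below models one lane in scalar C, with ``float`` standing in for the 16-bit element type as an assumption for portability; ``model_max_lane`` and ``model_max_order_matters`` are hypothetical helpers.

.. code-block:: C

    #include <math.h>

    /* Hypothetical one-lane model of the pseudo-code above. */
    float model_max_lane(float a, float b)
    {
        return (a > b) ? a : b;
    }

    /* Returns 1 when swapping the operands changes the result
       (different NaN-ness, value, or sign of zero), else 0. */
    int model_max_order_matters(float a, float b)
    {
        float ab = model_max_lane(a, b);
        float ba = model_max_lane(b, a);
        if (!isnan(ab) != !isnan(ba))
            return 1;
        if (isnan(ab))
            return 0;
        return (ab != ba) || (!signbit(ab) != !signbit(ba));
    }

In this model ``model_max_lane(NAN, 1.0f)`` yields ``1.0f`` while ``model_max_lane(1.0f, NAN)`` yields NaN, so callers that need a specific NaN policy have to order the operands deliberately or filter NaNs beforehand.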

_mm512_mask_max_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mask_max_ph(__m512h src, __mmask32 k,
                               __m512h a, __m512h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_maskz_max_ph(__mmask32 k, __m512h a,
                                __m512h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
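The ``mask`` and ``maskz`` forms differ only in what an inactive lane receives: the corresponding ``src`` lane (writemask) or zero (zeromask). A scalar sketch of both follows, again using ``float`` as a stand-in for the 16-bit element type; ``model_mask_max``, ``model_maskz_max`` and ``MODEL_LANES`` are hypothetical names introduced for illustration.

.. code-block:: C

    #include <stdint.h>

    enum { MODEL_LANES = 32 };  /* 512 bits / 16-bit elements */

    /* Writemask model: inactive lanes are copied from src. */
    void model_mask_max(float dst[MODEL_LANES], const float src[MODEL_LANES],
                        uint32_t k, const float a[MODEL_LANES],
                        const float b[MODEL_LANES])
    {
        for (int j = 0; j < MODEL_LANES; j++) {
            if ((k >> j) & 1u)
                dst[j] = (a[j] > b[j]) ? a[j] : b[j];
            else
                dst[j] = src[j];
        }
    }

    /* Zeromask model: inactive lanes are set to zero. */
    void model_maskz_max(float dst[MODEL_LANES], uint32_t k,
                         const float a[MODEL_LANES],
                         const float b[MODEL_LANES])
    {
        for (int j = 0; j < MODEL_LANES; j++) {
            if ((k >> j) & 1u)
                dst[j] = (a[j] > b[j]) ? a[j] : b[j];
            else
                dst[j] = 0.0f;
        }
    }

The writemask form suits accumulating into an existing vector, while the zeromask form avoids a dependency on a previous destination value.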

_mm512_max_round_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    int sae
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m512h _mm512_max_round_ph(__m512h a, __m512h b, int sae);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [sae_note][max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_max_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    int sae
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m512h _mm512_mask_max_round_ph(__m512h src, __mmask32 k,
                                     __m512h a, __m512h b,
                                     int sae);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note][max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m512h _mm512_maskz_max_round_ph(__mmask32 k, __m512h a,
                                      __m512h b, int sae);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note][max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_min_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_min_ph(__m512h a, __m512h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_min_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mask_min_ph(__m512h src, __mmask32 k,
                               __m512h a, __m512h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_maskz_min_ph(__mmask32 k, __m512h a,
                                __m512h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
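
The writemask and zeromask forms differ only in what an inactive lane receives: the value from "src", or zero. A minimal scalar sketch of both, assuming ``float`` lanes in place of FP16 and a ``uint32_t`` in place of ``__mmask32``; ``ref_mask_min`` and ``ref_maskz_min`` are hypothetical names used only for illustration:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Writemask form: lanes whose mask bit is clear keep the value from src. */
    void ref_mask_min(float *dst, const float *src, uint32_t k,
                      const float *a, const float *b, size_t n)
    {
        assert(n <= 32);  /* one mask bit per lane, as with __mmask32 */
        for (size_t j = 0; j < n; ++j) {
            float m = (a[j] < b[j]) ? a[j] : b[j];
            dst[j] = ((k >> j) & 1u) ? m : src[j];
        }
    }

    /* Zeromask form: lanes whose mask bit is clear are set to zero. */
    void ref_maskz_min(float *dst, uint32_t k,
                       const float *a, const float *b, size_t n)
    {
        assert(n <= 32);
        for (size_t j = 0; j < n; ++j) {
            float m = (a[j] < b[j]) ? a[j] : b[j];
            dst[j] = ((k >> j) & 1u) ? m : 0.0f;
        }
    }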
        	

_mm512_min_round_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    int sae
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m512h _mm512_min_round_ph(__m512h a, __m512h b, int sae);

.. admonition:: Intel Description

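    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code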

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_min_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    int sae
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m512h _mm512_mask_min_round_ph(__m512h src, __mmask32 k,
                                     __m512h a, __m512h b,
                                     int sae);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m512h _mm512_maskz_min_round_ph(__mmask32 k, __m512h a,
                                      __m512h b, int sae);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

YMM
~~~
_mm256_reduce_max_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: short
:Param Types:
    __m256i a
:Param ETypes:
    SI16 a

.. code-block:: C

    short _mm256_reduce_max_epi16(__m256i a);

.. admonition:: Intel Description

    Reduce the packed signed 16-bit integers in "a" by maximum. Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_MAX(a, 16)
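
The recursive ``REDUCE_MAX`` above halves the active length on each step, folding the upper half of the lanes onto the lower half. A minimal scalar sketch of the same tree written as a loop, assuming the lanes are supplied as an ``int16_t`` array; ``ref_reduce_max_epi16`` is a hypothetical helper name, not an Intel API:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Mirrors REDUCE_MAX(src, len): fold the upper half onto the lower half
       until two lanes remain, then return the larger of those two. */
    int16_t ref_reduce_max_epi16(const int16_t *a, int len)
    {
        int16_t tmp[16];
        /* len must be a power of two between 2 and 16 */
        assert(len >= 2 && len <= 16 && (len & (len - 1)) == 0);
        memcpy(tmp, a, (size_t)len * sizeof tmp[0]);
        while (len > 2) {
            len /= 2;
            for (int j = 0; j < len; ++j)
                if (tmp[j + len] > tmp[j])
                    tmp[j] = tmp[j + len];
        }
        return (tmp[0] > tmp[1]) ? tmp[0] : tmp[1];
    }

With 16 lanes the loop runs three folding steps (16 to 8, 8 to 4, 4 to 2) before the final two-lane comparison.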
        	

_mm256_mask_reduce_max_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: short
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    short _mm256_mask_reduce_max_epi16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed signed 16-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := Int16(-0x8000)
        	FI
        ENDFOR
        dst[15:0] := REDUCE_MAX(tmp, 16)
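
In the masked form, inactive lanes are first replaced by ``-0x8000``, the identity element for a signed 16-bit maximum, so they can never win the reduction. A minimal scalar sketch, assuming a ``uint16_t`` in place of ``__mmask16`` and a linear fold instead of the halving tree (the result is the same because maximum is associative and commutative); ``ref_mask_reduce_max_epi16`` is a hypothetical name:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Masked maximum over 16 signed lanes: inactive lanes are replaced by
       INT16_MIN (-0x8000) before reducing. */
    int16_t ref_mask_reduce_max_epi16(uint16_t k, const int16_t *a)
    {
        assert(a != NULL);
        int16_t best = INT16_MIN;  /* result when no mask bit is set */
        for (int j = 0; j < 16; ++j) {
            int16_t lane = ((k >> j) & 1) ? a[j] : INT16_MIN;
            if (lane > best)
                best = lane;
        }
        return best;
    }

A consequence of the identity substitution is that an all-zero mask yields ``INT16_MIN`` rather than an error.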
        	

_mm256_reduce_max_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: char
:Param Types:
    __m256i a
:Param ETypes:
    SI8 a

.. code-block:: C

    char _mm256_reduce_max_epi8(__m256i a);

.. admonition:: Intel Description

    Reduce the packed signed 8-bit integers in "a" by maximum. Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_MAX(a, 32)
        	

_mm256_mask_reduce_max_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: char
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    char _mm256_mask_reduce_max_epi8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed signed 8-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := Int8(-0x80)
        	FI
        ENDFOR
        dst[7:0] := REDUCE_MAX(tmp, 32)
        	

_mm256_reduce_max_epu16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: unsigned short
:Param Types:
    __m256i a
:Param ETypes:
    UI16 a

.. code-block:: C

    unsigned short _mm256_reduce_max_epu16(__m256i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 16-bit integers in "a" by maximum. Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_MAX(a, 16)
        	

_mm256_mask_reduce_max_epu16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: unsigned short
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    unsigned short _mm256_mask_reduce_max_epu16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 16-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := 0
        	FI
        ENDFOR
        dst[15:0] := REDUCE_MAX(tmp, 16)
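
The unsigned form uses the same structure as the signed one, but lanes are compared as unsigned values and inactive lanes are replaced by 0, the identity for an unsigned maximum. A minimal scalar sketch, assuming a ``uint16_t`` in place of ``__mmask16`` and a linear fold in place of the halving tree; ``ref_mask_reduce_max_epu16`` is a hypothetical name:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Masked unsigned maximum over 16 lanes. Lanes are compared as uint16_t,
       and inactive lanes are replaced by 0. */
    uint16_t ref_mask_reduce_max_epu16(uint16_t k, const uint16_t *a)
    {
        assert(a != NULL);
        uint16_t best = 0;  /* result when no mask bit is set */
        for (int j = 0; j < 16; ++j) {
            uint16_t lane = ((k >> j) & 1) ? a[j] : 0;
            if (lane > best)
                best = lane;
        }
        return best;
    }

The bit pattern ``0x8000`` illustrates the difference from the signed variant: it is a large value here, whereas a signed comparison would treat it as the smallest possible lane.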
        	

_mm256_reduce_max_epu8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: unsigned char
:Param Types:
    __m256i a
:Param ETypes:
    UI8 a

.. code-block:: C

    unsigned char _mm256_reduce_max_epu8(__m256i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 8-bit integers in "a" by maximum. Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_MAX(a, 32)
        	

_mm256_mask_reduce_max_epu8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: unsigned char
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    unsigned char _mm256_mask_reduce_max_epu8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 8-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := 0
        	FI
        ENDFOR
        dst[7:0] := REDUCE_MAX(tmp, 32)
        	

_mm256_reduce_min_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: short
:Param Types:
    __m256i a
:Param ETypes:
    SI16 a

.. code-block:: C

    short _mm256_reduce_min_epi16(__m256i a);

.. admonition:: Intel Description

    Reduce the packed signed 16-bit integers in "a" by minimum. Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_MIN(a, 16)
        	

_mm256_mask_reduce_min_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: short
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    short _mm256_mask_reduce_min_epi16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed signed 16-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := Int16(0x7FFF)
        	FI
        ENDFOR
        dst[15:0] := REDUCE_MIN(tmp, 16)
        	

_mm256_reduce_min_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: char
:Param Types:
    __m256i a
:Param ETypes:
    SI8 a

.. code-block:: C

    char _mm256_reduce_min_epi8(__m256i a);

.. admonition:: Intel Description

    Reduce the packed signed 8-bit integers in "a" by minimum. Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_MIN(a, 32)
        	

_mm256_mask_reduce_min_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: char
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    char _mm256_mask_reduce_min_epi8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed signed 8-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := Int8(0x7F)
        	FI
        ENDFOR
        dst[7:0] := REDUCE_MIN(tmp, 32)
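
For the 8-bit signed minimum there are 32 lanes and a 32-bit mask, and inactive lanes are replaced by ``0x7F``, the identity for a signed 8-bit minimum. A minimal scalar sketch, assuming a ``uint32_t`` in place of ``__mmask32`` and a linear fold in place of the halving tree; ``ref_mask_reduce_min_epi8`` is a hypothetical name, and ``int8_t`` is used instead of ``char`` because the signedness of plain ``char`` is implementation-defined:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Masked minimum over 32 signed 8-bit lanes: inactive lanes are
       replaced by INT8_MAX (0x7F) before reducing. */
    int8_t ref_mask_reduce_min_epi8(uint32_t k, const int8_t *a)
    {
        assert(a != NULL);
        int8_t best = INT8_MAX;  /* result when no mask bit is set */
        for (int j = 0; j < 32; ++j) {
            int8_t lane = ((k >> j) & 1u) ? a[j] : INT8_MAX;
            if (lane < best)
                best = lane;
        }
        return best;
    }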
        	

_mm256_reduce_min_epu16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: unsigned short
:Param Types:
    __m256i a
:Param ETypes:
    UI16 a

.. code-block:: C

    unsigned short _mm256_reduce_min_epu16(__m256i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 16-bit integers in "a" by minimum. Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_MIN(a, 16)
        	

_mm256_mask_reduce_min_epu16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: unsigned short
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    unsigned short _mm256_mask_reduce_min_epu16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 16-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := 0xFFFF
        	FI
        ENDFOR
        dst[15:0] := REDUCE_MIN(tmp, 16)
        	

_mm256_reduce_min_epu8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: unsigned char
:Param Types:
    __m256i a
:Param ETypes:
    UI8 a

.. code-block:: C

    unsigned char _mm256_reduce_min_epu8(__m256i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 8-bit integers in "a" by minimum. Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_MIN(a, 32)
        	

_mm256_mask_reduce_min_epu8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: unsigned char
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    unsigned char _mm256_mask_reduce_min_epu8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 8-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := 0xFF
        	FI
        ENDFOR
        dst[7:0] := REDUCE_MIN(tmp, 32)
        	

_mm256_max_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_max_ph(__m256h a, __m256h b);

.. admonition:: Intel Description

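    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code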

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
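
The maximum uses the select ``a > b ? a : b``. Since ``+0.0`` and ``-0.0`` compare equal, the comparison is false for a pair of signed zeros and the second operand is returned whatever its sign. A minimal scalar sketch, assuming ``float`` as a stand-in for an FP16 lane; ``ref_max_lane`` and ``ref_max_ph`` are hypothetical helper names, not Intel APIs:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>

    /* One lane of the pseudo-code: (a > b ? a : b).
       For equal inputs, including +0.0 and -0.0, b is returned. */
    float ref_max_lane(float a, float b)
    {
        return (a > b) ? a : b;
    }

    /* Apply the lane rule across n lanes (n = 16 for the 256-bit form). */
    void ref_max_ph(float *dst, const float *a, const float *b, size_t n)
    {
        assert(dst != NULL && a != NULL && b != NULL);
        for (size_t j = 0; j < n; ++j)
            dst[j] = ref_max_lane(a[j], b[j]);
    }

Under this rule the maximum of ``+0.0`` and ``-0.0`` depends on operand order, which is one way the result can differ from an IEEE 754 maximum.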
        	

_mm256_mask_max_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mask_max_ph(__m256h src, __mmask16 k,
                               __m256h a, __m256h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_max_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_maskz_max_ph(__mmask16 k, __m256h a,
                                __m256h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_min_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_min_ph(__m256h a, __m256h b);

.. admonition:: Intel Description

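    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code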

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_min_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mask_min_ph(__m256h src, __mmask16 k,
                               __m256h a, __m256h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_min_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_maskz_min_ph(__mmask16 k, __m256h a,
                                __m256h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_reduce_max_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    short _mm_reduce_max_epi16(__m128i a);

.. admonition:: Intel Description

    Reduce the packed signed 16-bit integers in "a" by maximum. Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_MAX(a, 8)
        	

_mm_mask_reduce_max_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    short _mm_mask_reduce_max_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed signed 16-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := Int16(-0x8000)
        	FI
        ENDFOR
        dst[15:0] := REDUCE_MAX(tmp, 8)
        	

_mm_reduce_max_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: char
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    char _mm_reduce_max_epi8(__m128i a);

.. admonition:: Intel Description

    Reduce the packed signed 8-bit integers in "a" by maximum. Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_MAX(a, 16)
        	

_mm_mask_reduce_max_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: char
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    char _mm_mask_reduce_max_epi8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed signed 8-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := Int8(-0x80)
        	FI
        ENDFOR
        dst[7:0] := REDUCE_MAX(tmp, 16)
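
The masked form above replaces every inactive lane with ``Int8(-0x80)``, the identity element for a signed 8-bit maximum. A minimal scalar sketch of that behaviour follows; ``ref_mask_reduce_max_epi8`` is a hypothetical name, and a linear scan stands in for the tree fold on the assumption that maximum is order-independent.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: scalar model of the masked signed 8-bit maximum.
     * Lanes whose mask bit is clear contribute INT8_MIN (-0x80). */
    static int8_t ref_mask_reduce_max_epi8(uint16_t k, const int8_t *a)
    {
        int8_t best = INT8_MIN;

        assert(a != NULL);
        for (int j = 0; j < 16; j++) {
            int8_t lane = ((k >> j) & 1) ? a[j] : INT8_MIN;
            if (lane > best)
                best = lane;
        }
        return best;
    }

With an all-zero mask the model returns ``INT8_MIN``, matching what the pseudo-code yields when no lane is active.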
        	

_mm_reduce_max_epu16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: unsigned short
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    unsigned short _mm_reduce_max_epu16(__m128i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 16-bit integers in "a" by maximum. Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_MAX(a, 8)
        	

_mm_mask_reduce_max_epu16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: unsigned short
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    unsigned short _mm_mask_reduce_max_epu16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 16-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := 0
        	FI
        ENDFOR
        dst[15:0] := REDUCE_MAX(tmp, 8)
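
Because inactive lanes are replaced by 0 in the pseudo-code above, a result of 0 can mean either "the largest active element is 0" or "no lane was active". The sketch below models the reduction in scalar C and adds a wrapper that reports the second case separately. Both function names are hypothetical and are not part of any Intel or compiler API.

.. code-block:: C

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: scalar model of the masked unsigned 16-bit maximum. */
    static uint16_t ref_mask_reduce_max_epu16(uint8_t k, const uint16_t *a)
    {
        uint16_t best = 0;                      /* inactive lanes count as 0 */

        assert(a != NULL);
        for (int j = 0; j < 8; j++) {
            if (((k >> j) & 1) && a[j] > best)
                best = a[j];
        }
        return best;
    }

    /* Hypothetical wrapper: returns false when no lane is active, so that a
     * stored 0 always means "the maximum of the active lanes is 0". */
    static bool checked_mask_reduce_max_epu16(uint8_t k, const uint16_t *a,
                                              uint16_t *out)
    {
        assert(out != NULL);
        if (k == 0)
            return false;
        *out = ref_mask_reduce_max_epu16(k, a);
        return true;
    }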
        	

_mm_reduce_max_epu8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: unsigned char
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    unsigned char _mm_reduce_max_epu8(__m128i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 8-bit integers in "a" by maximum. Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_MAX(a, 16)
        	

_mm_mask_reduce_max_epu8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: unsigned char
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    unsigned char _mm_mask_reduce_max_epu8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 8-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MAX(src, len) {
        	IF len == 2
        		RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MAX(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := 0
        	FI
        ENDFOR
        dst[7:0] := REDUCE_MAX(tmp, 16)
        	

_mm_reduce_min_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    short _mm_reduce_min_epi16(__m128i a);

.. admonition:: Intel Description

    Reduce the packed signed 16-bit integers in "a" by minimum. Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_MIN(a, 8)
        	

_mm_mask_reduce_min_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    short _mm_mask_reduce_min_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed signed 16-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := Int16(0x7FFF)
        	FI
        ENDFOR
        dst[15:0] := REDUCE_MIN(tmp, 8)
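
The two stages above (fill inactive lanes with ``Int16(0x7FFF)``, then fold recursively) can be transcribed almost line for line into scalar C. This is a sketch for checking expectations, not the compiler's implementation; ``reduce_min_i16`` and ``ref_mask_reduce_min_epi16`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper: recursive transcription of REDUCE_MIN for signed
     * 16-bit lanes. Overwrites `src`; `len` must be a power of two >= 2. */
    static int16_t reduce_min_i16(int16_t *src, int len)
    {
        assert(len >= 2);
        if (len == 2)
            return src[0] < src[1] ? src[0] : src[1];
        len /= 2;
        for (int j = 0; j < len; j++) {
            if (src[j + len] < src[j])
                src[j] = src[j + len];
        }
        return reduce_min_i16(src, len);
    }

    /* Hypothetical helper: masked form. Inactive lanes become INT16_MAX
     * (0x7FFF), the identity for a signed 16-bit minimum. */
    static int16_t ref_mask_reduce_min_epi16(uint8_t k, const int16_t *a)
    {
        int16_t tmp[8];

        for (int j = 0; j < 8; j++)
            tmp[j] = ((k >> j) & 1) ? a[j] : INT16_MAX;
        return reduce_min_i16(tmp, 8);
    }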
        	

_mm_reduce_min_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: char
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    char _mm_reduce_min_epi8(__m128i a);

.. admonition:: Intel Description

    Reduce the packed signed 8-bit integers in "a" by minimum. Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_MIN(a, 16)
        	

_mm_mask_reduce_min_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: char
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    char _mm_mask_reduce_min_epi8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed signed 8-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := Int8(0x7F)
        	FI
        ENDFOR
        dst[7:0] := REDUCE_MIN(tmp, 16)
        	

_mm_reduce_min_epu16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: unsigned short
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    unsigned short _mm_reduce_min_epu16(__m128i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 16-bit integers in "a" by minimum. Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_MIN(a, 8)
        	

_mm_mask_reduce_min_epu16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: unsigned short
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    unsigned short _mm_mask_reduce_min_epu16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 16-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := 0xFFFF
        	FI
        ENDFOR
        dst[15:0] := REDUCE_MIN(tmp, 8)
        	

_mm_reduce_min_epu8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: unsigned char
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    unsigned char _mm_reduce_min_epu8(__m128i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 8-bit integers in "a" by minimum. Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_MIN(a, 16)
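
The ``epu8`` and ``epi8`` reductions share the same pseudo-code shape and differ only in how each byte is interpreted when compared. The scalar sketch below runs both interpretations over the same 16 bytes to show how the results can diverge. All names are hypothetical, and a linear scan replaces the tree fold on the assumption that minimum is order-independent.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: unsigned 8-bit minimum over 16 lanes. */
    static uint8_t ref_reduce_min_epu8(const uint8_t *a)
    {
        uint8_t best = a[0];
        for (int j = 1; j < 16; j++)
            if (a[j] < best)
                best = a[j];
        return best;
    }

    /* Hypothetical helper: signed 8-bit minimum over 16 lanes. */
    static int8_t ref_reduce_min_epi8(const int8_t *a)
    {
        int8_t best = a[0];
        for (int j = 1; j < 16; j++)
            if (a[j] < best)
                best = a[j];
        return best;
    }

    /* Same bit pattern, two interpretations. */
    static void demo_signedness(void)
    {
        uint8_t bytes[16];
        int8_t as_signed[16];

        memset(bytes, 0x10, sizeof bytes);
        bytes[3] = 0x80;                        /* 128 unsigned, -128 signed */
        bytes[9] = 0x01;
        memcpy(as_signed, bytes, sizeof bytes);

        assert(ref_reduce_min_epu8(bytes) == 0x01);
        assert(ref_reduce_min_epi8(as_signed) == -128);
    }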
        	

_mm_mask_reduce_min_epu8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: unsigned char
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    unsigned char _mm_mask_reduce_min_epu8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed unsigned 8-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MIN(src, len) {
        	IF len == 2
        		RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
        	ENDFOR
        	RETURN REDUCE_MIN(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := 0xFF
        	FI
        ENDFOR
        dst[7:0] := REDUCE_MIN(tmp, 16)
        	

_mm_mask_max_round_sd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_mask_max_round_sd(__m128d src, __mmask8 k,
                                  __m128d a, __m128d b,
                                  int sae);

.. admonition:: Intel Description

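    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.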

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := MAX(a[63:0], b[63:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
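
The writemask behaviour above depends only on bit 0 of ``k``. A minimal scalar sketch follows; the ``vec2d`` struct is a hypothetical stand-in for ``__m128d``, ``ref_mask_max_sd`` is a hypothetical name, and ``MAX`` is modelled as ``a > b ? a : b``. The model leaves out ``sae`` on the assumption that it affects only floating-point exception reporting, not the returned value.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    typedef struct { double lo, hi; } vec2d;    /* hypothetical __m128d model */

    /* Hypothetical helper: scalar model of the writemask scalar maximum. */
    static vec2d ref_mask_max_sd(vec2d src, uint8_t k, vec2d a, vec2d b)
    {
        vec2d dst;

        if (k & 1)
            dst.lo = a.lo > b.lo ? a.lo : b.lo; /* MAX(a[63:0], b[63:0]) */
        else
            dst.lo = src.lo;                    /* mask bit 0 clear: keep src */
        dst.hi = a.hi;                          /* upper element always from a */
        return dst;
    }

    static void demo_mask_max_sd(void)
    {
        vec2d src = {9.0, 9.0}, a = {1.0, 2.0}, b = {3.0, 4.0};

        assert(ref_mask_max_sd(src, 1, a, b).lo == 3.0);
        assert(ref_mask_max_sd(src, 0, a, b).lo == 9.0);
        assert(ref_mask_max_sd(src, 2, a, b).lo == 9.0);  /* only bit 0 counts */
        assert(ref_mask_max_sd(src, 0, a, b).hi == 2.0);
    }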
        	

_mm_mask_max_sd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_max_sd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := MAX(a[63:0], b[63:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_max_round_sd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_maskz_max_round_sd(__mmask8 k, __m128d a,
                                   __m128d b, int sae);

.. admonition:: Intel Description

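    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.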

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := MAX(a[63:0], b[63:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_max_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_max_sd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := MAX(a[63:0], b[63:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_max_round_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_max_round_sd(__m128d a, __m128d b, int sae);

.. admonition:: Intel Description

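    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.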

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := MAX(a[63:0], b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
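
The ``MAX`` operation above is order-sensitive for special values. The sketch below assumes it follows the x86 ``MAXSD`` rule, under which the second operand is returned when either input is NaN or when both inputs are zero; in C that rule is captured by ``a > b ? a : b``. The name ``ref_max_f64`` is hypothetical, and the model covers the lower element only.

.. code-block:: C

    #include <assert.h>
    #include <math.h>

    /* Hypothetical helper: models MAX under the assumed MAXSD rule. Any
     * comparison involving NaN is false, and +0.0 > -0.0 is false, so the
     * second operand `b` is returned in those cases. */
    static double ref_max_f64(double a, double b)
    {
        return a > b ? a : b;
    }

    static void demo_max_special_values(void)
    {
        assert(ref_max_f64(1.0, 2.0) == 2.0);
        assert(ref_max_f64(NAN, 1.0) == 1.0);       /* NaN in a: b returned */
        assert(isnan(ref_max_f64(1.0, NAN)));       /* NaN in b: b returned */
        assert(signbit(ref_max_f64(0.0, -0.0)));    /* both zero: b is -0.0 */
        assert(!signbit(ref_max_f64(-0.0, 0.0)));   /* both zero: b is +0.0 */
    }

Under this assumption, swapping ``a`` and ``b`` can change the result whenever a NaN or a pair of signed zeros is involved.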
        	

_mm_mask_max_round_ss
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m128 _mm_mask_max_round_ss(__m128 src, __mmask8 k,
                                 __m128 a, __m128 b, int sae);

.. admonition:: Intel Description

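    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.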

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := MAX(a[31:0], b[31:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_max_ss
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_max_ss(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := MAX(a[31:0], b[31:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_max_round_ss
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m128 _mm_maskz_max_round_ss(__mmask8 k, __m128 a,
                                  __m128 b, int sae);

.. admonition:: Intel Description

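    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.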

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := MAX(a[31:0], b[31:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_max_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_max_ss(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := MAX(a[31:0], b[31:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
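
For the single-precision zeromask form above, only lane 0 is computed; lanes 1 to 3 pass through from ``a`` regardless of ``k``. A minimal scalar sketch follows, in which the ``vec4f`` struct is a hypothetical stand-in for ``__m128``, ``ref_maskz_max_ss`` is a hypothetical name, and ``MAX`` is modelled as ``a > b ? a : b``.

.. code-block:: C

    #include <assert.h>

    typedef struct { float e[4]; } vec4f;       /* hypothetical __m128 model */

    /* Hypothetical helper: scalar model of the zeromask scalar maximum. */
    static vec4f ref_maskz_max_ss(unsigned char k, vec4f a, vec4f b)
    {
        vec4f dst = a;                          /* dst[127:32] := a[127:32] */

        if (k & 1)
            dst.e[0] = a.e[0] > b.e[0] ? a.e[0] : b.e[0];
        else
            dst.e[0] = 0.0f;                    /* mask bit 0 clear: zeroed */
        return dst;
    }

    static void demo_maskz_max_ss(void)
    {
        vec4f a = {{1.0f, 10.0f, 20.0f, 30.0f}};
        vec4f b = {{5.0f, -1.0f, -2.0f, -3.0f}};
        vec4f on = ref_maskz_max_ss(1, a, b);
        vec4f off = ref_maskz_max_ss(0xFE, a, b);

        assert(on.e[0] == 5.0f && on.e[1] == 10.0f);
        assert(off.e[0] == 0.0f && off.e[3] == 30.0f);
    }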
        	

_mm_max_round_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m128 _mm_max_round_ss(__m128 a, __m128 b, int sae);

.. admonition:: Intel Description

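    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.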

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := MAX(a[31:0], b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_min_round_sd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_mask_min_round_sd(__m128d src, __mmask8 k,
                                  __m128d a, __m128d b,
                                  int sae);

.. admonition:: Intel Description

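    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.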

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := MIN(a[63:0], b[63:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_min_sd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_min_sd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := MIN(a[63:0], b[63:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
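
One way to read the writemask form above is as a conditional clamp: passing the same vector as both ``src`` and ``a`` leaves the lower element untouched when mask bit 0 is clear and caps it at ``b`` when the bit is set. The sketch below illustrates that reading in scalar C. The ``vec2d`` struct and both function names are hypothetical, and ``MIN`` is modelled as ``a < b ? a : b``.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    typedef struct { double lo, hi; } vec2d;    /* hypothetical __m128d model */

    /* Hypothetical helper: scalar model of the writemask scalar minimum. */
    static vec2d ref_mask_min_sd(vec2d src, uint8_t k, vec2d a, vec2d b)
    {
        vec2d dst;

        dst.lo = (k & 1) ? (a.lo < b.lo ? a.lo : b.lo) : src.lo;
        dst.hi = a.hi;                          /* upper element always from a */
        return dst;
    }

    /* Hypothetical idiom: cap x.lo at `ceiling` only when `enable` is nonzero. */
    static vec2d clamp_lo_if(vec2d x, double ceiling, int enable)
    {
        vec2d lim = {ceiling, 0.0};             /* lim.hi is never read */
        return ref_mask_min_sd(x, (uint8_t)(enable ? 1 : 0), x, lim);
    }

In this model the upper element of ``x`` is preserved in both cases because it is always copied from ``a``.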
        	

_mm_maskz_min_round_sd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_maskz_min_round_sd(__mmask8 k, __m128d a,
                                   __m128d b, int sae);

.. admonition:: Intel Description

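    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.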

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := MIN(a[63:0], b[63:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_min_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_min_sd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := MIN(a[63:0], b[63:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_min_round_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_min_round_sd(__m128d a, __m128d b, int sae);

.. admonition:: Intel Description

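    Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.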

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := MIN(a[63:0], b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_min_round_ss
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m128 _mm_mask_min_round_ss(__m128 src, __mmask8 k,
                                 __m128 a, __m128 b, int sae);

.. admonition:: Intel Description

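    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.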

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := MIN(a[31:0], b[31:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_min_ss
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_min_ss(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := MIN(a[31:0], b[31:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_min_round_ss
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m128 _mm_maskz_min_round_ss(__mmask8 k, __m128 a,
                                  __m128 b, int sae);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note][min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := MIN(a[31:0], b[31:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_min_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_min_ss(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := MIN(a[31:0], b[31:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_min_round_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m128 _mm_min_round_ss(__m128 a, __m128 b, int sae);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note][min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := MIN(a[31:0], b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_max_ph
^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_max_ph(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_max_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_max_ph(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
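The per-lane writemask loop above can be modelled in plain C as follows. This is a hedged sketch rather than the intrinsic: ``model_mask_max_ph`` and ``LANES`` are hypothetical names, and ``float`` stands in for each half-precision lane, so rounding and range differ from real FP16.

.. code-block:: C

    #include <stdint.h>

    #define LANES 8  /* eight 16-bit lanes in a 128-bit register */

    /* Scalar model of the pseudo-code; float stands in for an FP16 lane. */
    void model_mask_max_ph(float dst[LANES], const float src[LANES], uint8_t k,
                           const float a[LANES], const float b[LANES])
    {
        for (int j = 0; j < LANES; j++) {
            if ((k >> j) & 1u)
                dst[j] = (a[j] > b[j]) ? a[j] : b[j];  /* mask bit set: max */
            else
                dst[j] = src[j];                       /* mask bit clear: src */
        }
    }

With ``k = 0x0F``, for example, lanes 0 to 3 receive the maximum and lanes 4 to 7 are copied from ``src``.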

_mm_maskz_max_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_max_ph(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_max_sh
^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_max_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] > b.fp16[0] ? a.fp16[0] : b.fp16[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_max_sh
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_max_sh(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] > b.fp16[0] ? a.fp16[0] : b.fp16[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_max_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_max_sh(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] > b.fp16[0] ? a.fp16[0] : b.fp16[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_max_round_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    int sae
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128h _mm_max_round_sh(__m128h a, __m128h b, int sae);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [sae_note][max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] > b.fp16[0] ? a.fp16[0] : b.fp16[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_max_round_sh
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int sae
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128h _mm_mask_max_round_sh(__m128h src, __mmask8 k,
                                  __m128h a, __m128h b,
                                  int sae);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [sae_note][max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] > b.fp16[0] ? a.fp16[0] : b.fp16[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_max_round_sh
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128h _mm_maskz_max_round_sh(__mmask8 k, __m128h a,
                                   __m128h b, int sae);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [sae_note][max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] > b.fp16[0] ? a.fp16[0] : b.fp16[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_min_ph
^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_min_ph(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	
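One consequence of the pseudo-code above is worth spelling out: the selection uses a single ``<`` comparison, so whenever either input is NaN the comparison is false and the element from "b" is chosen. The sketch below illustrates this for one lane. It models only the pseudo-code as written, with ``float`` standing in for FP16; ``lane_min`` and ``is_nan`` are hypothetical helper names.

.. code-block:: C

    /* One lane of the pseudo-code: (a < b ? a : b). */
    float lane_min(float a, float b)
    {
        return (a < b) ? a : b;
    }

    /* NaN is the only value that compares unequal to itself. */
    int is_nan(float v)
    {
        return v != v;
    }

Under this model, ``lane_min(NaN, 1.0f)`` yields ``1.0f`` while ``lane_min(1.0f, NaN)`` yields NaN, so the operation is not symmetric in its operands.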

_mm_mask_min_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_min_ph(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_min_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_min_ph(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_min_sh
^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_min_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] < b.fp16[0] ? a.fp16[0] : b.fp16[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_min_sh
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_min_sh(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] < b.fp16[0] ? a.fp16[0] : b.fp16[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_min_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_min_sh(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] < b.fp16[0] ? a.fp16[0] : b.fp16[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	
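The zeromask form differs from the writemask form only in what lane 0 receives when mask bit 0 is clear. A minimal scalar sketch of the pseudo-code above, assuming ``float`` as a stand-in for each FP16 lane, is shown here; ``vec8h`` and ``model_maskz_min_sh`` are hypothetical names used for illustration only.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for a 128-bit register of eight FP16 lanes. */
    typedef struct { float h[8]; } vec8h;

    /* Lane 0 is the minimum when bit 0 of k is set, otherwise zero;
       lanes 1..7 are always copied from a. */
    vec8h model_maskz_min_sh(uint8_t k, vec8h a, vec8h b)
    {
        vec8h dst = a;                          /* dst[127:16] := a[127:16] */
        if (k & 1u)
            dst.h[0] = (a.h[0] < b.h[0]) ? a.h[0] : b.h[0];
        else
            dst.h[0] = 0.0f;                    /* zeromask: lane 0 cleared */
        return dst;
    }

Note that the upper lanes come from "a" regardless of the mask, so a clear mask bit zeroes only the lowest element.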

_mm_min_round_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    int sae
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128h _mm_min_round_sh(__m128h a, __m128h b, int sae);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [sae_note][min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] < b.fp16[0] ? a.fp16[0] : b.fp16[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_min_round_sh
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int sae
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128h _mm_mask_min_round_sh(__m128h src, __mmask8 k,
                                  __m128h a, __m128h b,
                                  int sae);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [sae_note][min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] < b.fp16[0] ? a.fp16[0] : b.fp16[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_min_round_sh
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128h _mm_maskz_min_round_sh(__mmask8 k, __m128h a,
                                   __m128h b, int sae);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [sae_note][min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] < b.fp16[0] ? a.fp16[0] : b.fp16[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_reduce_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    int imm8
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __m128h _mm_reduce_sh(__m128h a, __m128h b, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of the lower half-precision (16-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        dst.fp16[0] := ReduceArgumentFP16(b.fp16[0], imm8)
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	
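The arithmetic inside ``ReduceArgumentFP16`` may be easier to follow with concrete numbers. The sketch below is a simplified scalar model under several assumptions: ``float`` replaces FP16, the rounding step is fixed to round toward negative infinity (the instruction selects the mode from ``imm8[3:0]``), the ``IsInf`` branch is omitted, and inputs are finite with modest magnitude. The names ``round_down`` and ``model_reduce`` are hypothetical.

.. code-block:: C

    /* Round toward negative infinity without <math.h>; valid only for
       finite values whose magnitude fits in a long. */
    float round_down(float v)
    {
        long t = (long)v;                       /* truncates toward zero */
        return (float)(((float)t > v) ? t - 1 : t);
    }

    /* Scalar model of ReduceArgumentFP16 with round-down rounding;
       m is the value carried in imm8[7:4] (0..15). */
    float model_reduce(float x, unsigned m)
    {
        float scale = (float)(1u << m);         /* POW(2.0, m) */
        float kept = round_down(x * scale) / scale;
        return x - kept;                        /* part below the kept bits */
    }

For ``x = 1.75`` this model gives ``0.75`` with ``m = 0``, ``0.25`` with ``m = 1``, and ``0.0`` with ``m = 2``, since each extra preserved fraction bit removes more of the input from the remainder.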

_mm_reduce_round_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    int imm8, 
    const int sae
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128h _mm_reduce_round_sh(__m128h a, __m128h b, int imm8,
                                const int sae);

.. admonition:: Intel Description

    Extract the reduced argument of the lower half-precision (16-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        dst.fp16[0] := ReduceArgumentFP16(b.fp16[0], imm8)
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_reduce_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int imm8
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __m128h _mm_mask_reduce_sh(__m128h src, __mmask8 k,
                               __m128h a, __m128h b, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of the lower half-precision (16-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        IF k[0]
        	dst.fp16[0] := ReduceArgumentFP16(b.fp16[0], imm8)
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_reduce_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int imm8, 
    const int sae
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128h _mm_mask_reduce_round_sh(__m128h src, __mmask8 k,
                                     __m128h a, __m128h b,
                                     int imm8, const int sae);

.. admonition:: Intel Description

    Extract the reduced argument of the lower half-precision (16-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        IF k[0]
        	dst.fp16[0] := ReduceArgumentFP16(b.fp16[0], imm8)
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_reduce_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __m128h _mm_maskz_reduce_sh(__mmask8 k, __m128h a,
                                __m128h b, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of the lower half-precision (16-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        IF k[0]
        	dst.fp16[0] := ReduceArgumentFP16(b.fp16[0], imm8)
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_reduce_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Special Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int imm8, 
    const int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128h _mm_maskz_reduce_round_sh(__mmask8 k, __m128h a,
                                      __m128h b, int imm8,
                                      const int sae);

.. admonition:: Intel Description

    Extract the reduced argument of the lower half-precision (16-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        IF k[0]
        	dst.fp16[0] := ReduceArgumentFP16(b.fp16[0], imm8)
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

Logical
-------
ZMM
~~~
_mm512_andnot_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_andnot_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	
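Because the operation acts on raw bit patterns rather than numeric values, a common use is clearing selected bits such as the sign bit. The following single-lane sketch illustrates the ``(NOT a) AND b`` step; it assumes ``double`` is IEEE 754 binary64, and ``model_andnot_f64`` is a hypothetical helper, not part of any header.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Scalar model of one 64-bit lane: (NOT a) AND b on the bit patterns. */
    double model_andnot_f64(double a, double b)
    {
        uint64_t ua, ub, ur;
        double r;
        memcpy(&ua, &a, sizeof ua);             /* reinterpret without UB */
        memcpy(&ub, &b, sizeof ub);
        ur = ~ua & ub;
        memcpy(&r, &ur, sizeof r);
        return r;
    }

Passing ``-0.0`` as "a" (only the sign bit set) clears the sign bit of "b", so ``model_andnot_f64(-0.0, -3.5)`` yields ``3.5`` under the stated assumption.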

_mm512_mask_andnot_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_andnot_pd(__m512d src, __mmask8 k,
                                  __m512d a, __m512d b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_andnot_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_andnot_pd(__mmask8 k, __m512d a,
                                   __m512d b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
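The zeromask form differs from the writemask form only in what happens to unselected lanes. A minimal scalar sketch of the pseudo-code above follows, with the hypothetical name "maskz_andnot_model" and the lanes held in a plain array; it behaves like a writemask merge whose source is all zeros.

.. code-block:: C

    #include <stdint.h>

    enum { PD_LANES = 8 };  /* eight 64-bit lanes in a 512-bit vector */

    /* Scalar model of the zeromask form: lane j receives (NOT a) AND b when
       bit j of k is set, and is zeroed otherwise. */
    static void maskz_andnot_model(uint64_t dst[PD_LANES], uint8_t k,
                                   const uint64_t a[PD_LANES],
                                   const uint64_t b[PD_LANES]) {
        for (int j = 0; j < PD_LANES; j++) {
            dst[j] = ((k >> j) & 1u) ? (~a[j] & b[j]) : 0;
        }
    }

Because no "src" operand is read, the result does not depend on any earlier contents of the destination.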

_mm512_andnot_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_andnot_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_andnot_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_andnot_ps(__m512 src, __mmask16 k,
                                 __m512 a, __m512 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_andnot_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_andnot_ps(__mmask16 k, __m512 a,
                                  __m512 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_and_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_and_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_and_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_and_pd(__m512d src, __mmask8 k,
                               __m512d a, __m512d b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_and_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_and_pd(__mmask8 k, __m512d a,
                                __m512d b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_and_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_and_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	
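A bitwise AND on floating-point lanes operates on the raw encoding, not on the numeric value. As a hedged illustration, the scalar C sketch below applies the per-lane operation from the pseudo-code above to isolate the exponent field of one 32-bit element. The names "f32_bits", "and_lane32" and "biased_exponent" are hypothetical, and the code assumes that "float" is an IEEE 754 binary32 value.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: view a float as its raw 32-bit pattern. */
    static uint32_t f32_bits(float x) {
        uint32_t u;
        memcpy(&u, &x, sizeof u);
        return u;
    }

    /* One lane of the pseudo-code: dst := a AND b. */
    static uint32_t and_lane32(uint32_t a, uint32_t b) {
        return a & b;
    }

    /* Isolate the 8-bit exponent field with a mask, then shift it down. */
    static unsigned biased_exponent(float x) {
        const uint32_t exponent_mask = UINT32_C(0x7F800000);
        return and_lane32(f32_bits(x), exponent_mask) >> 23;
    }

The 512-bit operation would apply the same mask to all sixteen lanes at once.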

_mm512_mask_and_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_and_ps(__m512 src, __mmask16 k, __m512 a,
                              __m512 b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_and_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_and_ps(__mmask16 k, __m512 a, __m512 b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_or_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_or_pd(__m512d src, __mmask8 k,
                              __m512d a, __m512d b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_or_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_or_pd(__mmask8 k, __m512d a,
                               __m512d b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_or_pd
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_or_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_or_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_or_ps(__m512 src, __mmask16 k, __m512 a,
                             __m512 b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_or_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_or_ps(__mmask16 k, __m512 a, __m512 b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_or_ps
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_or_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	
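OR is often the last step when a result is assembled from masked pieces. The scalar C sketch below is one possible illustration: it combines the ANDNOT, AND and OR lane operations to copy a sign from one 32-bit element onto the magnitude of another. All names ("f32_bits", "f32_from_bits", "copysign_bits") are hypothetical, and IEEE 754 binary32 is assumed for "float".

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: view a float as its raw 32-bit pattern. */
    static uint32_t f32_bits(float x) {
        uint32_t u;
        memcpy(&u, &x, sizeof u);
        return u;
    }

    /* Hypothetical helper: rebuild a float from a raw 32-bit pattern. */
    static float f32_from_bits(uint32_t u) {
        float x;
        memcpy(&x, &u, sizeof x);
        return x;
    }

    /* Take the magnitude bits of one value and the sign bit of another. */
    static float copysign_bits(float magnitude, float sign) {
        const uint32_t sign_mask = UINT32_C(0x80000000);
        uint32_t mag_part  = ~sign_mask & f32_bits(magnitude); /* ANDNOT */
        uint32_t sign_part =  sign_mask & f32_bits(sign);      /* AND    */
        return f32_from_bits(mag_part | sign_part);            /* OR     */
    }

In vector code the same three steps would be applied to every lane, with the sign mask broadcast to all elements.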

_mm512_mask_xor_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_xor_pd(__m512d src, __mmask8 k,
                               __m512d a, __m512d b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_xor_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_xor_pd(__mmask8 k, __m512d a,
                                __m512d b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_xor_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_xor_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	
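XOR toggles exactly the bits that are set in the other operand, so XOR with a sign-bit mask negates a floating-point element. A minimal scalar sketch of that use, modelling one 64-bit lane of the pseudo-code above, is shown here; the names "f64_bits", "f64_from_bits" and "negate_via_xor" are hypothetical and IEEE 754 binary64 is assumed for "double".

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: view a double as its raw 64-bit pattern. */
    static uint64_t f64_bits(double x) {
        uint64_t u;
        memcpy(&u, &x, sizeof u);
        return u;
    }

    /* Hypothetical helper: rebuild a double from a raw 64-bit pattern. */
    static double f64_from_bits(uint64_t u) {
        double x;
        memcpy(&x, &u, sizeof x);
        return x;
    }

    /* One lane of dst := a XOR b, with b fixed to the sign-bit mask. */
    static double negate_via_xor(double x) {
        const uint64_t sign_mask = UINT64_C(0x8000000000000000);
        return f64_from_bits(f64_bits(x) ^ sign_mask);
    }

Applying the function twice returns the original value, since XOR with the same mask is its own inverse.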

_mm512_mask_xor_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_xor_ps(__m512 src, __mmask16 k, __m512 a,
                              __m512 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_xor_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_xor_ps(__mmask16 k, __m512 a, __m512 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_xor_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_xor_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_and_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_and_epi32(__mmask16 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] AND b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_andnot_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_andnot_epi32(__mmask16 k, __m512i a,
                                      __m512i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_andnot_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_andnot_epi64(__mmask8 k, __m512i a,
                                      __m512i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (NOT a[i+63:i]) AND b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_and_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_and_epi64(__mmask8 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] AND b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_or_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_or_epi32(__mmask16 k, __m512i a,
                                  __m512i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_or_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_or_epi64(__mmask8 k, __m512i a,
                                  __m512i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_ternarylogic_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask16 k, 
    __m512i b, 
    __m512i c, 
    int imm8
:Param ETypes:
    UI32 a, 
    MASK k, 
    UI32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_ternarylogic_epi32(__m512i a,
                                           __mmask16 k,
                                           __m512i b, __m512i c,
                                           int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value of "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using writemask "k" at 32-bit granularity (32-bit elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		FOR h := 0 to 31
        			dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        		ENDFOR
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
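The elided "CASE" table above can be read as an 8-entry truth table stored in "imm8". The scalar C sketch below models that reading for one bit, one 32-bit lane, and the sixteen-lane writemask form. It assumes the usual index convention in which the bit from "a" is the most significant index bit and the bit from "c" the least significant (consistent with column values 0xF0, 0xCC and 0xAA for the three inputs); the function names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Look up one result bit: imm8 is a truth table indexed by (a, b, c).
       Assumption: a supplies index bit 2, b index bit 1, c index bit 0. */
    static unsigned ternary_bit(uint8_t imm8, unsigned a, unsigned b,
                                unsigned c) {
        unsigned index = ((a & 1u) << 2) | ((b & 1u) << 1) | (c & 1u);
        return (imm8 >> index) & 1u;
    }

    /* Apply the lookup independently to each of the 32 bit positions. */
    static uint32_t ternary_lane32(uint8_t imm8, uint32_t a, uint32_t b,
                                   uint32_t c) {
        uint32_t dst = 0;
        for (int h = 0; h < 32; h++) {
            dst |= (uint32_t)ternary_bit(imm8, a >> h, b >> h, c >> h) << h;
        }
        return dst;
    }

    /* Writemask form: lanes whose mask bit is clear are copied from a. */
    static void mask_ternary_model(uint32_t dst[16], const uint32_t a[16],
                                   uint16_t k, const uint32_t b[16],
                                   const uint32_t c[16], uint8_t imm8) {
        for (int j = 0; j < 16; j++) {
            dst[j] = ((k >> j) & 1u)
                         ? ternary_lane32(imm8, a[j], b[j], c[j])
                         : a[j];
        }
    }

Under this convention an "imm8" of 0xCA selects "b" where "a" is set and "c" elsewhere, and 0x96 is a three-way XOR. The mask applies to whole 32-bit lanes, never to individual bits.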

_mm512_maskz_ternarylogic_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b, 
    __m512i c, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_ternarylogic_epi32(
        __mmask16 k, __m512i a, __m512i b, __m512i c, int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value of "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using zeromask "k" at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		FOR h := 0 to 31
        			dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        		ENDFOR
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_ternarylogic_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    __m512i c, 
    int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_ternarylogic_epi32(__m512i a, __m512i b,
                                      __m512i c, int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value of "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 15
        	i := j*32
        	FOR h := 0 to 31
        		dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        	ENDFOR
        ENDFOR
        dst[MAX:512] := 0
        	
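The "LogicExp" line in the pseudo-code suggests that "imm8" can be derived by applying the desired expression to three truth-table columns. The sketch below shows that derivation in plain C. It assumes the conventional column values 0xF0, 0xCC and 0xAA for the inputs "a", "b" and "c" (the values usually given for _MM_TERNLOG_A, _MM_TERNLOG_B and _MM_TERNLOG_C), redefined here under the hypothetical names "TERN_A", "TERN_B" and "TERN_C" so that the example does not depend on any intrinsics header.

.. code-block:: C

    #include <stdint.h>

    /* Assumed truth-table columns for the three inputs (hypothetical names). */
    enum { TERN_A = 0xF0, TERN_B = 0xCC, TERN_C = 0xAA };

    /* a OR b OR c */
    static uint8_t imm8_or3(void) {
        return (uint8_t)(TERN_A | TERN_B | TERN_C);
    }

    /* NOT (a OR b OR c) */
    static uint8_t imm8_nor3(void) {
        return (uint8_t)~(TERN_A | TERN_B | TERN_C);
    }

    /* a XOR b XOR c */
    static uint8_t imm8_xor3(void) {
        return (uint8_t)(TERN_A ^ TERN_B ^ TERN_C);
    }

    /* Bitwise select: b where a is set, c elsewhere. */
    static uint8_t imm8_select(void) {
        return (uint8_t)((TERN_A & TERN_B) | (~TERN_A & TERN_C));
    }

    /* Majority vote of the three inputs. */
    static uint8_t imm8_majority(void) {
        return (uint8_t)((TERN_A & TERN_B) | (TERN_A & TERN_C) |
                         (TERN_B & TERN_C));
    }

The first two helpers evaluate to 254 and 1, which agrees with the "a OR b OR c" and "NOT (a OR b OR c)" cases listed in the pseudo-code above.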

_mm512_mask_ternarylogic_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask8 k, 
    __m512i b, 
    __m512i c, 
    int imm8
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_ternarylogic_epi64(__m512i a,
                                           __mmask8 k,
                                           __m512i b, __m512i c,
                                           int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value of "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using writemask "k" at 64-bit granularity (64-bit elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		FOR h := 0 to 63
        			dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        		ENDFOR
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_ternarylogic_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    __m512i c, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_ternarylogic_epi64(
        __mmask8 k, __m512i a, __m512i b, __m512i c, int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using zeromask "k" at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		FOR h := 0 to 63
        			dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        		ENDFOR
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_ternarylogic_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    __m512i c, 
    int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_ternarylogic_epi64(__m512i a, __m512i b,
                                      __m512i c, int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 7
        	i := j*64
        	FOR h := 0 to 63
        		dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        	ENDFOR
        ENDFOR
        dst[MAX:512] := 0
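
The pseudo-code elides cases 2 through 253 of ``TernaryOP``. One way to read the full table is that ``imm8`` is an 8-entry truth table indexed by the three input bits. The following is a minimal scalar sketch of that reading, not the intrinsic itself. The names ``ternlog64``, ``ternarylogic_epi64_model`` and the ``TERN_*`` constants are hypothetical; the constant values are assumed to match the conventional ``_MM_TERNLOG_A``, ``_MM_TERNLOG_B`` and ``_MM_TERNLOG_C`` encodings (0xF0, 0xCC, 0xAA).

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Assumed truth-table encodings of the three inputs (hypothetical names). */
    enum { TERN_A = 0xF0, TERN_B = 0xCC, TERN_C = 0xAA };

    /* For each bit position h, the bits of a, b and c form a 3-bit index
       (a is the most significant bit), and the result bit is imm8[index]. */
    static inline uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c,
                                     uint8_t imm8)
    {
        uint64_t r = 0;
        for (int h = 0; h < 64; h++) {
            unsigned idx = (unsigned)((((a >> h) & 1) << 2) |
                                      (((b >> h) & 1) << 1) |
                                      ((c >> h) & 1));
            r |= (uint64_t)((imm8 >> idx) & 1) << h;
        }
        return r;
    }

    /* Apply the same truth table to eight 64-bit lanes. */
    static inline void ternarylogic_epi64_model(uint64_t *dst,
                                                const uint64_t *a,
                                                const uint64_t *b,
                                                const uint64_t *c,
                                                uint8_t imm8)
    {
        assert(dst && a && b && c);
        for (int j = 0; j < 8; j++)
            dst[j] = ternlog64(a[j], b[j], c[j], imm8);
    }

Under these assumptions, an ``imm8`` for a given expression can be built by applying that expression to the three constants, for example ``TERN_A ^ TERN_B ^ TERN_C`` (0x96) for a three-way XOR.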
        	

_mm512_mask_test_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_mask_test_epi64_mask(__mmask8 k1, __m512i a,
                                         __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
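
A scalar sketch of the pseudo-code above may help when checking expected mask values by hand. It is only a model of the semantics under the assumption that lanes are held in plain arrays; the name ``mask_test_epi64_model`` is hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Bit j of the result is set when lane j is enabled in k1 and
       (a[j] AND b[j]) is non-zero; disabled lanes always yield 0. */
    static inline uint8_t mask_test_epi64_model(uint8_t k1,
                                                const uint64_t *a,
                                                const uint64_t *b)
    {
        assert(a && b);
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if (((k1 >> j) & 1) && (a[j] & b[j]) != 0)
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

Passing ``0xFF`` as ``k1`` enables every lane, which corresponds to the unmasked form of the test.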
        	

_mm512_test_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_test_epi64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_testn_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_mask_testn_epi32_mask(__mmask16 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_testn_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_testn_epi32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
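
The pseudo-code above sets bit ``j`` when the AND of lane ``j`` of "a" and "b" is zero, that is, when the two lanes share no set bits. A minimal scalar sketch of that rule follows; ``testn_epi32_model`` is a hypothetical name and the lanes are assumed to be stored in plain arrays.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Bit j of the result is set when (a[j] AND b[j]) == 0. */
    static inline uint16_t testn_epi32_model(const uint32_t *a,
                                             const uint32_t *b)
    {
        assert(a && b);
        uint16_t k = 0;
        for (int j = 0; j < 16; j++) {
            if ((a[j] & b[j]) == 0)
                k |= (uint16_t)(1u << j);
        }
        return k;
    }

In this model the result is the bitwise complement of the corresponding non-negated test over the same inputs.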
        	

_mm512_mask_testn_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_mask_testn_epi64_mask(__mmask8 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_testn_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_testn_epi64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_maskz_xor_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_xor_epi32(__mmask16 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_xor_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_xor_epi64(__mmask8 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
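
Zero-masking can be sketched in scalar C as follows. This is an illustrative model of the pseudo-code above, assuming lanes are held in plain arrays; ``maskz_xor_epi64_model`` is a hypothetical name.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Lanes whose mask bit is set receive a[j] XOR b[j];
       all other lanes are written as zero. */
    static inline void maskz_xor_epi64_model(uint64_t *dst, uint8_t k,
                                             const uint64_t *a,
                                             const uint64_t *b)
    {
        assert(dst && a && b);
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? (a[j] ^ b[j]) : 0;
    }

Note that the previous contents of ``dst`` never survive in this model: every lane is either computed or cleared.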
        	

_mm512_and_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_and_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] AND b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_and_si512
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    M512 a, 
    M512 b

.. code-block:: C

    __m512i _mm512_and_si512(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of 512 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[511:0] := (a[511:0] AND b[511:0])
        dst[MAX:512] := 0
        	

_mm512_andnot_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_andnot_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
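
The operand order matters here: the first operand is the one that is inverted. A short scalar sketch of the pseudo-code above makes this explicit. The name ``andnot_epi32_model`` is hypothetical and lanes are assumed to be stored in plain arrays.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* dst[j] = (NOT a[j]) AND b[j]: bits set in a are cleared from b. */
    static inline void andnot_epi32_model(uint32_t *dst,
                                          const uint32_t *a,
                                          const uint32_t *b)
    {
        assert(dst && a && b);
        for (int j = 0; j < 16; j++)
            dst[j] = (uint32_t)~a[j] & b[j];
    }

A common reading is that "a" acts as a clear-mask applied to "b"; swapping the two arguments generally gives a different result.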
        	

_mm512_andnot_si512
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    M512 a, 
    M512 b

.. code-block:: C

    __m512i _mm512_andnot_si512(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 512 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[511:0] := ((NOT a[511:0]) AND b[511:0])
        dst[MAX:512] := 0
        	

_mm512_mask_andnot_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_andnot_epi32(__m512i src, __mmask16 k,
                                     __m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_andnot_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_andnot_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 512 bits (composed of packed 64-bit integers) in "a" and then AND with "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[511:0] := ((NOT a[511:0]) AND b[511:0])
        dst[MAX:512] := 0
        	

_mm512_mask_andnot_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_andnot_epi64(__m512i src, __mmask8 k,
                                     __m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_and_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_and_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of 512 bits (composed of packed 64-bit integers) in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[511:0] := (a[511:0] AND b[511:0])
        dst[MAX:512] := 0
        	

_mm512_mask_and_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_and_epi64(__m512i src, __mmask8 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] AND b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
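
Write-masking differs from zero-masking in that inactive lanes keep the value supplied in "src". The following scalar sketch models the pseudo-code above under the assumption that lanes are held in plain arrays; ``mask_and_epi64_model`` is a hypothetical name.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Lanes whose mask bit is set receive a[j] AND b[j];
       all other lanes are copied unchanged from src. */
    static inline void mask_and_epi64_model(uint64_t *dst,
                                            const uint64_t *src,
                                            uint8_t k,
                                            const uint64_t *a,
                                            const uint64_t *b)
    {
        assert(dst && src && a && b);
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? (a[j] & b[j]) : src[j];
    }

With ``k = 0x0F``, for instance, lanes 0 to 3 hold the AND result and lanes 4 to 7 hold the corresponding ``src`` lanes.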
        	

_mm512_mask_or_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_or_epi32(__m512i src, __mmask16 k,
                                 __m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_or_epi32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_or_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_or_si512
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    M512 a, 
    M512 b

.. code-block:: C

    __m512i _mm512_or_si512(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise OR of 512 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[511:0] := (a[511:0] OR b[511:0])
        dst[MAX:512] := 0
        	

_mm512_mask_or_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_or_epi64(__m512i src, __mmask8 k,
                                 __m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_or_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_or_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_test_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_mask_test_epi32_mask(__mmask16 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_test_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_test_epi32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_xor_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_xor_epi32(__m512i src, __mmask16 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_xor_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_xor_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_xor_si512
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    M512 a, 
    M512 b

.. code-block:: C

    __m512i _mm512_xor_si512(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of 512 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[511:0] := (a[511:0] XOR b[511:0])
        dst[MAX:512] := 0
        	

_mm512_mask_xor_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_xor_epi64(__m512i src, __mmask8 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_xor_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_xor_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_reduce_and_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    int _mm512_mask_reduce_and_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed 32-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_AND(src, len) {
        	IF len == 2
        		RETURN src[31:0] AND src[63:32]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := src[i+31:i] AND src[i+32*len+31:i+32*len]
        	ENDFOR
        	RETURN REDUCE_AND(src[32*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[i+31:i] := a[i+31:i]
        	ELSE
        		tmp[i+31:i] := 0xFFFFFFFF
        	FI
        ENDFOR
        dst[31:0] := REDUCE_AND(tmp, 16)
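
The masked reduction works because inactive lanes are replaced by all-ones, the identity element of AND, before the halving steps. A scalar sketch of this idea follows, assuming the sixteen lanes are held in a plain array. The name ``mask_reduce_and_epi32_model`` is hypothetical, and the sketch returns ``uint32_t`` rather than ``int`` to keep the bit pattern explicit.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    static inline uint32_t mask_reduce_and_epi32_model(uint16_t k,
                                                       const uint32_t *a)
    {
        assert(a);
        uint32_t tmp[16];
        /* Inactive lanes become 0xFFFFFFFF so they cannot clear any bit. */
        for (int j = 0; j < 16; j++)
            tmp[j] = ((k >> j) & 1) ? a[j] : 0xFFFFFFFFu;
        /* Fold the upper half onto the lower half until one lane remains,
           mirroring the recursive REDUCE_AND definition. */
        for (int len = 8; len >= 1; len /= 2)
            for (int j = 0; j < len; j++)
                tmp[j] &= tmp[j + len];
        return tmp[0];
    }

One consequence of this model is that an all-zero mask yields ``0xFFFFFFFF``, since no lane contributes to the result.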
        	

_mm512_mask_reduce_and_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __int64
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __int64 _mm512_mask_reduce_and_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed 64-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_AND(src, len) {
        	IF len == 2
        		RETURN src[63:0] AND src[127:64]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := src[i+63:i] AND src[i+64*len+63:i+64*len]
        	ENDFOR
        	RETURN REDUCE_AND(src[64*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[i+63:i] := a[i+63:i]
        	ELSE
        		tmp[i+63:i] := 0xFFFFFFFFFFFFFFFF
        	FI
        ENDFOR
        dst[63:0] := REDUCE_AND(tmp, 8)
        	

_mm512_mask_reduce_or_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    int _mm512_mask_reduce_or_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed 32-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_OR(src, len) {
        	IF len == 2
        		RETURN src[31:0] OR src[63:32]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := src[i+31:i] OR src[i+32*len+31:i+32*len]
        	ENDFOR
        	RETURN REDUCE_OR(src[32*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[i+31:i] := a[i+31:i]
        	ELSE
        		tmp[i+31:i] := 0
        	FI
        ENDFOR
        dst[31:0] := REDUCE_OR(tmp, 16)
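
The halving scheme in ``REDUCE_OR`` can be written as an iterative fold. The sketch below is a scalar model under stated assumptions: ``mask_reduce_or_u32_model`` is a hypothetical name, the vector is a plain 16-element array, and the recursion is unrolled into a loop that folds the upper half of ``tmp`` into the lower half until one element remains.

```c
#include <stdint.h>

/* Hypothetical scalar model of the pseudo-code above (not an immintrin.h API). */
static uint32_t mask_reduce_or_u32_model(uint16_t k, const uint32_t a[16])
{
    uint32_t tmp[16];

    /* Inactive lanes become 0, the identity element of OR. */
    for (int j = 0; j < 16; j++) {
        tmp[j] = ((k >> j) & 1u) ? a[j] : 0u;
    }

    /* Fold the upper half into the lower half: 16 -> 8 -> 4 -> 2 -> 1. */
    for (int len = 8; len >= 1; len /= 2) {
        for (int j = 0; j < len; j++) {
            tmp[j] |= tmp[j + len];
        }
    }
    return tmp[0];
}
```

Because OR is associative and commutative, the fold order does not change the result; a simple left-to-right loop would return the same value.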
        	

_mm512_mask_reduce_or_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __int64
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __int64 _mm512_mask_reduce_or_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed 64-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_OR(src, len) {
        	IF len == 2
        		RETURN src[63:0] OR src[127:64]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := src[i+63:i] OR src[i+64*len+63:i+64*len]
        	ENDFOR
        	RETURN REDUCE_OR(src[64*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[i+63:i] := a[i+63:i]
        	ELSE
        		tmp[i+63:i] := 0
        	FI
        ENDFOR
        dst[63:0] := REDUCE_OR(tmp, 8)
        	

_mm512_reduce_and_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _mm512_reduce_and_epi32(__m512i a);

.. admonition:: Intel Description

    Reduce the packed 32-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_AND(src, len) {
        	IF len == 2
        		RETURN src[31:0] AND src[63:32]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := src[i+31:i] AND src[i+32*len+31:i+32*len]
        	ENDFOR
        	RETURN REDUCE_AND(src[32*len-1:0], len)
        }
        dst[31:0] := REDUCE_AND(a, 16)
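
The recursive ``REDUCE_AND`` definition translates almost line for line into C. This is an illustrative sketch only: ``reduce_and_rec`` and ``reduce_and_u32_model`` are hypothetical names, the input is a plain 16-element array, and a scratch copy is used so the caller's data is not modified.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical recursive helper mirroring REDUCE_AND(src, len).
 * len must be a power of two and at least 2; src is overwritten. */
static uint32_t reduce_and_rec(uint32_t *src, int len)
{
    assert(len >= 2 && (len & (len - 1)) == 0);
    if (len == 2) {
        return src[0] & src[1];
    }
    len /= 2;
    for (int j = 0; j < len; j++) {
        src[j] &= src[j + len];     /* fold upper half into lower half */
    }
    return reduce_and_rec(src, len);
}

/* Hypothetical scalar model of the unmasked reduction over 16 lanes. */
static uint32_t reduce_and_u32_model(const uint32_t a[16])
{
    uint32_t tmp[16];
    memcpy(tmp, a, sizeof tmp);     /* keep the caller's array intact */
    return reduce_and_rec(tmp, 16);
}
```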
        	

_mm512_reduce_and_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __int64
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __int64 _mm512_reduce_and_epi64(__m512i a);

.. admonition:: Intel Description

    Reduce the packed 64-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_AND(src, len) {
        	IF len == 2
        		RETURN src[63:0] AND src[127:64]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := src[i+63:i] AND src[i+64*len+63:i+64*len]
        	ENDFOR
        	RETURN REDUCE_AND(src[64*len-1:0], len)
        }
        dst[63:0] := REDUCE_AND(a, 8)
        	

_mm512_reduce_or_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _mm512_reduce_or_epi32(__m512i a);

.. admonition:: Intel Description

    Reduce the packed 32-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_OR(src, len) {
        	IF len == 2
        		RETURN src[31:0] OR src[63:32]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := src[i+31:i] OR src[i+32*len+31:i+32*len]
        	ENDFOR
        	RETURN REDUCE_OR(src[32*len-1:0], len)
        }
        dst[31:0] := REDUCE_OR(a, 16)
        	

_mm512_reduce_or_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __int64
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __int64 _mm512_reduce_or_epi64(__m512i a);

.. admonition:: Intel Description

    Reduce the packed 64-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_OR(src, len) {
        	IF len == 2
        		RETURN src[63:0] OR src[127:64]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := src[i+63:i] OR src[i+64*len+63:i+64*len]
        	ENDFOR
        	RETURN REDUCE_OR(src[64*len-1:0], len)
        }
        dst[63:0] := REDUCE_OR(a, 8)
        	

_mm512_mask_and_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i v2, 
    __m512i v3
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 v2, 
    UI32 v3

.. code-block:: C

    __m512i _mm512_mask_and_epi32(__m512i src, __mmask16 k,
                                  __m512i v2, __m512i v3);

.. admonition:: Intel Description

    Performs element-by-element bitwise AND between packed 32-bit integer elements of "v2" and "v3", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := v2[i+31:i] AND v3[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
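
Writemask merging, as described above, keeps the ``src`` element wherever the mask bit is clear. The following scalar sketch models that behaviour on plain arrays; ``mask_and_u32x16_model`` is a hypothetical name and is not part of ``immintrin.h``.

```c
#include <stdint.h>

/* Hypothetical scalar model of the pseudo-code above:
 * dst[j] = k[j] ? (v2[j] AND v3[j]) : src[j], for 16 lanes of 32 bits. */
static void mask_and_u32x16_model(uint32_t dst[16], const uint32_t src[16],
                                  uint16_t k,
                                  const uint32_t v2[16], const uint32_t v3[16])
{
    for (int j = 0; j < 16; j++) {
        dst[j] = ((k >> j) & 1u) ? (v2[j] & v3[j]) : src[j];
    }
}
```

With ``k = 0x8001`` only lanes 0 and 15 receive the AND result; the other fourteen lanes are copied from ``src``.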
        	

YMM
~~~
_mm256_mask_andnot_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_andnot_pd(__m256d src, __mmask8 k,
                                  __m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_andnot_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_andnot_pd(__mmask8 k, __m256d a,
                                   __m256d b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
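
The operation above acts on the raw bit patterns of the doubles, not on their numeric values. The sketch below models it in scalar C, assuming IEEE 754 binary64 ``double``; ``maskz_andnot_f64x4_model`` is a hypothetical name. ``memcpy`` is used to reinterpret the bits without violating aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the pseudo-code above, assuming 64-bit
 * IEEE 754 doubles: dst[j] = k[j] ? ((NOT a[j]) AND b[j]) : 0. */
static void maskz_andnot_f64x4_model(double dst[4], uint8_t k,
                                     const double a[4], const double b[4])
{
    for (int j = 0; j < 4; j++) {
        uint64_t ua, ub, r;
        memcpy(&ua, &a[j], sizeof ua);      /* reinterpret bits of a[j] */
        memcpy(&ub, &b[j], sizeof ub);      /* reinterpret bits of b[j] */
        r = ((k >> j) & 1u) ? (~ua & ub) : 0u;
        memcpy(&dst[j], &r, sizeof r);      /* all-zero bits read as +0.0 */
    }
}
```

One use of this pattern: when every element of ``a`` is ``-0.0`` (only the sign bit set), the active lanes hold the absolute value of ``b`` and the inactive lanes hold ``+0.0``.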
        	

_mm256_mask_andnot_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_andnot_ps(__m256 src, __mmask8 k,
                                 __m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_andnot_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_andnot_ps(__mmask8 k, __m256 a,
                                  __m256 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_and_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_and_pd(__m256d src, __mmask8 k,
                               __m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_and_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_and_pd(__mmask8 k, __m256d a,
                                __m256d b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_and_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_and_ps(__m256 src, __mmask8 k, __m256 a,
                              __m256 b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_and_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_and_ps(__mmask8 k, __m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_or_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_or_pd(__m256d src, __mmask8 k,
                              __m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_or_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_or_pd(__mmask8 k, __m256d a,
                               __m256d b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_or_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_or_ps(__m256 src, __mmask8 k, __m256 a,
                             __m256 b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_or_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_or_ps(__mmask8 k, __m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_xor_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_xor_pd(__m256d src, __mmask8 k,
                               __m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_xor_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_xor_pd(__mmask8 k, __m256d a,
                                __m256d b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_xor_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_xor_ps(__m256 src, __mmask8 k, __m256 a,
                              __m256 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
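
A bitwise XOR on float bit patterns is commonly used to flip sign bits. The scalar sketch below models the writemask form above, assuming IEEE 754 binary32 ``float``; ``mask_xor_f32x8_model`` is a hypothetical name, not an ``immintrin.h`` function.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the pseudo-code above, assuming 32-bit
 * IEEE 754 floats: dst[j] = k[j] ? (a[j] XOR b[j]) : src[j]. */
static void mask_xor_f32x8_model(float dst[8], const float src[8], uint8_t k,
                                 const float a[8], const float b[8])
{
    for (int j = 0; j < 8; j++) {
        if ((k >> j) & 1u) {
            uint32_t ua, ub, r;
            memcpy(&ua, &a[j], sizeof ua);  /* reinterpret bits of a[j] */
            memcpy(&ub, &b[j], sizeof ub);  /* reinterpret bits of b[j] */
            r = ua ^ ub;
            memcpy(&dst[j], &r, sizeof r);
        } else {
            dst[j] = src[j];                /* inactive lane: copy src */
        }
    }
}
```

If every element of ``b`` is ``-0.0f``, the active lanes of ``dst`` hold ``-a[j]`` while the inactive lanes keep the value from ``src``.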
        	

_mm256_maskz_xor_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_xor_ps(__mmask8 k, __m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_and_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_and_epi32(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] AND b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_and_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_and_epi32(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] AND b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_andnot_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_andnot_epi32(__m256i src, __mmask8 k,
                                     __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_andnot_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_andnot_epi32(__mmask8 k, __m256i a,
                                      __m256i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
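
The operand order matters for this operation: the NOT applies to "a" only, so "a" acts as the set of bits to clear from "b". A scalar sketch on plain arrays makes the asymmetry easy to check; ``maskz_andnot_u32x8_model`` is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Hypothetical scalar model of the pseudo-code above:
 * dst[j] = k[j] ? ((NOT a[j]) AND b[j]) : 0, for 8 lanes of 32 bits. */
static void maskz_andnot_u32x8_model(uint32_t dst[8], uint8_t k,
                                     const uint32_t a[8], const uint32_t b[8])
{
    for (int j = 0; j < 8; j++) {
        dst[j] = ((k >> j) & 1u) ? (~a[j] & b[j]) : 0u;
    }
}
```

For example, with ``a[j] = 0x0000000F`` and ``b[j] = 0x12345678`` an active lane yields ``0x12345670``; swapping the two arguments yields ``0x00000007`` instead.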
        	

_mm256_mask_andnot_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_andnot_epi64(__m256i src, __mmask8 k,
                                     __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_andnot_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_andnot_epi64(__mmask8 k, __m256i a,
                                      __m256i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (NOT a[i+63:i]) AND b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_and_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_and_epi64(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] AND b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_and_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_and_epi64(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] AND b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_or_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_or_epi32(__m256i src, __mmask8 k,
                                 __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_or_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_or_epi32(__mmask8 k, __m256i a,
                                  __m256i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_or_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_or_epi64(__m256i src, __mmask8 k,
                                 __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
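Because the loop above runs over only four 64-bit lanes, the pseudo-code reads just bits 0 to 3 of the 8-bit mask. A small scalar sketch makes this visible; ``mask_or_epi64_model`` is a hypothetical helper, and arrays are assumed to stand in for the register.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model with four 64-bit lanes. */
    void mask_or_epi64_model(uint64_t dst[4], const uint64_t src[4], uint8_t k,
                             const uint64_t a[4], const uint64_t b[4])
    {
        assert(dst && src && a && b);
        for (int j = 0; j < 4; j++) {      /* bits 4..7 of k are never read */
            dst[j] = ((k >> j) & 1) ? (a[j] | b[j]) : src[j];
        }
    }

In this model, masks ``0xF2`` and ``0x02`` give identical results, since they agree in their low four bits.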

_mm256_maskz_or_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_or_epi64(__mmask8 k, __m256i a,
                                  __m256i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_ternarylogic_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask8 k, 
    __m256i b, 
    __m256i c, 
    int imm8
:Param ETypes:
    UI32 a, 
    MASK k, 
    UI32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_ternarylogic_epi32(__m256i a,
                                           __mmask8 k,
                                           __m256i b, __m256i c,
                                           int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that can implement any three-operand bitwise (Boolean) function; the specific function is selected by the value of "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are combined according to "imm8", and the result is written to the corresponding bit in "dst" using writemask "k" at 32-bit granularity (32-bit elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		FOR h := 0 to 31
        			dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        		ENDFOR
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
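The elided ``TernaryOP`` table can be read as a truth-table lookup: the three input bits form a 3-bit index that selects one bit of ``imm8``. The sketch below assumes that "a" supplies the most significant index bit and "c" the least, which agrees with the four table rows shown above. Both function names are hypothetical, and unselected lanes are copied from ``a`` as the description states.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* One-bit lookup: index = a*4 + b*2 + c selects a bit of imm8. */
    unsigned ternary_op_bit(uint8_t imm8, unsigned a, unsigned b, unsigned c)
    {
        assert(a <= 1 && b <= 1 && c <= 1);
        unsigned index = (a << 2) | (b << 1) | c;
        return (imm8 >> index) & 1u;
    }

    /* Hypothetical scalar model of the masked 32-bit form. */
    void mask_ternarylogic_epi32_model(uint32_t dst[8], const uint32_t a[8],
                                       uint8_t k, const uint32_t b[8],
                                       const uint32_t c[8], uint8_t imm8)
    {
        for (int j = 0; j < 8; j++) {
            if (!((k >> j) & 1)) {
                dst[j] = a[j];             /* mask bit clear: keep lane of a */
                continue;
            }
            uint32_t r = 0;
            for (int h = 0; h < 32; h++) {
                unsigned bit = ternary_op_bit(imm8, (a[j] >> h) & 1u,
                                              (b[j] >> h) & 1u,
                                              (c[j] >> h) & 1u);
                r |= (uint32_t)bit << h;
            }
            dst[j] = r;
        }
    }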

_mm256_maskz_ternarylogic_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    __m256i c, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_ternarylogic_epi32(
        __mmask8 k, __m256i a, __m256i b, __m256i c, int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that can implement any three-operand bitwise (Boolean) function; the specific function is selected by the value of "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are combined according to "imm8", and the result is written to the corresponding bit in "dst" using zeromask "k" at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		FOR h := 0 to 31
        			dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        		ENDFOR
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_ternarylogic_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    __m256i c, 
    int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_ternarylogic_epi32(__m256i a, __m256i b,
                                      __m256i c, int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that can implement any three-operand bitwise (Boolean) function; the specific function is selected by the value of "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are combined according to "imm8", and the result is written to the corresponding bit in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 7
        	i := j*32
        	FOR h := 0 to 31
        		dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        	ENDFOR
        ENDFOR
        dst[MAX:256] := 0
        	
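The ``LogicExp`` line above suggests that ``imm8`` can be derived by applying the desired logic expression to three truth-table constants. The sketch below assumes the values ``0xF0``, ``0xCC`` and ``0xAA`` for the A, B and C columns (the ``TERN_*`` names are local to this example) and cross-checks them against a hypothetical helper, ``imm8_from_function``, that enumerates all eight input combinations.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Assumed truth-table columns for inputs a, b and c. */
    enum { TERN_A = 0xF0, TERN_B = 0xCC, TERN_C = 0xAA };

    typedef unsigned (*bitfn)(unsigned a, unsigned b, unsigned c);

    /* Build imm8 by evaluating a one-bit function on every input combination;
       bit (a*4 + b*2 + c) of the result holds f(a, b, c). */
    uint8_t imm8_from_function(bitfn f)
    {
        assert(f != NULL);
        unsigned imm8 = 0;
        for (unsigned index = 0; index < 8; index++) {
            unsigned a = (index >> 2) & 1u;
            unsigned b = (index >> 1) & 1u;
            unsigned c = index & 1u;
            if (f(a, b, c) & 1u)
                imm8 |= 1u << index;
        }
        return (uint8_t)imm8;
    }

    /* Example one-bit functions: bitwise select and three-way XOR. */
    unsigned select_bit(unsigned a, unsigned b, unsigned c) { return a ? b : c; }
    unsigned xor3_bit(unsigned a, unsigned b, unsigned c) { return a ^ b ^ c; }

Under these assumptions, both routes agree: the select function yields ``0xCA``, which equals ``(TERN_A & TERN_B) | (~TERN_A & TERN_C)`` truncated to 8 bits, and the three-way XOR yields ``0x96``.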

_mm256_mask_ternarylogic_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask8 k, 
    __m256i b, 
    __m256i c, 
    int imm8
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_ternarylogic_epi64(__m256i a,
                                           __mmask8 k,
                                           __m256i b, __m256i c,
                                           int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that can implement any three-operand bitwise (Boolean) function; the specific function is selected by the value of "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are combined according to "imm8", and the result is written to the corresponding bit in "dst" using writemask "k" at 64-bit granularity (64-bit elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		FOR h := 0 to 63
        			dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        		ENDFOR
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_ternarylogic_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    __m256i c, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_ternarylogic_epi64(
        __mmask8 k, __m256i a, __m256i b, __m256i c, int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that can implement any three-operand bitwise (Boolean) function; the specific function is selected by the value of "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are combined according to "imm8", and the result is written to the corresponding bit in "dst" using zeromask "k" at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		FOR h := 0 to 63
        			dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        		ENDFOR
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_ternarylogic_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    __m256i c, 
    int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_ternarylogic_epi64(__m256i a, __m256i b,
                                      __m256i c, int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that can implement any three-operand bitwise (Boolean) function; the specific function is selected by the value of "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are combined according to "imm8", and the result is written to the corresponding bit in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 3
        	i := j*64
        	FOR h := 0 to 63
        		dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        	ENDFOR
        ENDFOR
        dst[MAX:256] := 0
        	
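The inner ``FOR h`` loop above need not be evaluated bit by bit. One possible scalar alternative, sketched here, handles all 64 bit positions of a lane at once by OR-ing together the minterms whose ``imm8`` bit is set. The names ``ternary_lane64`` and ``ternarylogic_epi64_model`` are hypothetical, and the index order (``a`` as the most significant index bit) is an assumption consistent with the table rows shown.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Evaluate the ternary function on a whole 64-bit lane. For each set bit
       of imm8, AND together a, b and c (inverted where the index bit is 0)
       and OR the resulting minterm into the output. */
    uint64_t ternary_lane64(uint8_t imm8, uint64_t a, uint64_t b, uint64_t c)
    {
        uint64_t r = 0;
        for (unsigned index = 0; index < 8; index++) {
            if (!((imm8 >> index) & 1u))
                continue;
            uint64_t ta = (index & 4u) ? a : ~a;
            uint64_t tb = (index & 2u) ? b : ~b;
            uint64_t tc = (index & 1u) ? c : ~c;
            r |= ta & tb & tc;
        }
        return r;
    }

    /* Hypothetical scalar model of the unmasked form with four lanes. */
    void ternarylogic_epi64_model(uint64_t dst[4], const uint64_t a[4],
                                  const uint64_t b[4], const uint64_t c[4],
                                  uint8_t imm8)
    {
        assert(dst && a && b && c);
        for (int j = 0; j < 4; j++)
            dst[j] = ternary_lane64(imm8, a[j], b[j], c[j]);
    }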

_mm256_mask_xor_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_xor_epi32(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
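One use of a masked XOR is to invert selected lanes: XOR with all ones flips every bit, and passing the input itself as the fallback leaves the other lanes unchanged. The sketch below is a scalar illustration only; ``mask_xor_epi32_model`` and ``invert_selected_lanes`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the writemasked 32-bit XOR. */
    void mask_xor_epi32_model(uint32_t dst[8], const uint32_t src[8], uint8_t k,
                              const uint32_t a[8], const uint32_t b[8])
    {
        assert(dst && src && a && b);
        for (int j = 0; j < 8; j++) {
            dst[j] = ((k >> j) & 1) ? (a[j] ^ b[j]) : src[j];
        }
    }

    /* Invert only the lanes selected by k; other lanes keep their value. */
    void invert_selected_lanes(uint32_t v[8], uint8_t k)
    {
        uint32_t ones[8];
        uint32_t in[8];
        for (int j = 0; j < 8; j++) {
            ones[j] = 0xFFFFFFFFu;
            in[j] = v[j];                  /* copy so src and dst do not alias */
        }
        mask_xor_epi32_model(v, in, k, in, ones);
    }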

_mm256_maskz_xor_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_xor_epi32(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_xor_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_xor_epi64(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_xor_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_xor_epi64(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_xor_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_xor_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_xor_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_xor_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_or_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_or_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_or_epi32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_or_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	
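Without a mask, the pseudo-code for the 32-bit and 64-bit OR forms describes the same result bits; the lane width only matters once a mask selects whole lanes. The following scalar sketch checks that claim on raw bytes. All three function names are hypothetical, and arrays are assumed to stand in for the 256-bit register.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar models of the unmasked OR at both lane widths. */
    void or_epi32_model(uint32_t dst[8], const uint32_t a[8], const uint32_t b[8])
    {
        for (int j = 0; j < 8; j++) dst[j] = a[j] | b[j];
    }

    void or_epi64_model(uint64_t dst[4], const uint64_t a[4], const uint64_t b[4])
    {
        for (int j = 0; j < 4; j++) dst[j] = a[j] | b[j];
    }

    /* Returns 1 when both models produce the same 32 bytes for the inputs. */
    int or_widths_agree(const unsigned char a[32], const unsigned char b[32])
    {
        uint32_t a32[8], b32[8], d32[8];
        uint64_t a64[4], b64[4], d64[4];
        assert(a && b);
        memcpy(a32, a, 32); memcpy(b32, b, 32);
        memcpy(a64, a, 32); memcpy(b64, b, 32);
        or_epi32_model(d32, a32, b32);
        or_epi64_model(d64, a64, b64);
        return memcmp(d32, d64, 32) == 0;
    }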

XMM
~~~
_mm_mask_andnot_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_andnot_pd(__m128d src, __mmask8 k,
                               __m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
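The operation above acts on the raw bit patterns of the floating-point elements, not on their numeric values. As an illustration, a scalar sketch (assuming IEEE 754 binary64 doubles; ``mask_andnot_pd_model`` and the bit-copy helpers are hypothetical names) shows that passing ``-0.0`` as "a" clears the sign bit of "b" in the selected lanes.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Copy the bit pattern between double and uint64_t without aliasing. */
    static uint64_t bits_of(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
    static double from_bits(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

    /* Hypothetical scalar model with two 64-bit lanes. */
    void mask_andnot_pd_model(double dst[2], const double src[2], uint8_t k,
                              const double a[2], const double b[2])
    {
        assert(sizeof(double) == sizeof(uint64_t));
        for (int j = 0; j < 2; j++) {
            if ((k >> j) & 1)
                dst[j] = from_bits(~bits_of(a[j]) & bits_of(b[j]));
            else
                dst[j] = src[j];
        }
    }

Note the operand order: it is "a" that is inverted before the AND, so the sign-bit constant goes in the first data operand.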

_mm_maskz_andnot_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_andnot_pd(__mmask8 k, __m128d a,
                                __m128d b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_andnot_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_andnot_ps(__m128 src, __mmask8 k, __m128 a,
                              __m128 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_andnot_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_andnot_ps(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_and_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_and_pd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_and_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_and_pd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_and_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_and_ps(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_and_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_and_ps(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
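A bitwise AND on single-precision elements is typically used to isolate fields of the encoding. The scalar sketch below assumes IEEE 754 binary32 floats and uses hypothetical names throughout. It ANDs each selected lane with the exponent field ``0x7F800000``, which for normal finite inputs leaves the power of two at or below the input's magnitude; unselected lanes are zeroed.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Copy the bit pattern between float and uint32_t without aliasing. */
    static uint32_t bits_of_f(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
    static float from_bits_f(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

    /* Hypothetical scalar model of the zeromasked AND with four 32-bit lanes. */
    void maskz_and_ps_model(float dst[4], uint8_t k,
                            const float a[4], const float b[4])
    {
        assert(sizeof(float) == sizeof(uint32_t));
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1)
                dst[j] = from_bits_f(bits_of_f(a[j]) & bits_of_f(b[j]));
            else
                dst[j] = 0.0f;
        }
    }

    /* Keep only the exponent field of each selected lane. */
    void exponent_only(float dst[4], uint8_t k, const float x[4])
    {
        float field[4];
        for (int j = 0; j < 4; j++)
            field[j] = from_bits_f(0x7F800000u);
        maskz_and_ps_model(dst, k, x, field);
    }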

_mm_mask_or_pd
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_or_pd(__m128d src, __mmask8 k, __m128d a,
                           __m128d b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_or_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_or_pd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_or_ps
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_or_ps(__m128 src, __mmask8 k, __m128 a,
                          __m128 b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
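Since the OR is applied to the bit patterns, OR-ing with ``-0.0f`` sets the sign bit and so forces the selected lanes to be negative. The sketch below is a scalar illustration assuming IEEE 754 binary32 floats; ``mask_or_ps_model`` and the helper names are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Copy the bit pattern between float and uint32_t without aliasing. */
    static uint32_t bits_of_f(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
    static float from_bits_f(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

    /* Hypothetical scalar model of the writemasked OR with four 32-bit lanes. */
    void mask_or_ps_model(float dst[4], const float src[4], uint8_t k,
                          const float a[4], const float b[4])
    {
        assert(sizeof(float) == sizeof(uint32_t));
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1)
                dst[j] = from_bits_f(bits_of_f(a[j]) | bits_of_f(b[j]));
            else
                dst[j] = src[j];
        }
    }

Passing the input as both ``src`` and ``a`` and a vector of ``-0.0f`` as ``b`` makes lanes selected by ``k`` negative and leaves the rest untouched.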

_mm_maskz_or_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_or_ps(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_xor_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_xor_pd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
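For comparison with the zeromask form, here is a minimal scalar sketch of the writemask behaviour, in which unselected lanes keep the value from ``src``. It models the pseudo-code on plain arrays rather than calling the intrinsic; the name ``model_mask_xor_pd`` is hypothetical.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm_mask_xor_pd (two double lanes). */
    static void model_mask_xor_pd(double dst[2], const double src[2], uint8_t k,
                                  const double a[2], const double b[2])
    {
        for (int j = 0; j < 2; j++) {
            if ((k >> j) & 1u) {
                uint64_t ua, ub, r;
                memcpy(&ua, &a[j], sizeof ua);  /* view the doubles as raw bits */
                memcpy(&ub, &b[j], sizeof ub);
                r = ua ^ ub;
                memcpy(&dst[j], &r, sizeof r);
            } else {
                dst[j] = src[j];                /* writemask: fall back to src */
            }
        }
    }

Assuming IEEE-754 doubles, XOR-ing a lane with ``-0.0`` flips its sign, which is a common use of the floating-point XOR forms.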

_mm_maskz_xor_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_xor_pd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_xor_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_xor_ps(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_xor_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_xor_ps(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_and_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_and_epi32(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] AND b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_and_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_and_epi32(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] AND b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_andnot_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_andnot_epi32(__m128i src, __mmask8 k,
                                  __m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
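The operand order of the AND-NOT forms is easy to get backwards: it is the first operand, ``a``, that is inverted. The scalar sketch below mirrors the pseudo-code on plain ``uint32_t`` arrays; the function name ``model_mask_andnot_epi32`` is an assumption for illustration and the code does not use the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_andnot_epi32 (four 32-bit lanes). */
    static void model_mask_andnot_epi32(uint32_t dst[4], const uint32_t src[4],
                                        uint8_t k, const uint32_t a[4],
                                        const uint32_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u)
                dst[j] = ~a[j] & b[j];  /* NOT applies to a, then AND with b */
            else
                dst[j] = src[j];        /* writemask: keep the src lane */
        }
    }

For example, with ``a = 0x0000FFFF`` and ``b = 0x12345678`` a selected lane yields ``0x12340000``, that is, ``a`` acts as a "bits to clear" mask applied to ``b``.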

_mm_maskz_andnot_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_andnot_epi32(__mmask8 k, __m128i a,
                                   __m128i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_andnot_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_andnot_epi64(__m128i src, __mmask8 k,
                                  __m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_andnot_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_andnot_epi64(__mmask8 k, __m128i a,
                                   __m128i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (NOT a[i+63:i]) AND b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_and_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_and_epi64(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] AND b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_and_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_and_epi64(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] AND b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_or_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_or_epi32(__m128i src, __mmask8 k,
                              __m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_or_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_or_epi32(__mmask8 k, __m128i a,
                               __m128i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_or_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_or_epi64(__m128i src, __mmask8 k,
                              __m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_or_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_or_epi64(__mmask8 k, __m128i a,
                               __m128i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_ternarylogic_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i b, 
    __m128i c, 
    int imm8
:Param ETypes:
    UI32 a, 
    MASK k, 
    UI32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_ternarylogic_epi32(__m128i a, __mmask8 k,
                                        __m128i b, __m128i c,
                                        int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 32-bit integer, the corresponding bit from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using writemask "k" at 32-bit granularity (32-bit elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		FOR h := 0 to 31
        			dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        		ENDFOR
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
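Unlike the other writemask forms in this section, the masked ternary-logic intrinsic has no separate ``src`` operand: per the description above, unselected elements are copied from ``a``. The following scalar sketch illustrates that under the assumption that the truth-table index is ``(a_bit << 2) | (b_bit << 1) | c_bit``; the helper names ``ternlog32`` and ``model_mask_ternarylogic_epi32`` are hypothetical and the code does not call the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Evaluate the imm8 truth table on whole 32-bit words at once. */
    static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
    {
        uint32_t r = 0;
        for (unsigned idx = 0; idx < 8; idx++) {
            if (!((imm8 >> idx) & 1u))
                continue;                       /* this minterm is not enabled */
            /* idx = (a_bit << 2) | (b_bit << 1) | c_bit picks one minterm */
            uint32_t ma = (idx & 4u) ? a : ~a;
            uint32_t mb = (idx & 2u) ? b : ~b;
            uint32_t mc = (idx & 1u) ? c : ~c;
            r |= ma & mb & mc;
        }
        return r;
    }

    /* Hypothetical scalar model of _mm_mask_ternarylogic_epi32. */
    static void model_mask_ternarylogic_epi32(uint32_t dst[4], const uint32_t a[4],
                                              uint8_t k, const uint32_t b[4],
                                              const uint32_t c[4], uint8_t imm8)
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1u) ? ternlog32(a[j], b[j], c[j], imm8)
                                     : a[j];   /* unselected lanes keep a */
    }

With ``imm8 = 0xCA`` the function computed per bit is ``a ? b : c``, so a selected lane blends ``b`` and ``c`` under the bit pattern in ``a``, while an unselected lane simply returns ``a`` unchanged.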

_mm_maskz_ternarylogic_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    __m128i c, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_ternarylogic_epi32(__mmask8 k, __m128i a,
                                         __m128i b, __m128i c,
                                         int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 32-bit integer, the corresponding bit from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using zeromask "k" at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		FOR h := 0 to 31
        			dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        		ENDFOR
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_ternarylogic_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    __m128i c, 
    int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m128i _mm_ternarylogic_epi32(__m128i a, __m128i b,
                                   __m128i c, int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 32-bit integer, the corresponding bit from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 3
        	i := j*32
        	FOR h := 0 to 31
        		dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        	ENDFOR
        ENDFOR
        dst[MAX:128] := 0
        	
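The pseudo-code elides most of the 256 cases, so a bit-by-bit scalar sketch may help show how ``imm8`` acts as an 8-entry truth table. It assumes the index into ``imm8`` is formed as ``(a_bit << 2) | (b_bit << 1) | c_bit``, which is consistent with the cases shown (``1`` is NOR, ``254`` is OR). The constants ``TERNLOG_A``, ``TERNLOG_B`` and ``TERNLOG_C`` and the function name are names chosen for this example; their values follow from that indexing.

.. code-block:: C

    #include <stdint.h>

    /* Assumed helper constants: the truth-table column of each input. */
    enum { TERNLOG_A = 0xF0, TERNLOG_B = 0xCC, TERNLOG_C = 0xAA };

    /* Hypothetical scalar model of one 32-bit lane of _mm_ternarylogic_epi32. */
    static uint32_t model_ternarylogic_u32(uint32_t a, uint32_t b, uint32_t c,
                                           uint8_t imm8)
    {
        uint32_t dst = 0;
        for (int h = 0; h < 32; h++) {
            /* Build the 3-bit truth-table index from bit h of each input. */
            unsigned idx = (((a >> h) & 1u) << 2)
                         | (((b >> h) & 1u) << 1)
                         |  ((c >> h) & 1u);
            /* Look up the result bit in imm8 and place it at position h. */
            dst |= (uint32_t)((imm8 >> idx) & 1u) << h;
        }
        return dst;
    }

Writing the desired Boolean expression over the three constants produces the immediate: ``TERNLOG_A ^ TERNLOG_B ^ TERNLOG_C`` evaluates to ``0x96`` (three-way XOR), and ``(TERNLOG_A & TERNLOG_B) | (TERNLOG_A & TERNLOG_C) | (TERNLOG_B & TERNLOG_C)`` evaluates to ``0xE8`` (bitwise majority).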

_mm_mask_ternarylogic_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i b, 
    __m128i c, 
    int imm8
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_ternarylogic_epi64(__m128i a, __mmask8 k,
                                        __m128i b, __m128i c,
                                        int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 64-bit integer, the corresponding bit from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using writemask "k" at 64-bit granularity (64-bit elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		FOR h := 0 to 63
        			dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        		ENDFOR
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_ternarylogic_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    __m128i c, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_ternarylogic_epi64(__mmask8 k, __m128i a,
                                         __m128i b, __m128i c,
                                         int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 64-bit integer, the corresponding bit from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using zeromask "k" at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		FOR h := 0 to 63
        			dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        		ENDFOR
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_ternarylogic_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    __m128i c, 
    int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m128i _mm_ternarylogic_epi64(__m128i a, __m128i b,
                                   __m128i c, int imm8);

.. admonition:: Intel Description

    Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 64-bit integer, the corresponding bit from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE TernaryOP(imm8, a, b, c) {
        	CASE imm8[7:0] OF
        	0: dst[0] := 0                   // imm8[7:0] := 0
        	1: dst[0] := NOT (a OR b OR c)   // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
        	// ...
        	254: dst[0] := a OR b OR c       // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
        	255: dst[0] := 1                 // imm8[7:0] := 1
        	ESAC
        }
        imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
        FOR j := 0 to 1
        	i := j*64
        	FOR h := 0 to 63
        		dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
        	ENDFOR
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_xor_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_xor_epi32(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_xor_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_xor_epi32(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_xor_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_xor_epi64(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_xor_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_xor_epi64(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_xor_epi64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_xor_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_xor_epi32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_xor_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_or_epi64
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_or_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_or_epi32
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Logical
:Header: immintrin.h
:Searchable: AVX-512-Logical-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_or_epi32(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        ENDFOR
        dst[MAX:128] := 0
        	
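Because the operation is purely bitwise and this form takes no mask, the element width should not change the resulting bits: the 32-bit and 64-bit forms are expected to produce the same 128-bit value. The scalar sketch below checks that on plain arrays. ``or_epi32_model``, ``or_epi64_model`` and ``or_widths_agree`` are hypothetical helpers written for illustration, not part of ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: each array stands in for one __m128i. */
    static void or_epi32_model(uint32_t dst[4], const uint32_t a[4],
                               const uint32_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = a[j] | b[j];
        }
    }

    /* Same operation viewed as two 64-bit elements. */
    static void or_epi64_model(uint64_t dst[2], const uint64_t a[2],
                               const uint64_t b[2])
    {
        for (int j = 0; j < 2; j++) {
            dst[j] = a[j] | b[j];
        }
    }

    /* Returns 1 when both models yield the same 16 bytes for the same input. */
    static int or_widths_agree(const uint8_t a[16], const uint8_t b[16])
    {
        uint32_t a32[4], b32[4], d32[4];
        uint64_t a64[2], b64[2], d64[2];

        memcpy(a32, a, 16);
        memcpy(b32, b, 16);
        memcpy(a64, a, 16);
        memcpy(b64, b, 16);
        or_epi32_model(d32, a32, b32);
        or_epi64_model(d64, a64, b64);
        return memcmp(d32, d64, 16) == 0;
    }

The element width only starts to matter for the masked variants, where each mask bit covers one element of that width.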

Swizzle
-------
ZMM
~~~
_mm512_mask_shuffle_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_shuffle_epi8(__m512i src, __mmask64 k,
                                     __m512i a, __m512i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" within 128-bit lanes using the control in the corresponding 8-bit element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		IF b[i+7] == 1
        			dst[i+7:i] := 0
        		ELSE
        			index[5:0] := b[i+3:i] + (j & 0x30)
        			dst[i+7:i] := a[index*8+7:index*8]
        		FI
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
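The pseudo-code above can be mirrored by a scalar C model, which makes the lane-local indexing easier to check by hand. This is only a sketch: ``shuffle_epi8_mask_model`` is a hypothetical helper that works on plain 64-byte arrays and does not call the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_shuffle_epi8 (not a real API).
       src, a, b and dst are 64-byte arrays standing in for __m512i values. */
    static void shuffle_epi8_mask_model(uint8_t dst[64], const uint8_t src[64],
                                        uint64_t k, const uint8_t a[64],
                                        const uint8_t b[64])
    {
        for (int j = 0; j < 64; j++) {
            if ((k >> j) & 1u) {
                if (b[j] & 0x80) {
                    dst[j] = 0;         /* control bit 7 set: write zero */
                } else {
                    /* Low 4 control bits select a byte inside the same
                       128-bit lane; (j & 0x30) is the lane's first byte. */
                    int index = (b[j] & 0x0F) + (j & 0x30);
                    dst[j] = a[index];
                }
            } else {
                dst[j] = src[j];        /* mask bit clear: keep src */
            }
        }
    }

In this model ``dst`` must not alias ``a``, because later iterations still read bytes of ``a``.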

_mm512_maskz_shuffle_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_shuffle_epi8(__mmask64 k, __m512i a,
                                      __m512i b);

.. admonition:: Intel Description

    Shuffle packed 8-bit integers in "a" within 128-bit lanes according to the shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		IF b[i+7] == 1
        			dst[i+7:i] := 0
        		ELSE
        			index[5:0] := b[i+3:i] + (j & 0x30)
        			dst[i+7:i] := a[index*8+7:index*8]
        		FI
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shuffle_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_shuffle_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Shuffle packed 8-bit integers in "a" within 128-bit lanes according to the shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF b[i+7] == 1
        		dst[i+7:i] := 0
        	ELSE
        		index[5:0] := b[i+3:i] + (j & 0x30)
        		dst[i+7:i] := a[index*8+7:index*8]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcastmb_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m512i _mm512_broadcastmb_epi64(__mmask8 k);

.. admonition:: Intel Description

    Broadcast the low 8 bits from input mask "k" to all 64-bit elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ZeroExtend64(k[7:0])
        ENDFOR
        dst[MAX:512] := 0
        	
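As a rough scalar picture of the pseudo-code above, every 64-bit lane simply receives the mask value zero-extended to 64 bits. ``broadcastmb_epi64_model`` is a hypothetical helper used only for this sketch; the ``uint64_t[8]`` array stands in for the ``__m512i`` result.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_broadcastmb_epi64 (not a real API). */
    static void broadcastmb_epi64_model(uint64_t dst[8], uint8_t k)
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = (uint64_t)k;       /* zero-extend the 8 mask bits */
        }
    }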

_mm512_broadcastmw_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m512i _mm512_broadcastmw_epi32(__mmask16 k);

.. admonition:: Intel Description

    Broadcast the low 16 bits from input mask "k" to all 32-bit elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ZeroExtend32(k[15:0])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_compressstoreu_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 base_addr, 
    MASK k, 
    FP64 a

.. code-block:: C

    void _mm512_mask_compressstoreu_pd(void* base_addr,
                                       __mmask8 k, __m512d a);

.. admonition:: Intel Description

    Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := base_addr
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		MEM[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        	
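The compress-store behaviour can be sketched in scalar C as follows. ``compressstoreu_pd_model`` is a hypothetical helper, not the intrinsic; unlike the intrinsic, which returns ``void``, the model returns the number of elements written so the caller can advance an output pointer.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_compressstoreu_pd.
       Selected elements of a[] are written back to back at base_addr;
       memory past the last written element is left untouched. */
    static size_t compressstoreu_pd_model(double *base_addr, uint8_t k,
                                          const double a[8])
    {
        size_t m = 0;                   /* next free output slot */
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u) {
                base_addr[m++] = a[j];
            }
        }
        return m;
    }

With ``k = 0xA1`` (bits 0, 5 and 7 set), the model writes ``a[0]``, ``a[5]`` and ``a[7]`` to the first three slots and returns 3.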

_mm512_mask_compressstoreu_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 base_addr, 
    MASK k, 
    FP32 a

.. code-block:: C

    void _mm512_mask_compressstoreu_ps(void* base_addr,
                                       __mmask16 k, __m512 a);

.. admonition:: Intel Description

    Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := base_addr
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		MEM[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        	

_mm512_mask_compressstoreu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm512_mask_compressstoreu_epi32(void* base_addr,
                                          __mmask16 k,
                                          __m512i a);

.. admonition:: Intel Description

    Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := base_addr
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		MEM[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        	
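A common use of a compress store is filtering: build a mask from a predicate, then write only the selected elements contiguously. The scalar sketch below illustrates that pattern for one block of 16 values. Both function names are hypothetical, and the mask is built with a plain loop here, whereas AVX-512 code would normally obtain it from a compare intrinsic.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the compress-store step (16 x 32-bit). */
    static size_t compressstoreu_epi32_model(int32_t *base_addr, uint16_t k,
                                             const int32_t a[16])
    {
        size_t m = 0;
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                base_addr[m++] = a[j];
            }
        }
        return m;
    }

    /* Hypothetical usage: keep only the positive values of one block. */
    static size_t keep_positive_model(int32_t *out, const int32_t in[16])
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; j++) {
            if (in[j] > 0) {
                k |= (uint16_t)(1u << j);   /* set mask bit j */
            }
        }
        return compressstoreu_epi32_model(out, k, in);
    }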

_mm512_mask_compressstoreu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI64 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm512_mask_compressstoreu_epi64(void* base_addr,
                                          __mmask8 k,
                                          __m512i a);

.. admonition:: Intel Description

    Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := base_addr
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		MEM[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        	

_mm512_mask_expandloadu_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m512d _mm512_mask_expandloadu_pd(__m512d src, __mmask8 k,
                                       void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
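A scalar sketch of the expand-load pseudo-code above may help: memory is consumed one element at a time, but only for lanes whose mask bit is set. ``expandloadu_pd_mask_model`` is a hypothetical helper for illustration and does not call the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_expandloadu_pd. */
    static void expandloadu_pd_mask_model(double dst[8], const double src[8],
                                          uint8_t k, const double *mem_addr)
    {
        int m = 0;                      /* next element to read from memory */
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u) {
                dst[j] = mem_addr[m++]; /* consume one contiguous element */
            } else {
                dst[j] = src[j];        /* mask bit clear: copy from src */
            }
        }
    }

In this model, only as many elements are read from ``mem_addr`` as there are set bits in ``k``.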

_mm512_maskz_expandloadu_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m512d _mm512_maskz_expandloadu_pd(__mmask8 k,
                                        void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_expandloadu_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m512 _mm512_mask_expandloadu_ps(__m512 src, __mmask16 k,
                                      void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_expandloadu_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m512 _mm512_maskz_expandloadu_ps(__mmask16 k,
                                       void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_expandloadu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m512i _mm512_mask_expandloadu_epi32(__m512i src,
                                          __mmask16 k,
                                          void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_expandloadu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m512i _mm512_maskz_expandloadu_epi32(
        __mmask16 k, void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
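Expand-load with a zeromask acts as the inverse of a compress store that used the same mask: elements packed contiguously in memory return to their original lanes, and the unselected lanes become zero. The scalar sketch below shows that round trip. Both helpers are hypothetical models, not calls to the intrinsics.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of a 16 x 32-bit compress store. */
    static size_t compress_epi32_model(int32_t *out, uint16_t k,
                                       const int32_t a[16])
    {
        size_t m = 0;
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                out[m++] = a[j];
            }
        }
        return m;
    }

    /* Hypothetical scalar model of _mm512_maskz_expandloadu_epi32. */
    static void expandloadu_epi32_maskz_model(int32_t dst[16], uint16_t k,
                                              const int32_t *mem_addr)
    {
        size_t m = 0;
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                dst[j] = mem_addr[m++]; /* next packed element */
            } else {
                dst[j] = 0;             /* zeromask: clear the lane */
            }
        }
    }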

_mm512_mask_expandloadu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m512i _mm512_mask_expandloadu_epi64(__m512i src,
                                          __mmask8 k,
                                          void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_expandloadu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m512i _mm512_maskz_expandloadu_epi64(
        __mmask8 k, void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcast_f32x4
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_broadcast_f32x4(__m128 a);

.. admonition:: Intel Description

    Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 4)*32
        	dst[i+31:i] := a[n+31:n]
        ENDFOR
        dst[MAX:512] := 0
        	
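In scalar terms, the ``j % 4`` in the pseudo-code above means the 4-element block is repeated once per 128-bit lane. A minimal sketch, using the hypothetical helper ``broadcast_f32x4_model`` and a ``float[16]`` array in place of the ``__m512`` result:

.. code-block:: C

    /* Hypothetical scalar model of _mm512_broadcast_f32x4 (not a real API):
       a[0..3] is copied into each of the four 128-bit lanes of dst. */
    static void broadcast_f32x4_model(float dst[16], const float a[4])
    {
        for (int j = 0; j < 16; j++) {
            dst[j] = a[j % 4];
        }
    }

For ``a = {1, 2, 3, 4}`` the model fills ``dst`` with ``1, 2, 3, 4`` four times in a row.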

_mm512_mask_broadcast_f32x4
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_broadcast_f32x4(__m512 src, __mmask16 k,
                                       __m128 a);

.. admonition:: Intel Description

    Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 4)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcast_f32x4
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_broadcast_f32x4(__mmask16 k, __m128 a);

.. admonition:: Intel Description

    Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 4)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcast_f64x4
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_broadcast_f64x4(__m256d a);

.. admonition:: Intel Description

    Broadcast the 4 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	n := (j % 4)*64
        	dst[i+63:i] := a[n+63:n]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_broadcast_f64x4
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_broadcast_f64x4(__m512d src, __mmask8 k,
                                        __m256d a);

.. admonition:: Intel Description

    Broadcast the 4 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	n := (j % 4)*64
        	IF k[j]
        		dst[i+63:i] := a[n+63:n]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
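The writemask is applied per destination element, after the broadcast index has been computed. The scalar sketch below mirrors the pseudo-code above; ``broadcast_f64x4_mask_model`` is a hypothetical helper, and the ``double`` arrays stand in for the vector operands.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_broadcast_f64x4. */
    static void broadcast_f64x4_mask_model(double dst[8], const double src[8],
                                           uint8_t k, const double a[4])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u) {
                dst[j] = a[j % 4];      /* broadcast element for this lane */
            } else {
                dst[j] = src[j];        /* mask bit clear: copy from src */
            }
        }
    }

With ``k = 0xF0`` only the upper 256 bits receive the broadcast copy of ``a``, and the lower four elements come from ``src``.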

_mm512_maskz_broadcast_f64x4
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_maskz_broadcast_f64x4(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Broadcast the 4 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	n := (j % 4)*64
        	IF k[j]
        		dst[i+63:i] := a[n+63:n]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcast_i32x4
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m512i _mm512_broadcast_i32x4(__m128i a);

.. admonition:: Intel Description

    Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 4)*32
        	dst[i+31:i] := a[n+31:n]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_broadcast_i32x4
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_broadcast_i32x4(__m512i src,
                                        __mmask16 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 4)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcast_i32x4
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_broadcast_i32x4(__mmask16 k,
                                         __m128i a);

.. admonition:: Intel Description

    Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 4)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcast_i64x4
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m512i _mm512_broadcast_i64x4(__m256i a);

.. admonition:: Intel Description

    Broadcast the 4 packed 64-bit integers from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	n := (j % 4)*64
        	dst[i+63:i] := a[n+63:n]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_broadcast_i64x4
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_mask_broadcast_i64x4(__m512i src, __mmask8 k,
                                        __m256i a);

.. admonition:: Intel Description

    Broadcast the 4 packed 64-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	n := (j % 4)*64
        	IF k[j]
        		dst[i+63:i] := a[n+63:n]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcast_i64x4
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_maskz_broadcast_i64x4(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Broadcast the 4 packed 64-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	n := (j % 4)*64
        	IF k[j]
        		dst[i+63:i] := a[n+63:n]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcastsd_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_broadcastsd_pd(__m128d a);

.. admonition:: Intel Description

    Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_broadcastsd_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_broadcastsd_pd(__m512d src, __mmask8 k,
                                       __m128d a);

.. admonition:: Intel Description

    Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcastsd_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_maskz_broadcastsd_pd(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
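The zeromask form can be pictured in scalar C as below: only the low element of ``a`` is ever read, and lanes whose mask bit is clear become zero. ``broadcastsd_pd_maskz_model`` is a hypothetical helper; the ``double[2]`` array stands in for the ``__m128d`` input.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_maskz_broadcastsd_pd. */
    static void broadcastsd_pd_maskz_model(double dst[8], uint8_t k,
                                           const double a[2])
    {
        for (int j = 0; j < 8; j++) {
            /* a[1] is ignored; only the low element is broadcast. */
            dst[j] = ((k >> j) & 1u) ? a[0] : 0.0;
        }
    }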

_mm512_broadcastss_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_broadcastss_ps(__m128 a);

.. admonition:: Intel Description

    Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_broadcastss_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_broadcastss_ps(__m512 src, __mmask16 k,
                                      __m128 a);

.. admonition:: Intel Description

    Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcastss_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_broadcastss_ps(__mmask16 k, __m128 a);

.. admonition:: Intel Description

    Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
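
The writemask and zeromask forms of the broadcast differ only in what an inactive lane receives. As a rough illustration, the rule can be modeled in portable scalar C. The helper name ``ref_masked_broadcastss`` and its ``zero_inactive`` flag are hypothetical and not part of ``immintrin.h``; this is a sketch of the pseudo-code, not the hardware implementation.

```c
#include <stdint.h>

/* Hypothetical scalar model of the masked broadcast pseudo-code.
   zero_inactive == 0 models the writemask form (inactive lanes keep src),
   zero_inactive != 0 models the zeromask form (inactive lanes become 0). */
void ref_masked_broadcastss(float dst[16], const float src[16],
                            uint16_t k, float a0, int zero_inactive)
{
    for (int j = 0; j < 16; j++) {
        if ((k >> j) & 1u)
            dst[j] = a0;                              /* dst[i+31:i] := a[31:0] */
        else
            dst[j] = zero_inactive ? 0.0f : src[j];   /* fallback value */
    }
}
```

Here ``a0`` stands for the low element ``a[31:0]`` of the 128-bit source.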
        	

_mm512_mask_compress_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_compress_pd(__m512d src, __mmask8 k,
                                    __m512d a);

.. admonition:: Intel Description

    Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := 0
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        dst[511:m] := src[511:m]
        dst[MAX:512] := 0
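
The compress operation packs the active elements toward the low end of the result and fills the rest from ``src``. A minimal scalar sketch of the pseudo-code above follows; the name ``ref_mask_compress_pd`` and the returned element count are assumptions made for illustration and do not exist in ``immintrin.h``.

```c
#include <stdint.h>

/* Hypothetical scalar model of the compress pseudo-code.
   Returns the number of active elements (the final m divided by 64). */
int ref_mask_compress_pd(double dst[8], const double src[8],
                         uint8_t k, const double a[8])
{
    int m = 0;                      /* next free slot in dst */
    for (int j = 0; j < 8; j++) {
        if ((k >> j) & 1u)
            dst[m++] = a[j];        /* pack active elements contiguously */
    }
    for (int j = m; j < 8; j++)
        dst[j] = src[j];            /* dst[511:m] := src[511:m] */
    return m;
}
```

With ``k = 0xA4`` (bits 2, 5 and 7 set), the three selected elements of ``a`` land in ``dst[0..2]`` and ``dst[3..7]`` come from ``src``.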
        	

_mm512_maskz_compress_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_maskz_compress_pd(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := 0
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        dst[511:m] := 0
        dst[MAX:512] := 0
        	

_mm512_mask_compress_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_compress_ps(__m512 src, __mmask16 k,
                                   __m512 a);

.. admonition:: Intel Description

    Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := 0
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        dst[511:m] := src[511:m]
        dst[MAX:512] := 0
        	

_mm512_maskz_compress_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_compress_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := 0
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        dst[511:m] := 0
        dst[MAX:512] := 0
        	

_mm512_mask_expand_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_expand_pd(__m512d src, __mmask8 k,
                                  __m512d a);

.. admonition:: Intel Description

    Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[m+63:m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
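
Expand is the counterpart of compress: it reads consecutive low elements of ``a`` and scatters them to the lanes whose mask bit is set. The scalar sketch below mirrors the pseudo-code above; ``ref_mask_expand_pd`` is a hypothetical name, and the sketch assumes ``dst`` does not alias ``a``.

```c
#include <stdint.h>

/* Hypothetical scalar model of the expand pseudo-code.
   dst must not alias a, since a is read after dst has been written. */
void ref_mask_expand_pd(double dst[8], const double src[8],
                        uint8_t k, const double a[8])
{
    int m = 0;                      /* next unread low element of a */
    for (int j = 0; j < 8; j++) {
        if ((k >> j) & 1u)
            dst[j] = a[m++];        /* dst[i+63:i] := a[m+63:m] */
        else
            dst[j] = src[j];        /* inactive lane keeps src */
    }
}
```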
        	

_mm512_maskz_expand_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_maskz_expand_pd(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[m+63:m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_expand_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_expand_ps(__m512 src, __mmask16 k,
                                 __m512 a);

.. admonition:: Intel Description

    Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[m+31:m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_expand_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_expand_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[m+31:m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
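
Going by the pseudo-code, compressing and then expanding with the same zeromask returns the original vector with its inactive lanes zeroed. The sketch below checks that property with two hypothetical scalar helpers (``ref_maskz_compress_ps`` and ``ref_maskz_expand_ps``); neither name comes from ``immintrin.h``.

```c
#include <stdint.h>

/* Hypothetical scalar model of the zeromask compress pseudo-code. */
void ref_maskz_compress_ps(float dst[16], uint16_t k, const float a[16])
{
    int m = 0;
    for (int j = 0; j < 16; j++)
        if ((k >> j) & 1u)
            dst[m++] = a[j];        /* pack active elements low */
    while (m < 16)
        dst[m++] = 0.0f;            /* dst[511:m] := 0 */
}

/* Hypothetical scalar model of the zeromask expand pseudo-code.
   dst must not alias a. */
void ref_maskz_expand_ps(float dst[16], uint16_t k, const float a[16])
{
    int m = 0;
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1u) ? a[m++] : 0.0f;
}
```

For example, with ``k = 0x8421`` the packed vector holds the elements from lanes 0, 5, 10 and 15 in its four low slots, and expanding it with the same mask puts them back in their original lanes.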
        	

_mm512_extractf32x4_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m128
:Param Types:
    __m512 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm512_extractf32x4_ps(__m512 a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[1:0] OF
        0: dst[127:0] := a[127:0]
        1: dst[127:0] := a[255:128]
        2: dst[127:0] := a[383:256]
        3: dst[127:0] := a[511:384]
        ESAC
        dst[MAX:128] := 0
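
The ``CASE`` above amounts to picking one of four 128-bit lanes by the two low bits of ``imm8``. A scalar sketch under that reading follows; ``ref_extractf32x4`` is a hypothetical name. The real intrinsic takes the selector as an immediate, so compilers expect a compile-time constant there, whereas this model accepts any ``int``.

```c
#include <string.h>

/* Hypothetical scalar model: copy the 4-float lane selected by imm8[1:0]. */
void ref_extractf32x4(float dst[4], const float a[16], int imm8)
{
    int lane = imm8 & 3;                            /* imm8[1:0] */
    memcpy(dst, a + 4 * lane, 4 * sizeof(float));   /* dst[127:0] := lane */
}
```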
        	

_mm512_mask_extractf32x4_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m512 a, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm512_mask_extractf32x4_ps(__m128 src, __mmask8 k,
                                       __m512 a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[1:0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        2: tmp[127:0] := a[383:256]
        3: tmp[127:0] := a[511:384]
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_extractf32x4_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m512 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm512_maskz_extractf32x4_ps(__mmask8 k, __m512 a,
                                        int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[1:0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        2: tmp[127:0] := a[383:256]
        3: tmp[127:0] := a[511:384]
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_extractf64x4_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m256d
:Param Types:
    __m512d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm512_extractf64x4_pd(__m512d a, int imm8);

.. admonition:: Intel Description

    Extract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: dst[255:0] := a[255:0]
        1: dst[255:0] := a[511:256]
        ESAC
        dst[MAX:256] := 0
        	

_mm512_mask_extractf64x4_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m512d a, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm512_mask_extractf64x4_pd(__m256d src, __mmask8 k,
                                        __m512d a, int imm8);

.. admonition:: Intel Description

    Extract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[255:0] := a[255:0]
        1: tmp[255:0] := a[511:256]
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
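
In the masked 256-bit extract, the half is selected first and the mask is applied afterwards, and only the four low bits of the ``__mmask8`` are consulted because the result has four elements. A scalar sketch of the pseudo-code above, with the hypothetical name ``ref_mask_extractf64x4``:

```c
#include <stdint.h>

/* Hypothetical scalar model: select a 256-bit half, then merge with src. */
void ref_mask_extractf64x4(double dst[4], const double src[4],
                           uint8_t k, const double a[8], int imm8)
{
    const double *tmp = a + 4 * (imm8 & 1);   /* imm8[0] picks the half */
    for (int j = 0; j < 4; j++)               /* only k[3:0] matter */
        dst[j] = ((k >> j) & 1u) ? tmp[j] : src[j];
}
```

Upper mask bits have no effect in this model: ``k = 0xF5`` behaves like ``k = 0x05``.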
        	

_mm512_maskz_extractf64x4_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm512_maskz_extractf64x4_pd(__mmask8 k, __m512d a,
                                         int imm8);

.. admonition:: Intel Description

    Extract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[255:0] := a[255:0]
        1: tmp[255:0] := a[511:256]
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_extracti32x4_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m512i a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm512_extracti32x4_epi32(__m512i a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[1:0] OF
        0: dst[127:0] := a[127:0]
        1: dst[127:0] := a[255:128]
        2: dst[127:0] := a[383:256]
        3: dst[127:0] := a[511:384]
        ESAC
        dst[MAX:128] := 0
        	

_mm512_mask_extracti32x4_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm512_mask_extracti32x4_epi32(__m128i src,
                                           __mmask8 k,
                                           __m512i a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[1:0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        2: tmp[127:0] := a[383:256]
        3: tmp[127:0] := a[511:384]
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_extracti32x4_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm512_maskz_extracti32x4_epi32(__mmask8 k,
                                            __m512i a,
                                            int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[1:0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        2: tmp[127:0] := a[383:256]
        3: tmp[127:0] := a[511:384]
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
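
The integer extract follows the same pattern as the floating-point one, with the zeromask form clearing inactive elements instead of merging from ``src``. A scalar sketch of the pseudo-code above, with the hypothetical name ``ref_maskz_extracti32x4``:

```c
#include <stdint.h>

/* Hypothetical scalar model: select a 128-bit lane, zero inactive elements. */
void ref_maskz_extracti32x4(uint32_t dst[4], uint8_t k,
                            const uint32_t a[16], int imm8)
{
    const uint32_t *tmp = a + 4 * (imm8 & 3);   /* imm8[1:0] picks the lane */
    for (int j = 0; j < 4; j++)
        dst[j] = ((k >> j) & 1u) ? tmp[j] : 0u;
}
```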
        	

_mm512_extracti64x4_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm512_extracti64x4_epi64(__m512i a, int imm8);

.. admonition:: Intel Description

    Extract 256 bits (composed of 4 packed 64-bit integers) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: dst[255:0] := a[255:0]
        1: dst[255:0] := a[511:256]
        ESAC
        dst[MAX:256] := 0
        	

_mm512_mask_extracti64x4_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm512_mask_extracti64x4_epi64(__m256i src,
                                           __mmask8 k,
                                           __m512i a, int imm8);

.. admonition:: Intel Description

    Extract 256 bits (composed of 4 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[255:0] := a[255:0]
        1: tmp[255:0] := a[511:256]
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_extracti64x4_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm512_maskz_extracti64x4_epi64(__mmask8 k,
                                            __m512i a,
                                            int imm8);

.. admonition:: Intel Description

    Extract 256 bits (composed of 4 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[255:0] := a[255:0]
        1: tmp[255:0] := a[511:256]
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_insertf32x4
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_insertf32x4(__m512 a, __m128 b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := a[511:0]
        CASE (imm8[1:0]) OF
        0: dst[127:0] := b[127:0]
        1: dst[255:128] := b[127:0]
        2: dst[383:256] := b[127:0]
        3: dst[511:384] := b[127:0]
        ESAC
        dst[MAX:512] := 0
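
Insertion is a copy of ``a`` followed by overwriting one 128-bit lane with ``b``. A scalar sketch of the pseudo-code above, using the hypothetical name ``ref_insertf32x4``; plain loops are used so the model stays well defined even if ``dst`` and ``a`` are the same array.

```c
/* Hypothetical scalar model: dst := a, then overwrite lane imm8[1:0] with b. */
void ref_insertf32x4(float dst[16], const float a[16],
                     const float b[4], int imm8)
{
    int lane = imm8 & 3;                    /* imm8[1:0] */
    for (int j = 0; j < 16; j++)
        dst[j] = a[j];                      /* dst[511:0] := a[511:0] */
    for (int j = 0; j < 4; j++)
        dst[4 * lane + j] = b[j];           /* replace the selected lane */
}
```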
        	

_mm512_mask_insertf32x4
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_mask_insertf32x4(__m512 src, __mmask16 k,
                                   __m512 a, __m128 b,
                                   int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[1:0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        2: tmp[383:256] := b[127:0]
        3: tmp[511:384] := b[127:0]
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_insertf32x4
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_maskz_insertf32x4(__mmask16 k, __m512 a,
                                    __m128 b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[1:0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        2: tmp[383:256] := b[127:0]
        3: tmp[511:384] := b[127:0]
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_insertf64x4
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m256d b, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_insertf64x4(__m512d a, __m256d b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := a[511:0]
        CASE (imm8[0]) OF
        0: dst[255:0] := b[255:0]
        1: dst[511:256] := b[255:0]
        ESAC
        dst[MAX:512] := 0
        	

_mm512_mask_insertf64x4
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m256d b, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_mask_insertf64x4(__m512d src, __mmask8 k,
                                    __m512d a, __m256d b,
                                    int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[0]) OF
        0: tmp[255:0] := b[255:0]
        1: tmp[511:256] := b[255:0]
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
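
The masked insert builds the full 512-bit ``tmp`` first and only then applies the writemask, so the mask covers all eight result elements, not just the four inserted ones. A scalar sketch of the pseudo-code above, with the hypothetical name ``ref_mask_insertf64x4``:

```c
#include <stdint.h>

/* Hypothetical scalar model: insert b into a copy of a, then merge with src. */
void ref_mask_insertf64x4(double dst[8], const double src[8], uint8_t k,
                          const double a[8], const double b[4], int imm8)
{
    double tmp[8];
    for (int j = 0; j < 8; j++)
        tmp[j] = a[j];                          /* tmp[511:0] := a[511:0] */
    for (int j = 0; j < 4; j++)
        tmp[4 * (imm8 & 1) + j] = b[j];         /* imm8[0] picks the half */
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1u) ? tmp[j] : src[j];
}
```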
        	

_mm512_maskz_insertf64x4
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m256d b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_maskz_insertf64x4(__mmask8 k, __m512d a,
                                     __m256d b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[0]) OF
        0: tmp[255:0] := b[255:0]
        1: tmp[511:256] := b[255:0]
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_inserti32x4
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_inserti32x4(__m512i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := a[511:0]
        CASE (imm8[1:0]) OF
        0: dst[127:0] := b[127:0]
        1: dst[255:128] := b[127:0]
        2: dst[383:256] := b[127:0]
        3: dst[511:384] := b[127:0]
        ESAC
        dst[MAX:512] := 0
        	

_mm512_mask_inserti32x4
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_inserti32x4(__m512i src, __mmask16 k,
                                    __m512i a, __m128i b,
                                    int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[1:0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        2: tmp[383:256] := b[127:0]
        3: tmp[511:384] := b[127:0]
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_inserti32x4
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_inserti32x4(__mmask16 k, __m512i a,
                                     __m128i b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[1:0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        2: tmp[383:256] := b[127:0]
        3: tmp[511:384] := b[127:0]
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
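
One consequence of the zeromask form is that a mask covering only the inserted lane yields ``b`` in that lane and zeros everywhere else. The scalar sketch below models the pseudo-code above; ``ref_maskz_inserti32x4`` is a hypothetical name, not an ``immintrin.h`` symbol.

```c
#include <stdint.h>

/* Hypothetical scalar model: insert b into a copy of a, zero inactive lanes. */
void ref_maskz_inserti32x4(uint32_t dst[16], uint16_t k,
                           const uint32_t a[16], const uint32_t b[4],
                           int imm8)
{
    uint32_t tmp[16];
    for (int j = 0; j < 16; j++)
        tmp[j] = a[j];                          /* tmp[511:0] := a[511:0] */
    for (int j = 0; j < 4; j++)
        tmp[4 * (imm8 & 3) + j] = b[j];         /* imm8[1:0] picks the lane */
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1u) ? tmp[j] : 0u;
}
```

With ``imm8 = 2`` and ``k = 0x0F00``, elements 8 to 11 of the result hold ``b`` and all other elements are zero.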
        	

_mm512_inserti64x4
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_inserti64x4(__m512i a, __m256i b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 256 bits (composed of 4 packed 64-bit integers) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := a[511:0]
        CASE (imm8[0]) OF
        0: dst[255:0] := b[255:0]
        1: dst[511:256] := b[255:0]
        ESAC
        dst[MAX:512] := 0
        	

_mm512_mask_inserti64x4
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_inserti64x4(__m512i src, __mmask8 k,
                                    __m512i a, __m256i b,
                                    int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 256 bits (composed of 4 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[0]) OF
        0: tmp[255:0] := b[255:0]
        1: tmp[511:256] := b[255:0]
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
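
The interaction between the insert position and the writemask can be sketched with a scalar model of the pseudo-code above. The helper ``ref_mask_inserti64x4`` is hypothetical, and the arrays assume that index 0 is the least significant 64-bit element.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_inserti64x4.
     * src/a/dst hold 8 lanes, b holds 4 lanes; lane 0 is bits [63:0]. */
    static void ref_mask_inserti64x4(uint64_t dst[8], const uint64_t src[8],
                                     uint8_t k, const uint64_t a[8],
                                     const uint64_t b[4], int imm8)
    {
        uint64_t tmp[8];
        int base = (imm8 & 1) * 4;  /* imm8[0] picks the low or high half */

        for (int j = 0; j < 8; j++)
            tmp[j] = a[j];          /* tmp := a */
        for (int j = 0; j < 4; j++)
            tmp[base + j] = b[j];   /* insert the 256-bit block */
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];  /* writemask */
    }

With ``imm8 = 1`` and ``k = 0x33``, lanes 4 and 5 take ``b[0]`` and ``b[1]``, lanes 0 and 1 keep ``a``, and the remaining lanes come from ``src``.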
        	

_mm512_maskz_inserti64x4
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_inserti64x4(__mmask8 k, __m512i a,
                                     __m256i b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 256 bits (composed of 4 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[0]) OF
        0: tmp[255:0] := b[255:0]
        1: tmp[511:256] := b[255:0]
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcastd_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m512i _mm512_broadcastd_epi32(__m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 32-bit integer from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_broadcastd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_broadcastd_epi32(__m512i src,
                                         __mmask16 k,
                                         __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
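
As a rough illustration of the writemask form, the loop above maps onto plain C as follows. The name ``ref_mask_broadcastd_epi32`` is a hypothetical helper, and only element 0 of the 128-bit source is read.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_broadcastd_epi32.
     * a holds the four 32-bit lanes of the __m128i source. */
    static void ref_mask_broadcastd_epi32(uint32_t dst[16],
                                          const uint32_t src[16],
                                          uint16_t k, const uint32_t a[4])
    {
        uint32_t low = a[0];  /* a[31:0], the only element that is used */

        for (int j = 0; j < 16; j++)
            dst[j] = ((k >> j) & 1) ? low : src[j];  /* writemask */
    }

With ``k = 0x8001`` only lanes 0 and 15 receive the broadcast value; lanes 1 to 14 keep their ``src`` contents.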
        	

_mm512_maskz_broadcastd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_broadcastd_epi32(__mmask16 k,
                                          __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcastq_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m512i _mm512_broadcastq_epi64(__m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 64-bit integer from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_broadcastq_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_mask_broadcastq_epi64(__m512i src,
                                         __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcastq_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_maskz_broadcastq_epi64(__mmask8 k,
                                          __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
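
A scalar sketch of the zeromask broadcast, following the pseudo-code above. The helper ``ref_maskz_broadcastq_epi64`` is hypothetical; ``a`` models the two 64-bit lanes of the ``__m128i`` argument, of which only the low one is read.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_maskz_broadcastq_epi64. */
    static void ref_maskz_broadcastq_epi64(uint64_t dst[8], uint8_t k,
                                           const uint64_t a[2])
    {
        uint64_t low = a[0];  /* a[63:0] */

        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? low : 0;  /* zeromask */
    }

With ``k = 0xA5`` lanes 0, 2, 5 and 7 receive the low element and the other lanes are zero.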
        	

_mm512_mask_compress_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_compress_epi32(__m512i src, __mmask16 k,
                                       __m512i a);

.. admonition:: Intel Description

    Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := 0
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        dst[511:m] := src[511:m]
        dst[MAX:512] := 0
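
Compress packs the selected elements toward lane 0, which is easier to see in scalar form. The sketch below follows the pseudo-code above; ``ref_mask_compress_epi32`` is a hypothetical helper and the arrays assume that index 0 is the least significant element.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_compress_epi32. */
    static void ref_mask_compress_epi32(uint32_t dst[16],
                                        const uint32_t src[16],
                                        uint16_t k, const uint32_t a[16])
    {
        int m = 0;  /* next free output lane */

        for (int j = 0; j < 16; j++)
            if ((k >> j) & 1)
                dst[m++] = a[j];   /* pack active elements contiguously */
        for (int j = m; j < 16; j++)
            dst[j] = src[j];       /* upper lanes pass through from src */
    }

With ``k = 0x00A4`` (bits 2, 5 and 7 set), lanes 0 to 2 of the result hold ``a[2]``, ``a[5]`` and ``a[7]``, and lanes 3 to 15 hold the corresponding ``src`` lanes.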
        	

_mm512_maskz_compress_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_compress_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Contiguously store the active 32-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := 0
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        dst[511:m] := 0
        dst[MAX:512] := 0
        	

_mm512_mask_compress_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_mask_compress_epi64(__m512i src, __mmask8 k,
                                       __m512i a);

.. admonition:: Intel Description

    Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := 0
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        dst[511:m] := src[511:m]
        dst[MAX:512] := 0
        	

_mm512_maskz_compress_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_maskz_compress_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Contiguously store the active 64-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := 0
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        dst[511:m] := 0
        dst[MAX:512] := 0
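
The zeromask form differs from the writemask form only in how the lanes above the packed elements are filled. A minimal scalar sketch of the pseudo-code above, using the hypothetical helper ``ref_maskz_compress_epi64``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_maskz_compress_epi64. */
    static void ref_maskz_compress_epi64(uint64_t dst[8], uint8_t k,
                                         const uint64_t a[8])
    {
        int m = 0;  /* next free output lane */

        for (int j = 0; j < 8; j++)
            if ((k >> j) & 1)
                dst[m++] = a[j];   /* pack active elements contiguously */
        for (int j = m; j < 8; j++)
            dst[j] = 0;            /* remaining lanes are zeroed */
    }

With ``k = 0x81`` the result holds ``a[0]`` and ``a[7]`` in lanes 0 and 1, followed by six zero lanes.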
        	

_mm512_mask_permutexvar_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i idx, 
    __m512i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 idx, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_permutexvar_epi32(__m512i src,
                                          __mmask16 k,
                                          __m512i idx,
                                          __m512i a);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	id := idx[i+3:i]*32
        	IF k[j]
        		dst[i+31:i] := a[id+31:id]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutexvar_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i idx, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 idx, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_permutexvar_epi32(__mmask16 k,
                                           __m512i idx,
                                           __m512i a);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	id := idx[i+3:i]*32
        	IF k[j]
        		dst[i+31:i] := a[id+31:id]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_permutexvar_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i idx, 
    __m512i a
:Param ETypes:
    UI32 idx, 
    UI32 a

.. code-block:: C

    __m512i _mm512_permutexvar_epi32(__m512i idx, __m512i a);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	id := idx[i+3:i]*32
        	dst[i+31:i] := a[id+31:id]
        ENDFOR
        dst[MAX:512] := 0
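
The index handling in the pseudo-code above uses only the low four bits of each ``idx`` element. A scalar sketch under that reading follows; ``ref_permutexvar_epi32`` is a hypothetical helper, and the temporary buffer is an illustration choice that keeps the model correct if ``dst`` and ``a`` are the same array.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_permutexvar_epi32. */
    static void ref_permutexvar_epi32(uint32_t dst[16],
                                      const uint32_t idx[16],
                                      const uint32_t a[16])
    {
        uint32_t tmp[16];

        for (int j = 0; j < 16; j++)
            tmp[j] = a[idx[j] & 15];  /* only idx[i+3:i] selects the lane */
        for (int j = 0; j < 16; j++)
            dst[j] = tmp[j];
    }

Upper index bits are ignored, so an index of 31 selects the same element as an index of 15.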
        	

_mm512_mask2_permutex2var_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i idx, 
    __mmask16 k, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 idx, 
    MASK k, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask2_permutex2var_epi32(__m512i a,
                                            __m512i idx,
                                            __mmask16 k,
                                            __m512i b);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	off := idx[i+3:i]*32
        	IF k[j]
        		dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := idx[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
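
In the ``mask2`` form, lanes whose mask bit is clear receive the index value itself rather than an element of ``a``. The scalar sketch below mirrors the pseudo-code above; ``ref_mask2_permutex2var_epi32`` is a hypothetical helper and the array layout (index 0 is the least significant element) is an assumption.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask2_permutex2var_epi32. */
    static void ref_mask2_permutex2var_epi32(uint32_t dst[16],
                                             const uint32_t a[16],
                                             const uint32_t idx[16],
                                             uint16_t k,
                                             const uint32_t b[16])
    {
        uint32_t tmp[16];

        for (int j = 0; j < 16; j++) {
            uint32_t off = idx[j] & 15;           /* idx[i+3:i] */
            if ((k >> j) & 1)
                tmp[j] = (idx[j] & 16) ? b[off]   /* idx[i+4] selects b */
                                       : a[off];
            else
                tmp[j] = idx[j];                  /* pass idx through */
        }
        for (int j = 0; j < 16; j++)
            dst[j] = tmp[j];
    }

For example, an index of 16 in an active lane selects ``b[0]``, while the same index in an inactive lane produces the value 16.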
        	

_mm512_mask_permutex2var_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask16 k, 
    __m512i idx, 
    __m512i b
:Param ETypes:
    UI32 a, 
    MASK k, 
    UI32 idx, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_permutex2var_epi32(__m512i a,
                                           __mmask16 k,
                                           __m512i idx,
                                           __m512i b);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	off := idx[i+3:i]*32
        	IF k[j]
        		dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutex2var_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i idx, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 idx, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_permutex2var_epi32(__mmask16 k,
                                            __m512i a,
                                            __m512i idx,
                                            __m512i b);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	off := idx[i+3:i]*32
        	IF k[j]
        		dst[i+31:i] := (idx[i+4]) ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_permutex2var_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i idx, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 idx, 
    UI32 b

.. code-block:: C

    __m512i _mm512_permutex2var_epi32(__m512i a, __m512i idx,
                                      __m512i b);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	off := idx[i+3:i]*32
        	dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask2_permutex2var_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512i idx, 
    __mmask8 k, 
    __m512d b
:Param ETypes:
    FP64 a, 
    UI64 idx, 
    MASK k, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask2_permutex2var_pd(__m512d a, __m512i idx,
                                         __mmask8 k, __m512d b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	off := idx[i+2:i]*64
        	IF k[j]
        		dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := idx[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
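
For the floating-point ``mask2`` form, the pass-through line ``dst[i+63:i] := idx[i+63:i]`` copies the 64 index bits unchanged; it is not an integer-to-double conversion. The scalar sketch below models this with ``memcpy``. The helper ``ref_mask2_permutex2var_pd`` is hypothetical, and the sketch assumes that ``double`` is a 64-bit type.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm512_mask2_permutex2var_pd.
     * Assumes sizeof(double) == sizeof(uint64_t). */
    static void ref_mask2_permutex2var_pd(double dst[8], const double a[8],
                                          const uint64_t idx[8], uint8_t k,
                                          const double b[8])
    {
        double tmp[8];

        for (int j = 0; j < 8; j++) {
            uint64_t off = idx[j] & 7;            /* idx[i+2:i] */
            if ((k >> j) & 1)
                tmp[j] = (idx[j] & 8) ? b[off]    /* idx[i+3] selects b */
                                      : a[off];
            else
                memcpy(&tmp[j], &idx[j], sizeof tmp[j]);  /* raw bit copy */
        }
        memcpy(dst, tmp, sizeof tmp);
    }

An inactive lane whose index is 3 therefore holds the bit pattern ``0x3`` (a tiny denormal), not the value ``3.0``.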
        	

_mm512_mask_permutex2var_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512i idx, 
    __m512d b
:Param ETypes:
    FP64 a, 
    MASK k, 
    UI64 idx, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_permutex2var_pd(__m512d a, __mmask8 k,
                                        __m512i idx, __m512d b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	off := idx[i+2:i]*64
        	IF k[j]
        		dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutex2var_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512i idx, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    UI64 idx, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_permutex2var_pd(__mmask8 k, __m512d a,
                                         __m512i idx,
                                         __m512d b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	off := idx[i+2:i]*64
        	IF k[j]
        		dst[i+63:i] := (idx[i+3]) ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_permutex2var_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512i idx, 
    __m512d b
:Param ETypes:
    FP64 a, 
    UI64 idx, 
    FP64 b

.. code-block:: C

    __m512d _mm512_permutex2var_pd(__m512d a, __m512i idx,
                                   __m512d b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	off := idx[i+2:i]*64
        	dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask2_permutex2var_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512i idx, 
    __mmask16 k, 
    __m512 b
:Param ETypes:
    FP32 a, 
    UI32 idx, 
    MASK k, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask2_permutex2var_ps(__m512 a, __m512i idx,
                                        __mmask16 k, __m512 b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	off := idx[i+3:i]*32
        	IF k[j]
        		dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := idx[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_permutex2var_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512i idx, 
    __m512 b
:Param ETypes:
    FP32 a, 
    MASK k, 
    UI32 idx, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_permutex2var_ps(__m512 a, __mmask16 k,
                                       __m512i idx, __m512 b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	off := idx[i+3:i]*32
        	IF k[j]
        		dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutex2var_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512i idx, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    UI32 idx, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_permutex2var_ps(__mmask16 k, __m512 a,
                                        __m512i idx, __m512 b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	off := idx[i+3:i]*32
        	IF k[j]
        		dst[i+31:i] := (idx[i+4]) ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
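
A scalar sketch of the two-source selection with a zeromask, following the pseudo-code above. ``ref_maskz_permutex2var_ps`` is a hypothetical helper; the arrays model the 16 single-precision lanes with index 0 as the least significant element.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_maskz_permutex2var_ps. */
    static void ref_maskz_permutex2var_ps(float dst[16], uint16_t k,
                                          const float a[16],
                                          const uint32_t idx[16],
                                          const float b[16])
    {
        float tmp[16];

        for (int j = 0; j < 16; j++) {
            uint32_t off = idx[j] & 15;           /* idx[i+3:i] */
            if ((k >> j) & 1)
                tmp[j] = (idx[j] & 16) ? b[off]   /* idx[i+4] selects b */
                                       : a[off];
            else
                tmp[j] = 0.0f;                    /* zeromask */
        }
        for (int j = 0; j < 16; j++)
            dst[j] = tmp[j];
    }

Indices 0 to 15 address ``a`` and indices 16 to 31 address ``b``, so the pair behaves like one 32-element lookup table.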
        	

_mm512_permutex2var_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512i idx, 
    __m512 b
:Param ETypes:
    FP32 a, 
    UI32 idx, 
    FP32 b

.. code-block:: C

    __m512 _mm512_permutex2var_ps(__m512 a, __m512i idx,
                                  __m512 b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	off := idx[i+3:i]*32
        	dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask2_permutex2var_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i idx, 
    __mmask8 k, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 idx, 
    MASK k, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask2_permutex2var_epi64(__m512i a,
                                            __m512i idx,
                                            __mmask8 k,
                                            __m512i b);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	off := idx[i+2:i]*64
        	IF k[j]
        		dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := idx[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_permutex2var_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask8 k, 
    __m512i idx, 
    __m512i b
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 idx, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_permutex2var_epi64(__m512i a,
                                           __mmask8 k,
                                           __m512i idx,
                                           __m512i b);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	off := idx[i+2:i]*64
        	IF k[j]
        		dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
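
In the ``mask`` form, inactive lanes fall back to the same lane of ``a``. A scalar sketch of the pseudo-code above follows; ``ref_mask_permutex2var_epi64`` is a hypothetical helper, and the temporary buffer is an illustration choice so the model stays correct when ``dst`` and ``a`` are the same array.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_permutex2var_epi64. */
    static void ref_mask_permutex2var_epi64(uint64_t dst[8],
                                            const uint64_t a[8], uint8_t k,
                                            const uint64_t idx[8],
                                            const uint64_t b[8])
    {
        uint64_t tmp[8];

        for (int j = 0; j < 8; j++) {
            uint64_t off = idx[j] & 7;            /* idx[i+2:i] */
            if ((k >> j) & 1)
                tmp[j] = (idx[j] & 8) ? b[off]    /* idx[i+3] selects b */
                                      : a[off];
            else
                tmp[j] = a[j];                    /* pass a through */
        }
        for (int j = 0; j < 8; j++)
            dst[j] = tmp[j];
    }

With ``idx = {8, 9, 10, 11, 0, 1, 2, 3}`` and ``k = 0x3C``, lanes 2 and 3 take ``b[2]`` and ``b[3]``, lanes 4 and 5 take ``a[0]`` and ``a[1]``, and lanes 0, 1, 6 and 7 keep their ``a`` values.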
        	

_mm512_maskz_permutex2var_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i idx, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 idx, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_permutex2var_epi64(__mmask8 k,
                                            __m512i a,
                                            __m512i idx,
                                            __m512i b);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	off := idx[i+2:i]*64
        	IF k[j]
        		dst[i+63:i] := (idx[i+3]) ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_permutex2var_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i idx, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 idx, 
    UI64 b

.. code-block:: C

    __m512i _mm512_permutex2var_epi64(__m512i a, __m512i idx,
                                      __m512i b);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	off := idx[i+2:i]*64
        	dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_permute_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    const int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_mask_permute_pd(__m512d src, __mmask8 k,
                                   __m512d a, const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI
        IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI
        IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI
        IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI
        IF (imm8[2] == 0) tmp_dst[191:128] := a[191:128]; FI
        IF (imm8[2] == 1) tmp_dst[191:128] := a[255:192]; FI
        IF (imm8[3] == 0) tmp_dst[255:192] := a[191:128]; FI
        IF (imm8[3] == 1) tmp_dst[255:192] := a[255:192]; FI
        IF (imm8[4] == 0) tmp_dst[319:256] := a[319:256]; FI
        IF (imm8[4] == 1) tmp_dst[319:256] := a[383:320]; FI
        IF (imm8[5] == 0) tmp_dst[383:320] := a[319:256]; FI
        IF (imm8[5] == 1) tmp_dst[383:320] := a[383:320]; FI
        IF (imm8[6] == 0) tmp_dst[447:384] := a[447:384]; FI
        IF (imm8[6] == 1) tmp_dst[447:384] := a[511:448]; FI
        IF (imm8[7] == 0) tmp_dst[511:448] := a[447:384]; FI
        IF (imm8[7] == 1) tmp_dst[511:448] := a[511:448]; FI
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
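The sixteen ``IF`` lines above reduce to one rule: bit ``j`` of ``imm8`` chooses the low (0) or high (1) double of the 128-bit lane that contains element ``j``. A minimal scalar sketch of that rule plus the writemask merge follows. The name ``ref_mask_permute_pd`` and the array-based signature are hypothetical, and each ``__m512d`` is assumed to be held as eight ``double`` values.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above; not part of
     * immintrin.h. dst must not alias a. */
    void ref_mask_permute_pd(double dst[8], const double src[8],
                             uint8_t k, const double a[8], int imm8)
    {
        for (int j = 0; j < 8; j++) {
            int lane_base = j & ~1;       /* first element of this 128-bit lane */
            int sel = (imm8 >> j) & 1;    /* imm8[j]: 0 = low double, 1 = high double */
            double tmp = a[lane_base + sel];
            dst[j] = ((k >> j) & 1) ? tmp : src[j];   /* writemask: keep src if bit clear */
        }
    }

In this model ``imm8 = 0x55`` swaps the two doubles in every lane, and ``k = 0x0F`` applies that swap to elements 0 to 3 only, leaving elements 4 to 7 equal to ``src``.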

_mm512_mask_permutevar_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512i b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    UI64 b

.. code-block:: C

    __m512d _mm512_mask_permutevar_pd(__m512d src, __mmask8 k,
                                      __m512d a, __m512i b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI
        IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI
        IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI
        IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI
        IF (b[129] == 0) tmp_dst[191:128] := a[191:128]; FI
        IF (b[129] == 1) tmp_dst[191:128] := a[255:192]; FI
        IF (b[193] == 0) tmp_dst[255:192] := a[191:128]; FI
        IF (b[193] == 1) tmp_dst[255:192] := a[255:192]; FI
        IF (b[257] == 0) tmp_dst[319:256] := a[319:256]; FI
        IF (b[257] == 1) tmp_dst[319:256] := a[383:320]; FI
        IF (b[321] == 0) tmp_dst[383:320] := a[319:256]; FI
        IF (b[321] == 1) tmp_dst[383:320] := a[383:320]; FI
        IF (b[385] == 0) tmp_dst[447:384] := a[447:384]; FI
        IF (b[385] == 1) tmp_dst[447:384] := a[511:448]; FI
        IF (b[449] == 0) tmp_dst[511:448] := a[447:384]; FI
        IF (b[449] == 1) tmp_dst[511:448] := a[511:448]; FI
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permute_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_maskz_permute_pd(__mmask8 k, __m512d a,
                                    const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI
        IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI
        IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI
        IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI
        IF (imm8[2] == 0) tmp_dst[191:128] := a[191:128]; FI
        IF (imm8[2] == 1) tmp_dst[191:128] := a[255:192]; FI
        IF (imm8[3] == 0) tmp_dst[255:192] := a[191:128]; FI
        IF (imm8[3] == 1) tmp_dst[255:192] := a[255:192]; FI
        IF (imm8[4] == 0) tmp_dst[319:256] := a[319:256]; FI
        IF (imm8[4] == 1) tmp_dst[319:256] := a[383:320]; FI
        IF (imm8[5] == 0) tmp_dst[383:320] := a[319:256]; FI
        IF (imm8[5] == 1) tmp_dst[383:320] := a[383:320]; FI
        IF (imm8[6] == 0) tmp_dst[447:384] := a[447:384]; FI
        IF (imm8[6] == 1) tmp_dst[447:384] := a[511:448]; FI
        IF (imm8[7] == 0) tmp_dst[511:448] := a[447:384]; FI
        IF (imm8[7] == 1) tmp_dst[511:448] := a[511:448]; FI
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutevar_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512i b
:Param ETypes:
    MASK k, 
    FP64 a, 
    UI64 b

.. code-block:: C

    __m512d _mm512_maskz_permutevar_pd(__mmask8 k, __m512d a,
                                       __m512i b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI
        IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI
        IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI
        IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI
        IF (b[129] == 0) tmp_dst[191:128] := a[191:128]; FI
        IF (b[129] == 1) tmp_dst[191:128] := a[255:192]; FI
        IF (b[193] == 0) tmp_dst[255:192] := a[191:128]; FI
        IF (b[193] == 1) tmp_dst[255:192] := a[255:192]; FI
        IF (b[257] == 0) tmp_dst[319:256] := a[319:256]; FI
        IF (b[257] == 1) tmp_dst[319:256] := a[383:320]; FI
        IF (b[321] == 0) tmp_dst[383:320] := a[319:256]; FI
        IF (b[321] == 1) tmp_dst[383:320] := a[383:320]; FI
        IF (b[385] == 0) tmp_dst[447:384] := a[447:384]; FI
        IF (b[385] == 1) tmp_dst[447:384] := a[511:448]; FI
        IF (b[449] == 0) tmp_dst[511:448] := a[447:384]; FI
        IF (b[449] == 1) tmp_dst[511:448] := a[511:448]; FI
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_permute_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    const int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_permute_pd(__m512d a, const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (imm8[0] == 0) dst[63:0] := a[63:0]; FI
        IF (imm8[0] == 1) dst[63:0] := a[127:64]; FI
        IF (imm8[1] == 0) dst[127:64] := a[63:0]; FI
        IF (imm8[1] == 1) dst[127:64] := a[127:64]; FI
        IF (imm8[2] == 0) dst[191:128] := a[191:128]; FI
        IF (imm8[2] == 1) dst[191:128] := a[255:192]; FI
        IF (imm8[3] == 0) dst[255:192] := a[191:128]; FI
        IF (imm8[3] == 1) dst[255:192] := a[255:192]; FI
        IF (imm8[4] == 0) dst[319:256] := a[319:256]; FI
        IF (imm8[4] == 1) dst[319:256] := a[383:320]; FI
        IF (imm8[5] == 0) dst[383:320] := a[319:256]; FI
        IF (imm8[5] == 1) dst[383:320] := a[383:320]; FI
        IF (imm8[6] == 0) dst[447:384] := a[447:384]; FI
        IF (imm8[6] == 1) dst[447:384] := a[511:448]; FI
        IF (imm8[7] == 0) dst[511:448] := a[447:384]; FI
        IF (imm8[7] == 1) dst[511:448] := a[511:448]; FI
        dst[MAX:512] := 0
        	

_mm512_permutevar_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512i b
:Param ETypes:
    FP64 a, 
    UI64 b

.. code-block:: C

    __m512d _mm512_permutevar_pd(__m512d a, __m512i b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (b[1] == 0) dst[63:0] := a[63:0]; FI
        IF (b[1] == 1) dst[63:0] := a[127:64]; FI
        IF (b[65] == 0) dst[127:64] := a[63:0]; FI
        IF (b[65] == 1) dst[127:64] := a[127:64]; FI
        IF (b[129] == 0) dst[191:128] := a[191:128]; FI
        IF (b[129] == 1) dst[191:128] := a[255:192]; FI
        IF (b[193] == 0) dst[255:192] := a[191:128]; FI
        IF (b[193] == 1) dst[255:192] := a[255:192]; FI
        IF (b[257] == 0) dst[319:256] := a[319:256]; FI
        IF (b[257] == 1) dst[319:256] := a[383:320]; FI
        IF (b[321] == 0) dst[383:320] := a[319:256]; FI
        IF (b[321] == 1) dst[383:320] := a[383:320]; FI
        IF (b[385] == 0) dst[447:384] := a[447:384]; FI
        IF (b[385] == 1) dst[447:384] := a[511:448]; FI
        IF (b[449] == 0) dst[511:448] := a[447:384]; FI
        IF (b[449] == 1) dst[511:448] := a[511:448]; FI
        dst[MAX:512] := 0
        	
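In the pseudo-code above the control bits are ``b[1]``, ``b[65]``, ``b[129]`` and so on, that is, bit 1 (not bit 0) of each 64-bit element of ``b``. A minimal scalar sketch that makes this explicit is shown below. The name ``ref_permutevar_pd`` and the array-based signature are hypothetical, with ``a`` assumed to be eight ``double`` values and ``b`` eight ``uint64_t`` values.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above; not part of
     * immintrin.h. dst must not alias a. */
    void ref_permutevar_pd(double dst[8], const double a[8],
                           const uint64_t b[8])
    {
        for (int j = 0; j < 8; j++) {
            int lane_base = j & ~1;              /* first element of this 128-bit lane */
            int sel = (int)((b[j] >> 1) & 1);    /* bit 1 of the control element */
            dst[j] = a[lane_base + sel];
        }
    }

Under this model a control value of 1 selects the low double, because bit 1 is clear, while 2 and 3 both select the high double.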

_mm512_mask_permute_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    const int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_mask_permute_ps(__m512 src, __mmask16 k,
                                  __m512 a, const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
        tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
        tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0])
        tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2])
        tmp_dst[351:320] := SELECT4(a[383:256], imm8[5:4])
        tmp_dst[383:352] := SELECT4(a[383:256], imm8[7:6])
        tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0])
        tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2])
        tmp_dst[479:448] := SELECT4(a[511:384], imm8[5:4])
        tmp_dst[511:480] := SELECT4(a[511:384], imm8[7:6])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
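The ``SELECT4`` expansion above applies the same four 2-bit fields of ``imm8`` to every 128-bit lane: the element at position ``p`` within a lane uses ``imm8[2p+1:2p]``. A minimal scalar sketch follows. The name ``ref_mask_permute_ps`` and the array-based signature are hypothetical, and each ``__m512`` is assumed to be held as sixteen ``float`` values.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above; not part of
     * immintrin.h. dst must not alias a. */
    void ref_mask_permute_ps(float dst[16], const float src[16],
                             uint16_t k, const float a[16], int imm8)
    {
        for (int j = 0; j < 16; j++) {
            int lane_base = j & ~3;                   /* first element of this 128-bit lane */
            int sel = (imm8 >> (2 * (j & 3))) & 3;    /* 2-bit field for this position */
            float tmp = a[lane_base + sel];           /* SELECT4 on the lane */
            dst[j] = ((k >> j) & 1) ? tmp : src[j];   /* writemask: keep src if bit clear */
        }
    }

In this model ``imm8 = 0x1B`` reverses the four floats inside each lane, and ``k = 0x00FF`` limits the result to the lower two lanes.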

_mm512_mask_permutevar_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512i b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    UI32 b

.. code-block:: C

    __m512 _mm512_mask_permutevar_ps(__m512 src, __mmask16 k,
                                     __m512 a, __m512i b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], b[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], b[33:32])
        tmp_dst[95:64] := SELECT4(a[127:0], b[65:64])
        tmp_dst[127:96] := SELECT4(a[127:0], b[97:96])
        tmp_dst[159:128] := SELECT4(a[255:128], b[129:128])
        tmp_dst[191:160] := SELECT4(a[255:128], b[161:160])
        tmp_dst[223:192] := SELECT4(a[255:128], b[193:192])
        tmp_dst[255:224] := SELECT4(a[255:128], b[225:224])
        tmp_dst[287:256] := SELECT4(a[383:256], b[257:256])
        tmp_dst[319:288] := SELECT4(a[383:256], b[289:288])
        tmp_dst[351:320] := SELECT4(a[383:256], b[321:320])
        tmp_dst[383:352] := SELECT4(a[383:256], b[353:352])
        tmp_dst[415:384] := SELECT4(a[511:384], b[385:384])
        tmp_dst[447:416] := SELECT4(a[511:384], b[417:416])
        tmp_dst[479:448] := SELECT4(a[511:384], b[449:448])
        tmp_dst[511:480] := SELECT4(a[511:384], b[481:480])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permute_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_maskz_permute_ps(__mmask16 k, __m512 a,
                                   const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
        tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
        tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0])
        tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2])
        tmp_dst[351:320] := SELECT4(a[383:256], imm8[5:4])
        tmp_dst[383:352] := SELECT4(a[383:256], imm8[7:6])
        tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0])
        tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2])
        tmp_dst[479:448] := SELECT4(a[511:384], imm8[5:4])
        tmp_dst[511:480] := SELECT4(a[511:384], imm8[7:6])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutevar_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512i b
:Param ETypes:
    MASK k, 
    FP32 a, 
    UI32 b

.. code-block:: C

    __m512 _mm512_maskz_permutevar_ps(__mmask16 k, __m512 a,
                                      __m512i b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], b[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], b[33:32])
        tmp_dst[95:64] := SELECT4(a[127:0], b[65:64])
        tmp_dst[127:96] := SELECT4(a[127:0], b[97:96])
        tmp_dst[159:128] := SELECT4(a[255:128], b[129:128])
        tmp_dst[191:160] := SELECT4(a[255:128], b[161:160])
        tmp_dst[223:192] := SELECT4(a[255:128], b[193:192])
        tmp_dst[255:224] := SELECT4(a[255:128], b[225:224])
        tmp_dst[287:256] := SELECT4(a[383:256], b[257:256])
        tmp_dst[319:288] := SELECT4(a[383:256], b[289:288])
        tmp_dst[351:320] := SELECT4(a[383:256], b[321:320])
        tmp_dst[383:352] := SELECT4(a[383:256], b[353:352])
        tmp_dst[415:384] := SELECT4(a[511:384], b[385:384])
        tmp_dst[447:416] := SELECT4(a[511:384], b[417:416])
        tmp_dst[479:448] := SELECT4(a[511:384], b[449:448])
        tmp_dst[511:480] := SELECT4(a[511:384], b[481:480])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_permute_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    const int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_permute_ps(__m512 a, const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        dst[223:192] := SELECT4(a[255:128], imm8[5:4])
        dst[255:224] := SELECT4(a[255:128], imm8[7:6])
        dst[287:256] := SELECT4(a[383:256], imm8[1:0])
        dst[319:288] := SELECT4(a[383:256], imm8[3:2])
        dst[351:320] := SELECT4(a[383:256], imm8[5:4])
        dst[383:352] := SELECT4(a[383:256], imm8[7:6])
        dst[415:384] := SELECT4(a[511:384], imm8[1:0])
        dst[447:416] := SELECT4(a[511:384], imm8[3:2])
        dst[479:448] := SELECT4(a[511:384], imm8[5:4])
        dst[511:480] := SELECT4(a[511:384], imm8[7:6])
        dst[MAX:512] := 0
        	

_mm512_permutevar_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512i b
:Param ETypes:
    FP32 a, 
    UI32 b

.. code-block:: C

    __m512 _mm512_permutevar_ps(__m512 a, __m512i b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        dst[31:0] := SELECT4(a[127:0], b[1:0])
        dst[63:32] := SELECT4(a[127:0], b[33:32])
        dst[95:64] := SELECT4(a[127:0], b[65:64])
        dst[127:96] := SELECT4(a[127:0], b[97:96])
        dst[159:128] := SELECT4(a[255:128], b[129:128])
        dst[191:160] := SELECT4(a[255:128], b[161:160])
        dst[223:192] := SELECT4(a[255:128], b[193:192])
        dst[255:224] := SELECT4(a[255:128], b[225:224])
        dst[287:256] := SELECT4(a[383:256], b[257:256])
        dst[319:288] := SELECT4(a[383:256], b[289:288])
        dst[351:320] := SELECT4(a[383:256], b[321:320])
        dst[383:352] := SELECT4(a[383:256], b[353:352])
        dst[415:384] := SELECT4(a[511:384], b[385:384])
        dst[447:416] := SELECT4(a[511:384], b[417:416])
        dst[479:448] := SELECT4(a[511:384], b[449:448])
        dst[511:480] := SELECT4(a[511:384], b[481:480])
        dst[MAX:512] := 0
        	
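Unlike the ``imm8`` form, the variable form above reads a separate selector for every element: the low two bits of the 32-bit element of ``b`` at the same position. A minimal scalar sketch is given below. The name ``ref_permutevar_ps`` and the array-based signature are hypothetical, with ``a`` assumed to be sixteen ``float`` values and ``b`` sixteen ``uint32_t`` values.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above; not part of
     * immintrin.h. dst must not alias a. */
    void ref_permutevar_ps(float dst[16], const float a[16],
                           const uint32_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            int lane_base = j & ~3;       /* first element of this 128-bit lane */
            int sel = (int)(b[j] & 3);    /* low two bits of the control element */
            dst[j] = a[lane_base + sel];  /* selection never leaves the lane */
        }
    }

Because only two bits are read, control values 4 to 7 behave like 0 to 3 in this model, and each lane can use a different pattern.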

_mm512_mask_permutex_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    const int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_mask_permutex_pd(__m512d src, __mmask8 k,
                                    __m512d a, const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        tmp_dst[319:256] := SELECT4(a[511:256], imm8[1:0])
        tmp_dst[383:320] := SELECT4(a[511:256], imm8[3:2])
        tmp_dst[447:384] := SELECT4(a[511:256], imm8[5:4])
        tmp_dst[511:448] := SELECT4(a[511:256], imm8[7:6])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_permutexvar_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512i idx, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    UI64 idx, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_permutexvar_pd(__m512d src, __mmask8 k,
                                       __m512i idx, __m512d a);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	id := idx[i+2:i]*64
        	IF k[j]
        		dst[i+63:i] := a[id+63:id]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutex_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_maskz_permutex_pd(__mmask8 k, __m512d a,
                                     const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        tmp_dst[319:256] := SELECT4(a[511:256], imm8[1:0])
        tmp_dst[383:320] := SELECT4(a[511:256], imm8[3:2])
        tmp_dst[447:384] := SELECT4(a[511:256], imm8[5:4])
        tmp_dst[511:448] := SELECT4(a[511:256], imm8[7:6])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutexvar_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512i idx, 
    __m512d a
:Param ETypes:
    MASK k, 
    UI64 idx, 
    FP64 a

.. code-block:: C

    __m512d _mm512_maskz_permutexvar_pd(__mmask8 k, __m512i idx,
                                        __m512d a);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	id := idx[i+2:i]*64
        	IF k[j]
        		dst[i+63:i] := a[id+63:id]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_permutex_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    const int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_permutex_pd(__m512d a, const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        dst[319:256] := SELECT4(a[511:256], imm8[1:0])
        dst[383:320] := SELECT4(a[511:256], imm8[3:2])
        dst[447:384] := SELECT4(a[511:256], imm8[5:4])
        dst[511:448] := SELECT4(a[511:256], imm8[7:6])
        dst[MAX:512] := 0
        	
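Here ``SELECT4`` works on 256-bit lanes, so each 2-bit field of ``imm8`` picks one of the four doubles in the same half of the vector. A minimal scalar sketch follows. The name ``ref_permutex_pd`` and the array-based signature are hypothetical, and the ``__m512d`` is assumed to be held as eight ``double`` values.

.. code-block:: C

    /* Hypothetical scalar model of the pseudo-code above; not part of
     * immintrin.h. dst must not alias a. */
    void ref_permutex_pd(double dst[8], const double a[8], int imm8)
    {
        for (int j = 0; j < 8; j++) {
            int lane_base = j & ~3;                   /* first element of this 256-bit lane */
            int sel = (imm8 >> (2 * (j & 3))) & 3;    /* 2-bit field for this position */
            dst[j] = a[lane_base + sel];
        }
    }

In this model ``imm8 = 0x1B`` reverses the four doubles in each 256-bit half, and ``imm8 = 0x00`` broadcasts the first double of each half across that half.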

_mm512_permutexvar_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512i idx, 
    __m512d a
:Param ETypes:
    UI64 idx, 
    FP64 a

.. code-block:: C

    __m512d _mm512_permutexvar_pd(__m512i idx, __m512d a);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	id := idx[i+2:i]*64
        	dst[i+63:i] := a[id+63:id]
        ENDFOR
        dst[MAX:512] := 0
        	
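The cross-lane form above is a plain table lookup: the low three bits of each 64-bit index element name any of the eight source doubles. A minimal scalar sketch is shown below. The name ``ref_permutexvar_pd`` and the array-based signature are hypothetical, and the argument order (index first, data second) is kept the same as in the prototype above.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above; not part of
     * immintrin.h. dst must not alias a. */
    void ref_permutexvar_pd(double dst[8], const uint64_t idx[8],
                            const double a[8])
    {
        for (int j = 0; j < 8; j++) {
            /* idx[i+2:i]: only bits 2:0 of each index element are used */
            dst[j] = a[idx[j] & 7];
        }
    }

An index vector of 7 down to 0 reverses the whole vector in this model. Higher index bits are ignored, so an index of 8 selects element 0.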

_mm512_mask_permutexvar_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512i idx, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    UI32 idx, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_permutexvar_ps(__m512 src, __mmask16 k,
                                      __m512i idx, __m512 a);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	id := idx[i+3:i]*32
        	IF k[j]
        		dst[i+31:i] := a[id+31:id]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
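
To make the writemask behaviour concrete, here is a hedged scalar sketch of the pseudo-code above. The helper name ``mask_permutexvar_ps_ref`` and the array representation of the vectors are hypothetical; bit ``j`` of ``k`` is assumed to correspond to element ``j``, as in the pseudo-code.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the masked cross-lane shuffle. */
    void mask_permutexvar_ps_ref(float dst[16], const float src[16],
                                 uint16_t k, const uint32_t idx[16],
                                 const float a[16])
    {
        assert(dst != a);  /* a[] is still read after dst[] has been written */
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u)
                dst[j] = a[idx[j] & 15u];  /* low 4 index bits (idx[i+3:i]) */
            else
                dst[j] = src[j];           /* mask bit clear: keep src */
        }
    }

With ``k = 0x00FF`` only the low eight elements are shuffled; the high eight are copied from ``src`` unchanged.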
        	

_mm512_maskz_permutexvar_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512i idx, 
    __m512 a
:Param ETypes:
    MASK k, 
    UI32 idx, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_permutexvar_ps(__mmask16 k, __m512i idx,
                                       __m512 a);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	id := idx[i+3:i]*32
        	IF k[j]
        		dst[i+31:i] := a[id+31:id]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_permutexvar_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512i idx, 
    __m512 a
:Param ETypes:
    UI32 idx, 
    FP32 a

.. code-block:: C

    __m512 _mm512_permutexvar_ps(__m512i idx, __m512 a);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	id := idx[i+3:i]*32
        	dst[i+31:i] := a[id+31:id]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_permutex_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    const int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_permutex_epi64(__m512i src, __mmask8 k,
                                       __m512i a,
                                       const int imm8);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        tmp_dst[319:256] := SELECT4(a[511:256], imm8[1:0])
        tmp_dst[383:320] := SELECT4(a[511:256], imm8[3:2])
        tmp_dst[447:384] := SELECT4(a[511:256], imm8[5:4])
        tmp_dst[511:448] := SELECT4(a[511:256], imm8[7:6])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
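
The eight ``SELECT4`` calls above follow a regular pattern: destination element ``j`` uses control bits ``2*(j mod 4)+1 : 2*(j mod 4)`` and stays inside its own 256-bit lane. A scalar sketch under that reading is shown below; ``mask_permutex_epi64_ref`` is a hypothetical helper, and the vectors are modelled as 8-element arrays.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model: in-lane imm8 shuffle merged with src by k. */
    void mask_permutex_epi64_ref(uint64_t dst[8], const uint64_t src[8],
                                 uint8_t k, const uint64_t a[8], int imm8)
    {
        assert(dst != a);  /* a[] is still read after dst[] has been written */
        for (int j = 0; j < 8; j++) {
            int lane_base = j & ~3;                 /* 0 (low lane) or 4 (high lane) */
            int sel = (imm8 >> (2 * (j & 3))) & 3;  /* two control bits per element */
            dst[j] = ((k >> j) & 1u) ? a[lane_base + sel] : src[j];
        }
    }

For example, ``imm8 = 0x1B`` reverses the four elements of each lane, and ``k = 0x0F`` restricts the update to the low lane.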
        	

_mm512_mask_permutexvar_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i idx, 
    __m512i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 idx, 
    UI64 a

.. code-block:: C

    __m512i _mm512_mask_permutexvar_epi64(__m512i src,
                                          __mmask8 k,
                                          __m512i idx,
                                          __m512i a);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	id := idx[i+2:i]*64
        	IF k[j]
        		dst[i+63:i] := a[id+63:id]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutex_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_permutex_epi64(__mmask8 k, __m512i a,
                                        const int imm8);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        tmp_dst[319:256] := SELECT4(a[511:256], imm8[1:0])
        tmp_dst[383:320] := SELECT4(a[511:256], imm8[3:2])
        tmp_dst[447:384] := SELECT4(a[511:256], imm8[5:4])
        tmp_dst[511:448] := SELECT4(a[511:256], imm8[7:6])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutexvar_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i idx, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 idx, 
    UI64 a

.. code-block:: C

    __m512i _mm512_maskz_permutexvar_epi64(__mmask8 k,
                                           __m512i idx,
                                           __m512i a);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	id := idx[i+2:i]*64
        	IF k[j]
        		dst[i+63:i] := a[id+63:id]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_permutex_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    const int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_permutex_epi64(__m512i a, const int imm8);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        dst[319:256] := SELECT4(a[511:256], imm8[1:0])
        dst[383:320] := SELECT4(a[511:256], imm8[3:2])
        dst[447:384] := SELECT4(a[511:256], imm8[5:4])
        dst[511:448] := SELECT4(a[511:256], imm8[7:6])
        dst[MAX:512] := 0
        	

_mm512_permutexvar_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i idx, 
    __m512i a
:Param ETypes:
    UI64 idx, 
    UI64 a

.. code-block:: C

    __m512i _mm512_permutexvar_epi64(__m512i idx, __m512i a);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	id := idx[i+2:i]*64
        	dst[i+63:i] := a[id+63:id]
        ENDFOR
        dst[MAX:512] := 0
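
Since the index vector can address any of the eight elements, the variable form can reproduce every pattern of the immediate-controlled in-lane shuffle documented for ``_mm512_permutex_epi64``. The sketch below illustrates this with two hypothetical helpers, ``permutexvar_epi64_ref`` and ``imm8_to_idx``; both names and the array-based vector model are assumptions, not part of ``immintrin.h``.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    void permutexvar_epi64_ref(uint64_t dst[8], const uint64_t idx[8],
                               const uint64_t a[8])
    {
        assert(dst != a);  /* a[] is still read after dst[] has been written */
        for (int j = 0; j < 8; j++)
            dst[j] = a[idx[j] & 7u];  /* only idx[i+2:i] is used */
    }

    /* Build the index vector that mimics an in-lane imm8 control
       (the SELECT4 pattern applied to each 256-bit lane). */
    void imm8_to_idx(uint64_t idx[8], int imm8)
    {
        for (int j = 0; j < 8; j++)
            idx[j] = (uint64_t)((j & ~3) + ((imm8 >> (2 * (j & 3))) & 3));
    }

With ``imm8 = 0x1B`` the derived indices are ``3, 2, 1, 0, 7, 6, 5, 4``, which reverses the elements of each 256-bit lane.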
        	

_mm512_mask_expand_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_expand_epi32(__m512i src, __mmask16 k,
                                     __m512i a);

.. admonition:: Intel Description

    Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[m+31:m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
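
The running offset ``m`` in the pseudo-code is the key detail: source elements are consumed in order, one per set mask bit. A scalar sketch of this, using the hypothetical helper ``mask_expand_epi32_ref`` and arrays in place of 512-bit vectors, might look like this:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the masked expand. */
    void mask_expand_epi32_ref(uint32_t dst[16], const uint32_t src[16],
                               uint16_t k, const uint32_t a[16])
    {
        assert(dst != a);  /* a[] is still read after dst[] has been written */
        int m = 0;         /* index of the next unread element of a */
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u)
                dst[j] = a[m++];  /* active slot: take the next contiguous element */
            else
                dst[j] = src[j];  /* inactive slot: keep src */
        }
    }

With ``k = 0x00A5`` (bits 0, 2, 5 and 7 set), the first four elements of ``a`` land in positions 0, 2, 5 and 7.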
        	

_mm512_maskz_expand_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_expand_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[m+31:m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_expand_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_mask_expand_epi64(__m512i src, __mmask8 k,
                                     __m512i a);

.. admonition:: Intel Description

    Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[m+63:m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_expand_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_maskz_expand_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[m+63:m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
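
A scalar sketch of the zero-masked expand follows. The helper ``maskz_expand_epi64_ref`` is hypothetical, and returning the number of consumed source elements is an addition for illustration only (the intrinsic returns just the vector).

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model; returns how many elements of a were used. */
    int maskz_expand_epi64_ref(uint64_t dst[8], uint8_t k, const uint64_t a[8])
    {
        assert(dst != a);  /* a[] is still read after dst[] has been written */
        int m = 0;         /* index of the next unread element of a */
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1u) ? a[m++] : 0;  /* inactive slots are zeroed */
        return m;
    }

The return value equals the number of set bits in ``k``, which is how many leading elements of ``a`` take part in the operation.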
        	

_mm512_maskz_shuffle_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    _MM_PERM_ENUM imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_shuffle_epi32(__mmask16 k, __m512i a,
                                       _MM_PERM_ENUM imm8);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
        tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
        tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0])
        tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2])
        tmp_dst[351:320] := SELECT4(a[383:256], imm8[5:4])
        tmp_dst[383:352] := SELECT4(a[383:256], imm8[7:6])
        tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0])
        tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2])
        tmp_dst[479:448] := SELECT4(a[511:384], imm8[5:4])
        tmp_dst[511:480] := SELECT4(a[511:384], imm8[7:6])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
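
The sixteen ``SELECT4`` lines above apply the same four 2-bit selectors to each 128-bit lane. The following scalar sketch expresses that as a loop; ``maskz_shuffle_epi32_ref`` is a hypothetical helper, and the control is passed as a plain ``int`` rather than ``_MM_PERM_ENUM`` to keep the example self-contained.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model: in-lane 32-bit shuffle with zero-masking. */
    void maskz_shuffle_epi32_ref(uint32_t dst[16], uint16_t k,
                                 const uint32_t a[16], int imm8)
    {
        assert(dst != a);  /* a[] is still read after dst[] has been written */
        for (int j = 0; j < 16; j++) {
            int lane_base = j & ~3;                 /* first element of the 128-bit lane */
            int sel = (imm8 >> (2 * (j & 3))) & 3;  /* selector for position j mod 4 */
            dst[j] = ((k >> j) & 1u) ? a[lane_base + sel] : 0;
        }
    }

In this model ``imm8 = 0xB1`` swaps adjacent pairs within every lane, while ``imm8 = 0x00`` broadcasts the first element of each lane.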
        	

_mm512_mask_unpackhi_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_unpackhi_epi32(__m512i src, __mmask16 k,
                                       __m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
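
As a hedged illustration of ``INTERLEAVE_HIGH_DWORDS`` combined with the writemask, the scalar sketch below computes each destination element directly from its position inside the lane. The helper name ``mask_unpackhi_epi32_ref`` and the array model of the vectors are assumptions.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model: per 128-bit lane the result is a2, b2, a3, b3. */
    void mask_unpackhi_epi32_ref(uint32_t dst[16], const uint32_t src[16],
                                 uint16_t k, const uint32_t a[16],
                                 const uint32_t b[16])
    {
        assert(dst != a && dst != b);  /* inputs are read after dst[] is written */
        for (int j = 0; j < 16; j++) {
            int lane_base = j & ~3;
            int e = j & 3;                           /* position inside the lane */
            const uint32_t *from = (e & 1) ? b : a;  /* even slots from a, odd from b */
            uint32_t v = from[lane_base + 2 + (e >> 1)];
            dst[j] = ((k >> j) & 1u) ? v : src[j];
        }
    }
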
        	

_mm512_maskz_unpackhi_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_unpackhi_epi32(__mmask16 k, __m512i a,
                                        __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_unpackhi_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_unpackhi_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
        dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256])
        dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384])
        dst[MAX:512] := 0
        	

_mm512_mask_unpackhi_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_unpackhi_epi64(__m512i src, __mmask8 k,
                                       __m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]	
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_unpackhi_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_unpackhi_epi64(__mmask8 k, __m512i a,
                                        __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]	
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_unpackhi_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_unpackhi_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
        dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256])
        dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384])
        dst[MAX:512] := 0
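
With only two 64-bit elements per 128-bit lane, ``INTERLEAVE_HIGH_QWORDS`` reduces to picking the odd-numbered element of each input. A minimal scalar sketch, assuming the hypothetical helper name ``unpackhi_epi64_ref`` and 8-element arrays in place of the vectors:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model: each lane becomes { high(a), high(b) }. */
    void unpackhi_epi64_ref(uint64_t dst[8], const uint64_t a[8],
                            const uint64_t b[8])
    {
        assert(dst != a && dst != b);  /* inputs are read after dst[] is written */
        for (int lane = 0; lane < 4; lane++) {
            dst[2 * lane]     = a[2 * lane + 1];  /* high qword of a's lane */
            dst[2 * lane + 1] = b[2 * lane + 1];  /* high qword of b's lane */
        }
    }
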
        	

_mm512_mask_unpacklo_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_unpacklo_epi32(__m512i src, __mmask16 k,
                                       __m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_unpacklo_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_unpacklo_epi32(__mmask16 k, __m512i a,
                                        __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]	
        }
        tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_unpacklo_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_unpacklo_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
        dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256])
        dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384])
        dst[MAX:512] := 0
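
The low-half interleave can be sketched the same way in scalar C. ``unpacklo_epi32_ref`` below is a hypothetical helper that spells out ``INTERLEAVE_DWORDS`` for each of the four 128-bit lanes, with the vectors modelled as 16-element arrays.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model: per 128-bit lane the result is a0, b0, a1, b1. */
    void unpacklo_epi32_ref(uint32_t dst[16], const uint32_t a[16],
                            const uint32_t b[16])
    {
        assert(dst != a && dst != b);  /* inputs are read after dst[] is written */
        for (int lane = 0; lane < 4; lane++) {
            const int base = 4 * lane;  /* first element of this lane */
            dst[base + 0] = a[base + 0];
            dst[base + 1] = b[base + 0];
            dst[base + 2] = a[base + 1];
            dst[base + 3] = b[base + 1];
        }
    }

Elements 2 and 3 of every lane in ``a`` and ``b`` do not contribute to the result.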
        	

_mm512_mask_unpacklo_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_unpacklo_epi64(__m512i src, __mmask8 k,
                                       __m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_unpacklo_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_unpacklo_epi64(__mmask8 k, __m512i a,
                                        __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
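
A scalar sketch of the zero-masked low-half interleave is given below. The helper ``maskz_unpacklo_epi64_ref`` is hypothetical; it relies on the observation that even destination slots come from "a" and odd slots from "b", each reading the low 64-bit element of the same 128-bit lane.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model: each lane becomes { low(a), low(b) }, then
       elements whose mask bit is clear are zeroed. */
    void maskz_unpacklo_epi64_ref(uint64_t dst[8], uint8_t k,
                                  const uint64_t a[8], const uint64_t b[8])
    {
        assert(dst != a && dst != b);  /* inputs are read after dst[] is written */
        for (int j = 0; j < 8; j++) {
            const uint64_t *from = (j & 1) ? b : a;  /* even slots from a, odd from b */
            uint64_t v = from[j & ~1];               /* low qword of the same lane */
            dst[j] = ((k >> j) & 1u) ? v : 0;
        }
    }

In this model ``k = 0x55`` keeps only the contributions of "a", and ``k = 0xAA`` keeps only those of "b".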
        	

_mm512_unpacklo_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_unpacklo_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
        dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256])
        dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384])
        dst[MAX:512] := 0
        	

_mm512_mask_shuffle_f32x4
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    const int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_mask_shuffle_f32x4(__m512 src, __mmask16 k,
                                     __m512 a, __m512 b,
                                     const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src[127:0]
        	1:	tmp[127:0] := src[255:128]
        	2:	tmp[127:0] := src[383:256]
        	3:	tmp[127:0] := src[511:384]
        	ESAC
        	RETURN tmp[127:0]
        }
        tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
        tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
        tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
        tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
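As a rough illustration of the two stages in the pseudo-code (block selection, then write-masking), the following scalar sketch may be useful. ``select4_f32x4`` and ``model_mask_shuffle_f32x4`` are hypothetical names, vectors are assumed to be arrays of sixteen ``float`` values with element 0 in bits [31:0], and "imm8" is assumed to lie in 0..255.

```c
#include <stdint.h>
#include <string.h>

/* Copy the 128-bit block (four floats) of src chosen by the low two bits of control. */
void select4_f32x4(const float src[16], unsigned control, float out[4])
{
    memcpy(out, &src[4 * (control & 3u)], 4 * sizeof(float));
}

/* Hypothetical scalar model of the pseudo-code (not an Intel API). */
void model_mask_shuffle_f32x4(const float src[16], uint16_t k,
                              const float a[16], const float b[16],
                              int imm8, float dst[16])
{
    float tmp[16];
    unsigned c = (unsigned)imm8;
    select4_f32x4(a, c >> 0, &tmp[0]);   /* dst block 0 comes from a */
    select4_f32x4(a, c >> 2, &tmp[4]);   /* dst block 1 comes from a */
    select4_f32x4(b, c >> 4, &tmp[8]);   /* dst block 2 comes from b */
    select4_f32x4(b, c >> 6, &tmp[12]);  /* dst block 3 comes from b */
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];  /* writemask: keep src otherwise */
}
```

Note that the mask is applied per 32-bit element, so a partially set mask can mix shuffled and "src" elements inside a single 128-bit block.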

_mm512_maskz_shuffle_f32x4
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_maskz_shuffle_f32x4(__mmask16 k, __m512 a,
                                      __m512 b, const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src[127:0]
        	1:	tmp[127:0] := src[255:128]
        	2:	tmp[127:0] := src[383:256]
        	3:	tmp[127:0] := src[511:384]
        	ESAC
        	RETURN tmp[127:0]
        }
        tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
        tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
        tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
        tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shuffle_f32x4
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_shuffle_f32x4(__m512 a, __m512 b,
                                const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src[127:0]
        	1:	tmp[127:0] := src[255:128]
        	2:	tmp[127:0] := src[383:256]
        	3:	tmp[127:0] := src[511:384]
        	ESAC
        	RETURN tmp[127:0]
        }
        dst[127:0] := SELECT4(a[511:0], imm8[1:0])
        dst[255:128] := SELECT4(a[511:0], imm8[3:2])
        dst[383:256] := SELECT4(b[511:0], imm8[5:4])
        dst[511:384] := SELECT4(b[511:0], imm8[7:6])
        dst[MAX:512] := 0
        	
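The pseudo-code reads "imm8" as four 2-bit selectors, one per 128-bit destination block. The sketch below shows one possible way to build and decode such an immediate. ``BLOCK_SELECT_IMM8``, ``source_block`` and ``source_operand`` are hypothetical helpers, not part of any Intel header; a macro is used because the intrinsic expects a compile-time constant.

```c
/* Hypothetical helper: pack four 2-bit block selectors into imm8.
   d0 selects the source block for dst[127:0], d3 for dst[511:384]. */
#define BLOCK_SELECT_IMM8(d0, d1, d2, d3) \
    ((((d3) & 3) << 6) | (((d2) & 3) << 4) | (((d1) & 3) << 2) | ((d0) & 3))

/* Index (0..3) of the 128-bit source block that feeds destination block n (0..3). */
int source_block(int imm8, int n)
{
    return (imm8 >> (2 * n)) & 3;
}

/* Destination blocks 0 and 1 read from "a"; blocks 2 and 3 read from "b". */
char source_operand(int n)
{
    return n < 2 ? 'a' : 'b';
}
```

For example, ``BLOCK_SELECT_IMM8(0, 1, 2, 3)`` evaluates to ``0xE4``, which keeps the low two blocks of "a" and the high two blocks of "b" in place.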

_mm512_mask_shuffle_f64x2
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    const int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_mask_shuffle_f64x2(__m512d src, __mmask8 k,
                                      __m512d a, __m512d b,
                                      const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src[127:0]
        	1:	tmp[127:0] := src[255:128]
        	2:	tmp[127:0] := src[383:256]
        	3:	tmp[127:0] := src[511:384]
        	ESAC
        	RETURN tmp[127:0]
        }
        tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
        tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
        tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
        tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
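Compared with ``_mm512_mask_shuffle_f32x4``, the block selection here is identical and only the mask granularity changes (eight 64-bit elements instead of sixteen 32-bit elements). A byte-level sketch can make that explicit. ``model_mask_shuffle_128`` is a hypothetical helper that treats each vector as 64 raw bytes, with ``elem_bytes`` assumed to be 4 or 8 and "imm8" assumed to lie in 0..255.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical generic model: the same block selection, with a mask applied
   per element of elem_bytes bytes (4 for the 32x4 forms, 8 for the 64x2 forms). */
void model_mask_shuffle_128(const uint8_t src[64], uint16_t k,
                            const uint8_t a[64], const uint8_t b[64],
                            int imm8, size_t elem_bytes, uint8_t dst[64])
{
    uint8_t tmp[64];
    for (int blk = 0; blk < 4; blk++) {
        const uint8_t *from = (blk < 2) ? a : b;  /* blocks 0-1 read a, 2-3 read b */
        int sel = (imm8 >> (2 * blk)) & 3;        /* 2-bit selector for this block */
        memcpy(&tmp[16 * blk], &from[16 * sel], 16);
    }
    size_t n = 64 / elem_bytes;                   /* 16 mask bits or 8 mask bits */
    for (size_t j = 0; j < n; j++) {
        const uint8_t *pick = ((k >> j) & 1) ? tmp : src;
        memmove(&dst[j * elem_bytes], &pick[j * elem_bytes], elem_bytes);
    }
}
```

Under this model, a 64-bit mask of ``0x01`` and a 32-bit mask of ``0x0003`` select the same low eight bytes.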

_mm512_maskz_shuffle_f64x2
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_maskz_shuffle_f64x2(__mmask8 k, __m512d a,
                                       __m512d b,
                                       const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src[127:0]
        	1:	tmp[127:0] := src[255:128]
        	2:	tmp[127:0] := src[383:256]
        	3:	tmp[127:0] := src[511:384]
        	ESAC
        	RETURN tmp[127:0]
        }
        tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
        tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
        tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
        tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shuffle_f64x2
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_shuffle_f64x2(__m512d a, __m512d b,
                                 const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src[127:0]
        	1:	tmp[127:0] := src[255:128]
        	2:	tmp[127:0] := src[383:256]
        	3:	tmp[127:0] := src[511:384]
        	ESAC
        	RETURN tmp[127:0]
        }
        dst[127:0] := SELECT4(a[511:0], imm8[1:0])
        dst[255:128] := SELECT4(a[511:0], imm8[3:2])
        dst[383:256] := SELECT4(b[511:0], imm8[5:4])
        dst[511:384] := SELECT4(b[511:0], imm8[7:6])
        dst[MAX:512] := 0
        	

_mm512_mask_shuffle_i32x4
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_shuffle_i32x4(__m512i src, __mmask16 k,
                                      __m512i a, __m512i b,
                                      const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src[127:0]
        	1:	tmp[127:0] := src[255:128]
        	2:	tmp[127:0] := src[383:256]
        	3:	tmp[127:0] := src[511:384]
        	ESAC
        	RETURN tmp[127:0]
        }
        tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
        tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
        tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
        tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_shuffle_i32x4
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_shuffle_i32x4(__mmask16 k, __m512i a,
                                       __m512i b,
                                       const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src[127:0]
        	1:	tmp[127:0] := src[255:128]
        	2:	tmp[127:0] := src[383:256]
        	3:	tmp[127:0] := src[511:384]
        	ESAC
        	RETURN tmp[127:0]
        }
        tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
        tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
        tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
        tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shuffle_i32x4
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_shuffle_i32x4(__m512i a, __m512i b,
                                 const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src[127:0]
        	1:	tmp[127:0] := src[255:128]
        	2:	tmp[127:0] := src[383:256]
        	3:	tmp[127:0] := src[511:384]
        	ESAC
        	RETURN tmp[127:0]
        }
        dst[127:0] := SELECT4(a[511:0], imm8[1:0])
        dst[255:128] := SELECT4(a[511:0], imm8[3:2])
        dst[383:256] := SELECT4(b[511:0], imm8[5:4])
        dst[511:384] := SELECT4(b[511:0], imm8[7:6])
        dst[MAX:512] := 0
        	

_mm512_mask_shuffle_i64x2
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_shuffle_i64x2(__m512i src, __mmask8 k,
                                      __m512i a, __m512i b,
                                      const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src[127:0]
        	1:	tmp[127:0] := src[255:128]
        	2:	tmp[127:0] := src[383:256]
        	3:	tmp[127:0] := src[511:384]
        	ESAC
        	RETURN tmp[127:0]
        }
        tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
        tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
        tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
        tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_shuffle_i64x2
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_shuffle_i64x2(__mmask8 k, __m512i a,
                                       __m512i b,
                                       const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src[127:0]
        	1:	tmp[127:0] := src[255:128]
        	2:	tmp[127:0] := src[383:256]
        	3:	tmp[127:0] := src[511:384]
        	ESAC
        	RETURN tmp[127:0]
        }
        tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
        tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
        tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
        tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shuffle_i64x2
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_shuffle_i64x2(__m512i a, __m512i b,
                                 const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src[127:0]
        	1:	tmp[127:0] := src[255:128]
        	2:	tmp[127:0] := src[383:256]
        	3:	tmp[127:0] := src[511:384]
        	ESAC
        	RETURN tmp[127:0]
        }
        dst[127:0] := SELECT4(a[511:0], imm8[1:0])
        dst[255:128] := SELECT4(a[511:0], imm8[3:2])
        dst[383:256] := SELECT4(b[511:0], imm8[5:4])
        dst[511:384] := SELECT4(b[511:0], imm8[7:6])
        dst[MAX:512] := 0
        	
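When the same vector is passed as both "a" and "b", the pseudo-code reduces to a permutation of that vector's four 128-bit blocks. The scalar sketch below illustrates this reading. ``model_shuffle_i64x2`` is a hypothetical helper, vectors are assumed to be arrays of eight ``uint64_t`` values, and "imm8" is assumed to lie in 0..255.

```c
#include <stdint.h>

/* Hypothetical scalar model of the unmasked pseudo-code (not an Intel API). */
void model_shuffle_i64x2(const uint64_t a[8], const uint64_t b[8],
                         int imm8, uint64_t dst[8])
{
    uint64_t tmp[8];
    for (int blk = 0; blk < 4; blk++) {
        const uint64_t *from = (blk < 2) ? a : b;  /* blocks 0-1 read a, 2-3 read b */
        int sel = (imm8 >> (2 * blk)) & 3;         /* source block index, 0..3 */
        tmp[2 * blk]     = from[2 * sel];
        tmp[2 * blk + 1] = from[2 * sel + 1];
    }
    for (int j = 0; j < 8; j++)
        dst[j] = tmp[j];  /* copied last, so dst may alias a or b */
}
```

In this model, ``imm8 = 0x4E`` with ``a == b`` swaps the two 256-bit halves, and ``imm8 = 0x39`` rotates the blocks down by one position.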

_mm512_mask_shuffle_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    const int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_mask_shuffle_pd(__m512d src, __mmask8 k,
                                   __m512d a, __m512d b,
                                   const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
        tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
        tmp_dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192]
        tmp_dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192]
        tmp_dst[319:256] := (imm8[4] == 0) ? a[319:256] : a[383:320]
        tmp_dst[383:320] := (imm8[5] == 0) ? b[319:256] : b[383:320]
        tmp_dst[447:384] := (imm8[6] == 0) ? a[447:384] : a[511:448]
        tmp_dst[511:448] := (imm8[7] == 0) ? b[447:384] : b[511:448]
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
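Unlike the 128-bit block shuffles, this operation uses one bit of "imm8" per destination element. A scalar sketch of the pseudo-code might look as follows. ``model_mask_shuffle_pd`` is a hypothetical helper, vectors are assumed to be arrays of eight ``double`` values, and "imm8" is assumed to lie in 0..255.

```c
#include <stdint.h>

/* Hypothetical scalar model of the pseudo-code (not an Intel API). */
void model_mask_shuffle_pd(const double src[8], uint8_t k,
                           const double a[8], const double b[8],
                           int imm8, double dst[8])
{
    double tmp[8];
    for (int j = 0; j < 8; j++) {
        const double *from = (j % 2 == 0) ? a : b;  /* even results read a, odd read b */
        int lane_base = j & ~1;                     /* low element of this 128-bit lane */
        tmp[j] = from[lane_base + ((imm8 >> j) & 1)];  /* bit j picks low (0) or high (1) */
    }
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];  /* writemask: keep src otherwise */
}
```

Each result element can only come from its own 128-bit lane; no data crosses lane boundaries.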

_mm512_maskz_shuffle_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_maskz_shuffle_pd(__mmask8 k, __m512d a,
                                    __m512d b, const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
        tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
        tmp_dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192]
        tmp_dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192]
        tmp_dst[319:256] := (imm8[4] == 0) ? a[319:256] : a[383:320]
        tmp_dst[383:320] := (imm8[5] == 0) ? b[319:256] : b[383:320]
        tmp_dst[447:384] := (imm8[6] == 0) ? a[447:384] : a[511:448]
        tmp_dst[511:448] := (imm8[7] == 0) ? b[447:384] : b[511:448]
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shuffle_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_shuffle_pd(__m512d a, __m512d b,
                              const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
        dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
        dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192]
        dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192]
        dst[319:256] := (imm8[4] == 0) ? a[319:256] : a[383:320]
        dst[383:320] := (imm8[5] == 0) ? b[319:256] : b[383:320]
        dst[447:384] := (imm8[6] == 0) ? a[447:384] : a[511:448]
        dst[511:448] := (imm8[7] == 0) ? b[447:384] : b[511:448]
        dst[MAX:512] := 0
        	
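A few immediates follow directly from the pseudo-code and are worth working through: ``0x00`` picks the low element of each lane from both operands, ``0xFF`` picks the high element from both (the same result the ``_mm512_unpackhi_pd`` pseudo-code produces), and ``0xAA`` takes the low element from "a" and the high element from "b". The sketch below is a hypothetical scalar model, assuming arrays of eight ``double`` values and "imm8" in 0..255.

```c
/* Hypothetical scalar model of the unmasked pseudo-code (not an Intel API).
   dst must not alias a or b, because results are written in place. */
void model_shuffle_pd(const double a[8], const double b[8],
                      int imm8, double dst[8])
{
    for (int j = 0; j < 8; j++) {
        const double *from = (j % 2 == 0) ? a : b;  /* even slots read a, odd slots read b */
        int lane_base = j & ~1;                     /* low element of this 128-bit lane */
        dst[j] = from[lane_base + ((imm8 >> j) & 1)];
    }
}
```

With ``a = {0, 1, ..., 7}`` and ``b = {10, 11, ..., 17}``, the model yields ``{0, 10, 2, 12, ...}`` for ``0x00`` and ``{1, 11, 3, 13, ...}`` for ``0xFF``.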

_mm512_mask_shuffle_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    const int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_mask_shuffle_ps(__m512 src, __mmask16 k,
                                  __m512 a, __m512 b,
                                  const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6])
        tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        tmp_dst[223:192] := SELECT4(b[255:128], imm8[5:4])
        tmp_dst[255:224] := SELECT4(b[255:128], imm8[7:6])
        tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0])
        tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2])
        tmp_dst[351:320] := SELECT4(b[383:256], imm8[5:4])
        tmp_dst[383:352] := SELECT4(b[383:256], imm8[7:6])
        tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0])
        tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2])
        tmp_dst[479:448] := SELECT4(b[511:384], imm8[5:4])
        tmp_dst[511:480] := SELECT4(b[511:384], imm8[7:6])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
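The sixteen ``SELECT4`` lines in the pseudo-code apply the same four 2-bit selectors to every 128-bit lane: the two low results of each lane come from "a" and the two high results from "b". A compact scalar sketch of that pattern follows. ``model_mask_shuffle_ps`` is a hypothetical helper, vectors are assumed to be arrays of sixteen ``float`` values, and "imm8" is assumed to lie in 0..255.

```c
#include <stdint.h>

/* Hypothetical scalar model of the pseudo-code (not an Intel API). */
void model_mask_shuffle_ps(const float src[16], uint16_t k,
                           const float a[16], const float b[16],
                           int imm8, float dst[16])
{
    float tmp[16];
    for (int lane = 0; lane < 4; lane++) {
        const float *la = &a[4 * lane];  /* this lane of a */
        const float *lb = &b[4 * lane];  /* this lane of b */
        tmp[4 * lane + 0] = la[(imm8 >> 0) & 3];
        tmp[4 * lane + 1] = la[(imm8 >> 2) & 3];
        tmp[4 * lane + 2] = lb[(imm8 >> 4) & 3];
        tmp[4 * lane + 3] = lb[(imm8 >> 6) & 3];
    }
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];  /* writemask: keep src otherwise */
}
```

Because the selectors are shared, it is not possible to pick a different permutation for each lane with a single call.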

_mm512_maskz_shuffle_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_maskz_shuffle_ps(__mmask16 k, __m512 a,
                                   __m512 b, const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6])
        tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        tmp_dst[223:192] := SELECT4(b[255:128], imm8[5:4])
        tmp_dst[255:224] := SELECT4(b[255:128], imm8[7:6])
        tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0])
        tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2])
        tmp_dst[351:320] := SELECT4(b[383:256], imm8[5:4])
        tmp_dst[383:352] := SELECT4(b[383:256], imm8[7:6])
        tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0])
        tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2])
        tmp_dst[479:448] := SELECT4(b[511:384], imm8[5:4])
        tmp_dst[511:480] := SELECT4(b[511:384], imm8[7:6])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shuffle_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_shuffle_ps(__m512 a, __m512 b,
                             const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        dst[95:64] := SELECT4(b[127:0], imm8[5:4])
        dst[127:96] := SELECT4(b[127:0], imm8[7:6])
        dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        dst[223:192] := SELECT4(b[255:128], imm8[5:4])
        dst[255:224] := SELECT4(b[255:128], imm8[7:6])
        dst[287:256] := SELECT4(a[383:256], imm8[1:0])
        dst[319:288] := SELECT4(a[383:256], imm8[3:2])
        dst[351:320] := SELECT4(b[383:256], imm8[5:4])
        dst[383:352] := SELECT4(b[383:256], imm8[7:6])
        dst[415:384] := SELECT4(a[511:384], imm8[1:0])
        dst[447:416] := SELECT4(a[511:384], imm8[3:2])
        dst[479:448] := SELECT4(b[511:384], imm8[5:4])
        dst[511:480] := SELECT4(b[511:384], imm8[7:6])
        dst[MAX:512] := 0
        	
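If the same vector is supplied for both "a" and "b", the pseudo-code reduces to a per-lane permutation of four elements. The sketch below works through that special case. ``model_permute_lanes_ps`` is a hypothetical helper modelling a call with ``a == b``, assuming an array of sixteen ``float`` values and "imm8" in 0..255.

```c
/* Hypothetical scalar model of the pseudo-code for the case a == b:
   every 128-bit lane of v is permuted by the same four 2-bit selectors. */
void model_permute_lanes_ps(const float v[16], int imm8, float dst[16])
{
    float tmp[16];
    for (int lane = 0; lane < 4; lane++)
        for (int e = 0; e < 4; e++)
            tmp[4 * lane + e] = v[4 * lane + ((imm8 >> (2 * e)) & 3)];
    for (int j = 0; j < 16; j++)
        dst[j] = tmp[j];  /* copied last, so dst may alias v */
}
```

In this model ``0xE4`` leaves the vector unchanged, ``0x1B`` reverses the elements inside each lane, and ``0x00`` broadcasts the lowest element of each lane across that lane.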

_mm512_mask_unpackhi_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_unpackhi_pd(__m512d src, __mmask8 k,
                                    __m512d a, __m512d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]	
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_unpackhi_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_unpackhi_pd(__mmask8 k, __m512d a,
                                     __m512d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]	
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_unpackhi_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_unpackhi_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
        dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256])
        dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384])
        dst[MAX:512] := 0
        	

_mm512_mask_unpackhi_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_unpackhi_ps(__m512 src, __mmask16 k,
                                   __m512 a, __m512 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]	
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_unpackhi_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_unpackhi_ps(__mmask16 k, __m512 a,
                                    __m512 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]	
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_unpackhi_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_unpackhi_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
        dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256])
        dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384])
        dst[MAX:512] := 0
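
As an illustration of ``INTERLEAVE_HIGH_DWORDS``, the following scalar sketch applies the same element moves to each 128-bit lane. It assumes a 512-bit vector is modelled as 16 floats with element 0 lowest; ``ref_unpackhi_ps`` is a hypothetical helper name, not an Intel API.

.. code-block:: C

    #include <assert.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm512_unpackhi_ps.
       Assumption: element 0 of each array is the lowest 32 bits. */
    void ref_unpackhi_ps(float dst[16], const float a[16], const float b[16])
    {
        assert(dst && a && b);
        float tmp[16];
        for (int lane = 0; lane < 4; ++lane) {
            const int base = 4 * lane;       /* first element of this lane */
            tmp[base + 0] = a[base + 2];     /* src1[95:64]  */
            tmp[base + 1] = b[base + 2];     /* src2[95:64]  */
            tmp[base + 2] = a[base + 3];     /* src1[127:96] */
            tmp[base + 3] = b[base + 3];     /* src2[127:96] */
        }
        memcpy(dst, tmp, sizeof tmp);        /* safe if dst aliases an input */
    }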
        	

_mm512_mask_unpacklo_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_unpacklo_pd(__m512d src, __mmask8 k,
                                    __m512d a, __m512d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
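
The writemask behaviour described above can be sketched in portable C. The model assumes 8 doubles per 512-bit vector (element 0 lowest), and ``ref_mask_unpacklo_pd`` is a hypothetical name used only for this illustration.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_unpacklo_pd.
       Assumption: element 0 of each array is the lowest 64 bits. */
    void ref_mask_unpacklo_pd(double dst[8], const double src[8], uint8_t k,
                              const double a[8], const double b[8])
    {
        assert(dst && src && a && b);
        double tmp[8];
        for (int lane = 0; lane < 4; ++lane) {
            tmp[2 * lane]     = a[2 * lane];  /* low qword of a */
            tmp[2 * lane + 1] = b[2 * lane];  /* low qword of b */
        }
        for (int j = 0; j < 8; ++j)           /* writemask: clear bit keeps src */
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
    }

With ``k = 0x05`` only elements 0 and 2 take interleaved values; every other element is passed through from ``src``.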
        	

_mm512_maskz_unpacklo_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_unpacklo_pd(__mmask8 k, __m512d a,
                                     __m512d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_unpacklo_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_unpacklo_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
        dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256])
        dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384])
        dst[MAX:512] := 0
        	

_mm512_mask_unpacklo_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_unpacklo_ps(__m512 src, __mmask16 k,
                                   __m512 a, __m512 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]	
        }
        tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_unpacklo_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_unpacklo_ps(__mmask16 k, __m512 a,
                                    __m512 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]	
        }
        tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_unpacklo_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_unpacklo_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
        dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256])
        dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384])
        dst[MAX:512] := 0
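
``INTERLEAVE_DWORDS`` can also be read as a per-element index formula: output position ``pos`` within a lane takes element ``pos / 2`` of the same lane, from ``b`` when ``pos`` is odd and from ``a`` otherwise. The sketch below encodes that reading. It assumes 16 floats per vector with element 0 lowest, and ``ref_unpacklo_ps`` is a hypothetical name.

.. code-block:: C

    #include <assert.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm512_unpacklo_ps, written as a
       closed-form index mapping instead of four explicit moves. */
    void ref_unpacklo_ps(float dst[16], const float a[16], const float b[16])
    {
        assert(dst && a && b);
        float tmp[16];
        for (int j = 0; j < 16; ++j) {
            const int lane = j / 4;                   /* 128-bit lane index */
            const int pos  = j % 4;                   /* position in lane   */
            const float *from = (pos & 1) ? b : a;    /* odd slots read b   */
            tmp[j] = from[4 * lane + pos / 2];        /* low half only      */
        }
        memcpy(dst, tmp, sizeof tmp);
    }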
        	

_mm512_mask_blend_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_blend_pd(__mmask8 k, __m512d a,
                                 __m512d b);

.. admonition:: Intel Description

    Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := b[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
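
A point that is easy to get backwards is which operand a set mask bit selects: per the pseudo-code, a set bit takes the element from "b" and a clear bit takes it from "a". A minimal scalar sketch, assuming 8 doubles per vector (element 0 lowest) and a hypothetical helper name ``ref_mask_blend_pd``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_blend_pd. */
    void ref_mask_blend_pd(double dst[8], uint8_t k,
                           const double a[8], const double b[8])
    {
        assert(dst && a && b);
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1) ? b[j] : a[j];  /* set bit selects b */
    }

The same loop shape applies to the ``ps``, ``epi32`` and ``epi64`` blends, with the element type and count adjusted.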
        	

_mm512_mask_blend_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_blend_ps(__mmask16 k, __m512 a,
                                __m512 b);

.. admonition:: Intel Description

    Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_blend_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_blend_epi32(__mmask16 k, __m512i a,
                                    __m512i b);

.. admonition:: Intel Description

    Blend packed 32-bit integers from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_blend_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_blend_epi64(__mmask8 k, __m512i a,
                                    __m512i b);

.. admonition:: Intel Description

    Blend packed 64-bit integers from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := b[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_permutevar_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i idx, 
    __m512i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 idx, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_permutevar_epi32(__m512i src,
                                         __mmask16 k,
                                         __m512i idx,
                                         __m512i a);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Note that this intrinsic shuffles across 128-bit lanes, unlike past intrinsics that use the "permutevar" name. This intrinsic is identical to "_mm512_mask_permutexvar_epi32", and it is recommended that you use that intrinsic name.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	id := idx[i+3:i]*32
        	IF k[j]
        		dst[i+31:i] := a[id+31:id]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
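
The pseudo-code reads only the low four bits of each index (``idx[i+3:i]``), so higher index bits are ignored. The scalar sketch below shows this together with the writemask. It assumes 16 ``uint32_t`` elements per vector (element 0 lowest), and ``ref_mask_permutevar_epi32`` is a hypothetical name.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm512_mask_permutevar_epi32. */
    void ref_mask_permutevar_epi32(uint32_t dst[16], const uint32_t src[16],
                                   uint16_t k, const uint32_t idx[16],
                                   const uint32_t a[16])
    {
        assert(dst && src && idx && a);
        uint32_t tmp[16];
        for (int j = 0; j < 16; ++j) {
            const uint32_t id = idx[j] & 0xFu;   /* only bits 3:0 are used */
            tmp[j] = ((k >> j) & 1) ? a[id] : src[j];
        }
        memcpy(dst, tmp, sizeof tmp);            /* safe if dst aliases src/a */
    }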
        	

_mm512_permutevar_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i idx, 
    __m512i a
:Param ETypes:
    UI32 idx, 
    UI32 a

.. code-block:: C

    __m512i _mm512_permutevar_epi32(__m512i idx, __m512i a);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". Note that this intrinsic shuffles across 128-bit lanes, unlike past intrinsics that use the "permutevar" name. This intrinsic is identical to "_mm512_permutexvar_epi32", and it is recommended that you use that intrinsic name.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	id := idx[i+3:i]*32
        	dst[i+31:i] := a[id+31:id]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_shuffle_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    _MM_PERM_ENUM imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_shuffle_epi32(__m512i src, __mmask16 k,
                                      __m512i a,
                                      _MM_PERM_ENUM imm8);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
        tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
        tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0])
        tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2])
        tmp_dst[351:320] := SELECT4(a[383:256], imm8[5:4])
        tmp_dst[383:352] := SELECT4(a[383:256], imm8[7:6])
        tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0])
        tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2])
        tmp_dst[479:448] := SELECT4(a[511:384], imm8[5:4])
        tmp_dst[511:480] := SELECT4(a[511:384], imm8[7:6])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shuffle_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    _MM_PERM_ENUM imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_shuffle_epi32(__m512i a, _MM_PERM_ENUM imm8);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        dst[223:192] := SELECT4(a[255:128], imm8[5:4])
        dst[255:224] := SELECT4(a[255:128], imm8[7:6])
        dst[287:256] := SELECT4(a[383:256], imm8[1:0])
        dst[319:288] := SELECT4(a[383:256], imm8[3:2])
        dst[351:320] := SELECT4(a[383:256], imm8[5:4])
        dst[383:352] := SELECT4(a[383:256], imm8[7:6])
        dst[415:384] := SELECT4(a[511:384], imm8[1:0])
        dst[447:416] := SELECT4(a[511:384], imm8[3:2])
        dst[479:448] := SELECT4(a[511:384], imm8[5:4])
        dst[511:480] := SELECT4(a[511:384], imm8[7:6])
        dst[MAX:512] := 0
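
The sixteen ``SELECT4`` lines above reduce to one rule: output position ``pos`` of every 128-bit lane reads the element named by bits ``2*pos+1:2*pos`` of "imm8", always from the same lane. A scalar sketch of that rule follows. It assumes 16 ``uint32_t`` elements per vector and takes the control as a plain ``unsigned`` rather than ``_MM_PERM_ENUM``; ``ref_shuffle_epi32`` is a hypothetical name.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm512_shuffle_epi32. */
    void ref_shuffle_epi32(uint32_t dst[16], const uint32_t a[16],
                           unsigned imm8)
    {
        assert(dst && a);
        uint32_t tmp[16];
        for (int lane = 0; lane < 4; ++lane)
            for (int pos = 0; pos < 4; ++pos) {
                const unsigned sel = (imm8 >> (2 * pos)) & 3u; /* 2-bit field */
                tmp[4 * lane + pos] = a[4 * lane + sel];       /* same lane   */
            }
        memcpy(dst, tmp, sizeof tmp);
    }

For example, a control of ``0x1B`` reverses the four elements inside each lane, and ``0x00`` broadcasts each lane's lowest element.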
        	

_mm512_permutexvar_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i idx, 
    __m512i a
:Param ETypes:
    UI8 idx, 
    UI8 a

.. code-block:: C

    __m512i _mm512_permutexvar_epi8(__m512i idx, __m512i a);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	id := idx[i+5:i]*8
        	dst[i+7:i] := a[id+7:id]
        ENDFOR
        dst[MAX:512] := 0
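
Since there are 64 byte positions, the pseudo-code uses six index bits (``idx[i+5:i]``) and ignores bits 7:6 of each index byte. A scalar sketch, assuming 64 ``uint8_t`` elements per vector (element 0 lowest) and the hypothetical name ``ref_permutexvar_epi8``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm512_permutexvar_epi8. */
    void ref_permutexvar_epi8(uint8_t dst[64], const uint8_t idx[64],
                              const uint8_t a[64])
    {
        assert(dst && idx && a);
        uint8_t tmp[64];
        for (int j = 0; j < 64; ++j)
            tmp[j] = a[idx[j] & 0x3F];   /* bits 5:0 pick any of 64 bytes */
        memcpy(dst, tmp, sizeof tmp);    /* safe if dst aliases an input  */
    }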
        	

_mm512_mask_permutexvar_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i idx, 
    __m512i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 idx, 
    UI8 a

.. code-block:: C

    __m512i _mm512_mask_permutexvar_epi8(__m512i src,
                                         __mmask64 k,
                                         __m512i idx,
                                         __m512i a);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	id := idx[i+5:i]*8
        	IF k[j]
        		dst[i+7:i] := a[id+7:id]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutexvar_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i idx, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI8 idx, 
    UI8 a

.. code-block:: C

    __m512i _mm512_maskz_permutexvar_epi8(__mmask64 k,
                                          __m512i idx,
                                          __m512i a);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	id := idx[i+5:i]*8
        	IF k[j]
        		dst[i+7:i] := a[id+7:id]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_permutex2var_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i idx, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 idx, 
    UI8 b

.. code-block:: C

    __m512i _mm512_permutex2var_epi8(__m512i a, __m512i idx,
                                     __m512i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	off := 8*idx[i+5:i]
        	dst[i+7:i] := idx[i+6] ? b[off+7:off] : a[off+7:off]
        ENDFOR
        dst[MAX:512] := 0
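
In the pseudo-code, each index byte carries two fields: bits 5:0 are the byte offset and bit 6 is the selector between "a" and "b" (bit 7 is unused). The following scalar sketch treats the two tables as one 128-entry lookup. It assumes 64 ``uint8_t`` elements per vector, and ``ref_permutex2var_epi8`` is a hypothetical name.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm512_permutex2var_epi8. */
    void ref_permutex2var_epi8(uint8_t dst[64], const uint8_t a[64],
                               const uint8_t idx[64], const uint8_t b[64])
    {
        assert(dst && a && idx && b);
        uint8_t tmp[64];
        for (int j = 0; j < 64; ++j) {
            const unsigned off = idx[j] & 0x3F;          /* offset: bits 5:0 */
            tmp[j] = (idx[j] & 0x40) ? b[off] : a[off];  /* selector: bit 6  */
        }
        memcpy(dst, tmp, sizeof tmp);
    }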
        	

_mm512_mask_permutex2var_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask64 k, 
    __m512i idx, 
    __m512i b
:Param ETypes:
    UI8 a, 
    MASK k, 
    UI8 idx, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_permutex2var_epi8(__m512i a,
                                          __mmask64 k,
                                          __m512i idx,
                                          __m512i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		off := 8*idx[i+5:i]
        		dst[i+7:i] := idx[i+6] ? b[off+7:off] : a[off+7:off]
        	ELSE
        		dst[i+7:i] := a[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask2_permutex2var_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i idx, 
    __mmask64 k, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 idx, 
    MASK k, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask2_permutex2var_epi8(__m512i a,
                                           __m512i idx,
                                           __mmask64 k,
                                           __m512i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		off := 8*idx[i+5:i]
        		dst[i+7:i] := idx[i+6] ? b[off+7:off] : a[off+7:off]
        	ELSE
        		dst[i+7:i] := idx[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutex2var_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i idx, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 idx, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_permutex2var_epi8(__mmask64 k,
                                           __m512i a,
                                           __m512i idx,
                                           __m512i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		off := 8*idx[i+5:i]
        		dst[i+7:i] := idx[i+6] ? b[off+7:off] : a[off+7:off]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
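
Going by the pseudo-code of the three masked ``permutex2var_epi8`` forms, they differ only in what is written when a mask bit is clear: the ``mask`` form keeps the byte of "a", the ``mask2`` form keeps the byte of "idx", and the ``maskz`` form writes zero. The sketch below makes that single difference a parameter. The ``fallback`` enum and the function name are hypothetical, introduced only for this illustration, and vectors are assumed to be 64 ``uint8_t`` elements.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical selector: value used when a mask bit is clear. */
    enum fallback { FALL_A, FALL_IDX, FALL_ZERO };  /* mask, mask2, maskz */

    /* Hypothetical scalar model covering the three masked variants. */
    void ref_masked_permutex2var_epi8(uint8_t dst[64], enum fallback mode,
                                      uint64_t k, const uint8_t a[64],
                                      const uint8_t idx[64],
                                      const uint8_t b[64])
    {
        assert(dst && a && idx && b);
        uint8_t tmp[64];
        for (int j = 0; j < 64; ++j) {
            const unsigned off = idx[j] & 0x3F;
            if ((k >> j) & 1)
                tmp[j] = (idx[j] & 0x40) ? b[off] : a[off];
            else if (mode == FALL_A)
                tmp[j] = a[j];       /* _mm512_mask_permutex2var_epi8  */
            else if (mode == FALL_IDX)
                tmp[j] = idx[j];     /* _mm512_mask2_permutex2var_epi8 */
            else
                tmp[j] = 0;          /* _mm512_maskz_permutex2var_epi8 */
        }
        memcpy(dst, tmp, sizeof tmp);
    }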
        	

_mm512_maskz_expandloadu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    const void* mem_addr
:Param ETypes:
    MASK k, 
    UI16 mem_addr

.. code-block:: C

    __m512i _mm512_maskz_expandloadu_epi16(
        __mmask32 k, const void* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m]
        		m := m + 16
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
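
In the pseudo-code, the memory offset ``m`` advances only when a mask bit is set, so consecutive values from memory are spread out to the active positions of "dst". A scalar sketch of that behaviour, assuming 32 ``uint16_t`` elements per vector (element 0 lowest); ``ref_maskz_expandloadu_epi16`` is a hypothetical name, and the model counts ``m`` in elements rather than bits.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm512_maskz_expandloadu_epi16. */
    void ref_maskz_expandloadu_epi16(uint16_t dst[32], uint32_t k,
                                     const void *mem_addr)
    {
        assert(dst && mem_addr);
        const unsigned char *p = mem_addr;
        size_t m = 0;                             /* elements consumed so far */
        for (int j = 0; j < 32; ++j) {
            if ((k >> j) & 1) {
                memcpy(&dst[j], p + 2 * m, 2);    /* unaligned-safe read */
                ++m;                              /* advance only when active */
            } else {
                dst[j] = 0;                       /* zeromask */
            }
        }
    }

In this model, with ``k = 0x92`` three values are read from memory and land in elements 1, 4 and 7.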
        	

_mm512_mask_expandloadu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    const void* mem_addr
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 mem_addr

.. code-block:: C

    __m512i _mm512_mask_expandloadu_epi16(__m512i src,
                                          __mmask32 k,
                                          const void* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m]
        		m := m + 16
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_expandloadu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    const void* mem_addr
:Param ETypes:
    MASK k, 
    UI8 mem_addr

.. code-block:: C

    __m512i _mm512_maskz_expandloadu_epi8(__mmask64 k,
                                          const void* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m]
        		m := m + 8
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_expandloadu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    const void* mem_addr
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 mem_addr

.. code-block:: C

    __m512i _mm512_mask_expandloadu_epi8(__m512i src,
                                         __mmask64 k,
                                         const void* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m]
        		m := m + 8
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_expand_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_maskz_expand_epi16(__mmask32 k, __m512i a);

.. admonition:: Intel Description

    Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[m+15:m]
        		m := m + 16
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_expand_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_mask_expand_epi16(__m512i src, __mmask32 k,
                                     __m512i a);

.. admonition:: Intel Description

    Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[m+15:m]
        		m := m + 16
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
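
The register-to-register expand can be sketched in scalar C as follows. This is a minimal model under stated assumptions: the name ``model_mask_expand_epi16_512`` is hypothetical, and 32-element ``uint16_t`` arrays stand in for the ``__m512i`` operands.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static void model_mask_expand_epi16_512(uint16_t dst[32],
                                            const uint16_t src[32], uint32_t k,
                                            const uint16_t a[32])
    {
        int m = 0;                   /* index of the next unread element of a */
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1)
                dst[j] = a[m++];     /* active lane: next low element of a */
            else
                dst[j] = src[j];     /* inactive lane: copied from src */
        }
    }

With ``k = 0xA`` the model places ``a[0]`` in lane 1 and ``a[1]`` in lane 3, while every other lane keeps its ``src`` value.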
        	

_mm512_maskz_expand_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_maskz_expand_epi8(__mmask64 k, __m512i a);

.. admonition:: Intel Description

    Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[m+7:m]
        		m := m + 8
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_expand_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_mask_expand_epi8(__m512i src, __mmask64 k,
                                    __m512i a);

.. admonition:: Intel Description

    Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[m+7:m]
        		m := m + 8
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_compress_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_maskz_compress_epi16(__mmask32 k, __m512i a);

.. admonition:: Intel Description

    Contiguously store the active 16-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 16
        m := 0
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[m+size-1:m] := a[i+15:i]
        		m := m + size
        	FI
        ENDFOR
        dst[511:m] := 0
        dst[MAX:512] := 0
        	

_mm512_mask_compress_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_mask_compress_epi16(__m512i src, __mmask32 k,
                                       __m512i a);

.. admonition:: Intel Description

    Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 16
        m := 0
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[m+size-1:m] := a[i+15:i]
        		m := m + size
        	FI
        ENDFOR
        dst[511:m] := src[511:m]
        dst[MAX:512] := 0
        	

_mm512_maskz_compress_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_maskz_compress_epi8(__mmask64 k, __m512i a);

.. admonition:: Intel Description

    Contiguously store the active 8-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 8
        m := 0
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[m+size-1:m] := a[i+7:i]
        		m := m + size
        	FI
        ENDFOR
        dst[511:m] := 0
        dst[MAX:512] := 0
        	

_mm512_mask_compress_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_mask_compress_epi8(__m512i src, __mmask64 k,
                                      __m512i a);

.. admonition:: Intel Description

    Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 8
        m := 0
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[m+size-1:m] := a[i+7:i]
        		m := m + size
        	FI
        ENDFOR
        dst[511:m] := src[511:m]
        dst[MAX:512] := 0
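
One detail of the pseudo-code above is easy to miss: the pass-through bytes come from the same positions in ``src`` (``dst[511:m] := src[511:m]``), not from the inactive bytes of ``a``. The scalar sketch below makes that explicit. The name ``model_mask_compress_epi8_512`` and the 64-byte arrays standing in for ``__m512i`` are assumptions for illustration only.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static void model_mask_compress_epi8_512(uint8_t dst[64],
                                             const uint8_t src[64], uint64_t k,
                                             const uint8_t a[64])
    {
        int m = 0;
        for (int j = 0; j < 64; j++)
            if ((k >> j) & 1)
                dst[m++] = a[j];     /* pack active bytes to the low end */
        for (; m < 64; m++)
            dst[m] = src[m];         /* dst[511:m] := src[511:m] */
    }

If three mask bits are set, bytes 0 to 2 of the result hold the packed values and bytes 3 to 63 equal ``src[3]`` to ``src[63]``.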
        	

_mm512_mask_compressstoreu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI16 a

.. code-block:: C

    void _mm512_mask_compressstoreu_epi16(void* base_addr,
                                          __mmask32 k,
                                          __m512i a);

.. admonition:: Intel Description

    Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 16
        m := base_addr
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		MEM[m+size-1:m] := a[i+15:i]
        		m := m + size
        	FI
        ENDFOR
        	

_mm512_mask_compressstoreu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask64 k, 
    __m512i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI8 a

.. code-block:: C

    void _mm512_mask_compressstoreu_epi8(void* base_addr,
                                         __mmask64 k,
                                         __m512i a);

.. admonition:: Intel Description

    Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 8
        m := base_addr
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		MEM[m+size-1:m] := a[i+7:i]
        		m := m + size
        	FI
        ENDFOR
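
The compress-store above can be modelled in scalar C as shown below. This is a sketch under stated assumptions: ``model_mask_compressstoreu_epi8_512`` is a hypothetical name, a 64-byte array stands in for ``__m512i``, and the returned byte count is a convenience of the model (the intrinsic itself returns ``void``).

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model; returns the number of bytes written, which
       equals the number of set bits in k. */
    static size_t model_mask_compressstoreu_epi8_512(void *base_addr,
                                                     uint64_t k,
                                                     const uint8_t a[64])
    {
        uint8_t *out = (uint8_t *)base_addr;
        size_t m = 0;
        for (int j = 0; j < 64; j++)
            if ((k >> j) & 1)
                out[m++] = a[j];     /* MEM[m+size-1:m] := a[i+7:i] */
        return m;
    }

Memory past the last written byte is left untouched, so a caller filtering a stream would typically advance its output pointer by the population count of ``k`` after each store.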
        	

YMM
~~~
_mm256_mask_shuffle_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_shuffle_epi8(__m256i src, __mmask32 k,
                                     __m256i a, __m256i b);

.. admonition:: Intel Description

    Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		IF b[i+7] == 1
        			dst[i+7:i] := 0
        		ELSE
        			index[4:0] := b[i+3:i] + (j & 0x10)
        			dst[i+7:i] := a[index*8+7:index*8]
        		FI
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
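
The term ``(j & 0x10)`` in the pseudo-code above confines each lookup to the 128-bit half that the destination byte belongs to. A scalar C sketch of that behaviour follows; the name ``model_mask_shuffle_epi8_256`` and the 32-byte arrays standing in for ``__m256i`` are assumptions for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static void model_mask_shuffle_epi8_256(uint8_t dst[32],
                                            const uint8_t src[32], uint32_t k,
                                            const uint8_t a[32],
                                            const uint8_t b[32])
    {
        for (int j = 0; j < 32; j++) {
            if (!((k >> j) & 1))
                dst[j] = src[j];                     /* writemask: keep src */
            else if (b[j] & 0x80)
                dst[j] = 0;                          /* control bit 7 zeroes */
            else
                dst[j] = a[(b[j] & 0x0F) + (j & 0x10)]; /* same 16-byte half */
        }
    }

A control byte of ``3`` in position 17, for example, selects ``a[19]`` rather than ``a[3]``, because position 17 lies in the upper half.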
        	

_mm256_maskz_shuffle_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_shuffle_epi8(__mmask32 k, __m256i a,
                                      __m256i b);

.. admonition:: Intel Description

    Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		IF b[i+7] == 1
        			dst[i+7:i] := 0
        		ELSE
        			index[4:0] := b[i+3:i] + (j & 0x10)
        			dst[i+7:i] := a[index*8+7:index*8]
        		FI
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_permutexvar_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i idx, 
    __m256i a
:Param ETypes:
    UI8 idx, 
    UI8 a

.. code-block:: C

    __m256i _mm256_permutexvar_epi8(__m256i idx, __m256i a);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	id := idx[i+4:i]*8
        	dst[i+7:i] := a[id+7:id]
        ENDFOR
        dst[MAX:256] := 0
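
Unlike the in-lane shuffle, this permute indexes all 32 bytes of ``a``, using only the low five bits of each index byte. A scalar C sketch is given below; ``model_permutexvar_epi8_256`` is a hypothetical name and the 32-byte arrays are stand-ins for ``__m256i``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static void model_permutexvar_epi8_256(uint8_t dst[32],
                                           const uint8_t idx[32],
                                           const uint8_t a[32])
    {
        for (int j = 0; j < 32; j++)
            dst[j] = a[idx[j] & 0x1F];   /* idx[i+4:i]; upper bits ignored */
    }

Filling ``idx`` with ``31 - j`` reverses the byte order of ``a`` in this model.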
        	

_mm256_mask_permutexvar_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i idx, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 idx, 
    UI8 a

.. code-block:: C

    __m256i _mm256_mask_permutexvar_epi8(__m256i src,
                                         __mmask32 k,
                                         __m256i idx,
                                         __m256i a);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	id := idx[i+4:i]*8
        	IF k[j]
        		dst[i+7:i] := a[id+7:id]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_permutexvar_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i idx, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI8 idx, 
    UI8 a

.. code-block:: C

    __m256i _mm256_maskz_permutexvar_epi8(__mmask32 k,
                                          __m256i idx,
                                          __m256i a);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	id := idx[i+4:i]*8
        	IF k[j]
        		dst[i+7:i] := a[id+7:id]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_permutex2var_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i idx, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 idx, 
    UI8 b

.. code-block:: C

    __m256i _mm256_permutex2var_epi8(__m256i a, __m256i idx,
                                     __m256i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	off := 8*idx[i+4:i]
        	dst[i+7:i] := idx[i+5] ? b[off+7:off] : a[off+7:off]
        ENDFOR
        dst[MAX:256] := 0
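
One way to read the pseudo-code above is as a lookup into a 64-entry byte table formed by ``a`` followed by ``b``, indexed by the low six bits of each ``idx`` byte. The scalar C sketch below takes that view. The name ``model_permutex2var_epi8_256`` and the explicit ``table`` buffer are assumptions of the example, not part of the intrinsic.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model; 32-byte arrays stand in for __m256i. */
    static void model_permutex2var_epi8_256(uint8_t dst[32],
                                            const uint8_t a[32],
                                            const uint8_t idx[32],
                                            const uint8_t b[32])
    {
        uint8_t table[64];
        memcpy(table, a, 32);        /* entries 0..31:  idx[i+5] == 0 */
        memcpy(table + 32, b, 32);   /* entries 32..63: idx[i+5] == 1 */
        for (int j = 0; j < 32; j++)
            dst[j] = table[idx[j] & 0x3F];  /* bits 7:6 of idx are ignored */
    }

This is equivalent to the selector form ``idx[i+5] ? b[off] : a[off]`` because bit 5 is simply the high bit of the six-bit table index.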
        	

_mm256_mask_permutex2var_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask32 k, 
    __m256i idx, 
    __m256i b
:Param ETypes:
    UI8 a, 
    MASK k, 
    UI8 idx, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_permutex2var_epi8(__m256i a,
                                          __mmask32 k,
                                          __m256i idx,
                                          __m256i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		off := 8*idx[i+4:i]
        		dst[i+7:i] := idx[i+5] ? b[off+7:off] : a[off+7:off]
        	ELSE
        		dst[i+7:i] := a[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask2_permutex2var_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i idx, 
    __mmask32 k, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 idx, 
    MASK k, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask2_permutex2var_epi8(__m256i a,
                                           __m256i idx,
                                           __mmask32 k,
                                           __m256i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		off := 8*idx[i+4:i]
        		dst[i+7:i] := idx[i+5] ? b[off+7:off] : a[off+7:off]
        	ELSE
        		dst[i+7:i] := idx[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
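
The ``mask2`` form differs from the ``mask`` form only in what an inactive lane receives: per the pseudo-code above, it is the ``idx`` byte. A scalar C sketch follows, with the hypothetical name ``model_mask2_permutex2var_epi8_256`` and 32-byte arrays assumed in place of ``__m256i``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static void model_mask2_permutex2var_epi8_256(uint8_t dst[32],
                                                  const uint8_t a[32],
                                                  const uint8_t idx[32],
                                                  uint32_t k,
                                                  const uint8_t b[32])
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1) {
                unsigned off = idx[j] & 0x1F;              /* idx[i+4:i] */
                dst[j] = (idx[j] & 0x20) ? b[off] : a[off]; /* idx[i+5] */
            } else {
                dst[j] = idx[j];     /* inactive lane: idx byte passes through */
            }
        }
    }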
        	

_mm256_maskz_permutex2var_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i idx, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 idx, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_permutex2var_epi8(__mmask32 k,
                                           __m256i a,
                                           __m256i idx,
                                           __m256i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		off := 8*idx[i+4:i]
        		dst[i+7:i] := idx[i+5] ? b[off+7:off] : a[off+7:off]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_expandloadu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    const void* mem_addr
:Param ETypes:
    MASK k, 
    UI16 mem_addr

.. code-block:: C

    __m256i _mm256_maskz_expandloadu_epi16(
        __mmask16 k, const void* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m]
        		m := m + 16
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_expandloadu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    const void* mem_addr
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 mem_addr

.. code-block:: C

    __m256i _mm256_mask_expandloadu_epi16(__m256i src,
                                          __mmask16 k,
                                          const void* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m]
        		m := m + 16
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_expandloadu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    const void* mem_addr
:Param ETypes:
    MASK k, 
    UI8 mem_addr

.. code-block:: C

    __m256i _mm256_maskz_expandloadu_epi8(__mmask32 k,
                                          const void* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m]
        		m := m + 8
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_expandloadu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    const void* mem_addr
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 mem_addr

.. code-block:: C

    __m256i _mm256_mask_expandloadu_epi8(__m256i src,
                                         __mmask32 k,
                                         const void* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m]
        		m := m + 8
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_expand_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_maskz_expand_epi16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[m+15:m]
        		m := m + 16
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_expand_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_mask_expand_epi16(__m256i src, __mmask16 k,
                                     __m256i a);

.. admonition:: Intel Description

    Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[m+15:m]
        		m := m + 16
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_expand_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_maskz_expand_epi8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[m+7:m]
        		m := m + 8
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_expand_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_mask_expand_epi8(__m256i src, __mmask32 k,
                                    __m256i a);

.. admonition:: Intel Description

    Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[m+7:m]
        		m := m + 8
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
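
Expand is the inverse of compress on the active lanes: packing ``a`` under a mask and then expanding the packed result under the same mask puts every active byte back where it started. The scalar C sketch below checks that relationship using models of this operation and of ``_mm256_maskz_compress_epi8``. Both function names are hypothetical and the 32-byte arrays are stand-ins for ``__m256i``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zeromask compress. */
    static void model_maskz_compress_epi8_256(uint8_t dst[32], uint32_t k,
                                              const uint8_t a[32])
    {
        int m = 0;
        for (int j = 0; j < 32; j++)
            if ((k >> j) & 1)
                dst[m++] = a[j];     /* pack active bytes to the low end */
        for (; m < 32; m++)
            dst[m] = 0;              /* dst[255:m] := 0 */
    }

    /* Hypothetical scalar model of the writemask expand documented above. */
    static void model_mask_expand_epi8_256(uint8_t dst[32],
                                           const uint8_t src[32], uint32_t k,
                                           const uint8_t a[32])
    {
        int m = 0;
        for (int j = 0; j < 32; j++)
            dst[j] = ((k >> j) & 1) ? a[m++] : src[j];
    }

Calling the expand model with ``src`` set to the original vector reproduces that vector exactly; with a zeroed ``src`` it reproduces only the active bytes.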
        	

_mm256_maskz_compress_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_maskz_compress_epi16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Contiguously store the active 16-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 16
        m := 0
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[m+size-1:m] := a[i+15:i]
        		m := m + size
        	FI
        ENDFOR
        dst[255:m] := 0
        dst[MAX:256] := 0
        	

_mm256_mask_compress_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_mask_compress_epi16(__m256i src, __mmask16 k,
                                       __m256i a);

.. admonition:: Intel Description

    Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 16
        m := 0
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[m+size-1:m] := a[i+15:i]
        		m := m + size
        	FI
        ENDFOR
        dst[255:m] := src[255:m]
        dst[MAX:256] := 0
        	

_mm256_maskz_compress_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_maskz_compress_epi8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Contiguously store the active 8-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 8
        m := 0
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[m+size-1:m] := a[i+7:i]
        		m := m + size
        	FI
        ENDFOR
        dst[255:m] := 0
        dst[MAX:256] := 0
        	

_mm256_mask_compress_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_mask_compress_epi8(__m256i src, __mmask32 k,
                                      __m256i a);

.. admonition:: Intel Description

    Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 8
        m := 0
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[m+size-1:m] := a[i+7:i]
        		m := m + size
        	FI
        ENDFOR
        dst[255:m] := src[255:m]
        dst[MAX:256] := 0
        	

_mm256_mask_compressstoreu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI16 a

.. code-block:: C

    void _mm256_mask_compressstoreu_epi16(void* base_addr,
                                          __mmask16 k,
                                          __m256i a);

.. admonition:: Intel Description

    Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 16
        m := base_addr
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		MEM[m+size-1:m] := a[i+15:i]
        		m := m + size
        	FI
        ENDFOR
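
A minimal scalar sketch of the store loop above, assuming the source lanes are held in an array. ``compressstoreu_epi16_model`` is a hypothetical name; unlike the intrinsic, which returns ``void``, this sketch returns the number of elements written as a convenience. ``memcpy`` is used so that ``base_addr`` may be unaligned.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model, not the intrinsic. Only the active lanes are
     * written, back to back, starting at base_addr. */
    static size_t compressstoreu_epi16_model(void *base_addr, uint16_t k,
                                             const uint16_t a[16])
    {
        unsigned char *out = (unsigned char *)base_addr;
        size_t written = 0; /* elements stored so far */

        assert(base_addr && a);
        for (size_t j = 0; j < 16; ++j) {
            if ((k >> j) & 1u) {
                /* memcpy avoids any alignment assumption on base_addr */
                memcpy(out + written * sizeof(uint16_t), &a[j], sizeof(uint16_t));
                ++written;
            }
        }
        return written;
    }

In this model, memory past the last stored element is left untouched, so the destination only needs room for as many elements as there are set bits in ``k``.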
        	

_mm256_mask_compressstoreu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask32 k, 
    __m256i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI8 a

.. code-block:: C

    void _mm256_mask_compressstoreu_epi8(void* base_addr,
                                         __mmask32 k,
                                         __m256i a);

.. admonition:: Intel Description

    Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 8
        m := base_addr
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		MEM[m+size-1:m] := a[i+7:i]
        		m := m + size
        	FI
        ENDFOR
        	

XMM
~~~
_mm_mask_shuffle_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_shuffle_epi8(__m128i src, __mmask16 k,
                                  __m128i a, __m128i b);

.. admonition:: Intel Description

    Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		IF b[i+7] == 1
        			dst[i+7:i] := 0
        		ELSE
        			index[3:0] := b[i+3:i]
        			dst[i+7:i] := a[index*8+7:index*8]
        		FI
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
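
The three cases in the pseudo-code above (masked-off lane, control byte with bit 7 set, ordinary index) can be sketched in scalar C as follows. ``shuffle_epi8_mask_model`` is a hypothetical helper, and the arrays are assumed to hold the 16 byte lanes with lane 0 first.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model, not the intrinsic. dst must not overlap a. */
    static void shuffle_epi8_mask_model(uint8_t dst[16], const uint8_t src[16],
                                        uint16_t k, const uint8_t a[16],
                                        const uint8_t b[16])
    {
        assert(dst && src && a && b);
        for (size_t j = 0; j < 16; ++j) {
            if (!((k >> j) & 1u))
                dst[j] = src[j];          /* mask bit clear: keep src */
            else if (b[j] & 0x80u)
                dst[j] = 0;               /* control bit 7 set: zero the lane */
            else
                dst[j] = a[b[j] & 0x0Fu]; /* low 4 bits select a byte of a */
        }
    }

Bits 4 to 6 of each control byte are ignored, so a control value of ``0x13`` selects ``a[3]``.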
        	

_mm_maskz_shuffle_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_shuffle_epi8(__mmask16 k, __m128i a,
                                   __m128i b);

.. admonition:: Intel Description

    Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		IF b[i+7] == 1
        			dst[i+7:i] := 0
        		ELSE
        			index[3:0] := b[i+3:i]
        			dst[i+7:i] := a[index*8+7:index*8]
        		FI
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_permutexvar_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i idx, 
    __m128i a
:Param ETypes:
    UI8 idx, 
    UI8 a

.. code-block:: C

    __m128i _mm_permutexvar_epi8(__m128i idx, __m128i a);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	id := idx[i+3:i]*8
        	dst[i+7:i] := a[id+7:id]
        ENDFOR
        dst[MAX:128] := 0
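
A small scalar sketch of the pseudo-code above, under the assumption that the lanes are stored in byte arrays with lane 0 first. ``permutexvar_epi8_model`` is a hypothetical name. Only the low 4 bits of each index byte are used; the upper bits are ignored rather than zeroing the lane.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model, not the intrinsic. dst must not overlap a. */
    static void permutexvar_epi8_model(uint8_t dst[16], const uint8_t idx[16],
                                       const uint8_t a[16])
    {
        assert(dst && idx && a);
        for (size_t j = 0; j < 16; ++j)
            dst[j] = a[idx[j] & 0x0Fu]; /* idx[i+3:i] selects the source byte */
    }

For example, filling ``idx`` with ``15 - j`` reverses the byte order of ``a``.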
        	

_mm_mask_permutexvar_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i idx, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 idx, 
    UI8 a

.. code-block:: C

    __m128i _mm_mask_permutexvar_epi8(__m128i src, __mmask16 k,
                                      __m128i idx, __m128i a);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	id := idx[i+3:i]*8
        	IF k[j]
        		dst[i+7:i] := a[id+7:id]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_permutexvar_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i idx, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 idx, 
    UI8 a

.. code-block:: C

    __m128i _mm_maskz_permutexvar_epi8(__mmask16 k, __m128i idx,
                                       __m128i a);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	id := idx[i+3:i]*8
        	IF k[j]
        		dst[i+7:i] := a[id+7:id]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_permutex2var_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i idx, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 idx, 
    UI8 b

.. code-block:: C

    __m128i _mm_permutex2var_epi8(__m128i a, __m128i idx,
                                  __m128i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	off := 8*idx[i+3:i]
        	dst[i+7:i] := idx[i+4] ? b[off+7:off] : a[off+7:off]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_permutex2var_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask16 k, 
    __m128i idx, 
    __m128i b
:Param ETypes:
    UI8 a, 
    MASK k, 
    UI8 idx, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_permutex2var_epi8(__m128i a, __mmask16 k,
                                       __m128i idx, __m128i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		off := 8*idx[i+3:i]
        		dst[i+7:i] := idx[i+4] ? b[off+7:off] : a[off+7:off]
        	ELSE
        		dst[i+7:i] := a[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask2_permutex2var_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i idx, 
    __mmask16 k, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 idx, 
    MASK k, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask2_permutex2var_epi8(__m128i a, __m128i idx,
                                        __mmask16 k, __m128i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		off := 8*idx[i+3:i]
        		dst[i+7:i] := idx[i+4] ? b[off+7:off] : a[off+7:off]
        	ELSE
        		dst[i+7:i] := idx[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
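
The two-source selection in the pseudo-code above can be sketched in scalar C. ``permutex2var_epi8_mask2_model`` is a hypothetical helper operating on byte arrays (lane 0 first). Following the pseudo-code, a lane whose mask bit is clear receives the corresponding byte of ``idx``.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model, not the intrinsic.
     * dst must not overlap a or b. */
    static void permutex2var_epi8_mask2_model(uint8_t dst[16], const uint8_t a[16],
                                              const uint8_t idx[16], uint16_t k,
                                              const uint8_t b[16])
    {
        assert(dst && a && idx && b);
        for (size_t j = 0; j < 16; ++j) {
            if ((k >> j) & 1u) {
                size_t off = idx[j] & 0x0Fu;              /* idx[i+3:i] */
                dst[j] = (idx[j] & 0x10u) ? b[off] : a[off]; /* idx[i+4] picks b */
            } else {
                dst[j] = idx[j]; /* inactive lane: pass idx through */
            }
        }
    }

An index byte of ``0x03`` selects ``a[3]``, while ``0x13`` selects ``b[3]``.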
        	

_mm_maskz_permutex2var_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i idx, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 idx, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_permutex2var_epi8(__mmask16 k, __m128i a,
                                        __m128i idx, __m128i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		off := 8*idx[i+3:i]
        		dst[i+7:i] := idx[i+4] ? b[off+7:off] : a[off+7:off]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_expandloadu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    const void* mem_addr
:Param ETypes:
    MASK k, 
    UI16 mem_addr

.. code-block:: C

    __m128i _mm_maskz_expandloadu_epi16(__mmask8 k,
                                        const void* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m]
        		m := m + 16
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
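
A scalar sketch of the load loop above, assuming ``dst`` is an array of eight 16-bit lanes. ``expandloadu_epi16_maskz_model`` is a hypothetical name. The sketch reads one element per set bit of ``k``, in order, and uses ``memcpy`` so that ``mem_addr`` may be unaligned.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model, not the intrinsic. */
    static void expandloadu_epi16_maskz_model(uint16_t dst[8], uint8_t k,
                                              const void *mem_addr)
    {
        const unsigned char *in = (const unsigned char *)mem_addr;
        size_t m = 0; /* elements consumed from memory so far */

        assert(dst && mem_addr);
        for (size_t j = 0; j < 8; ++j) {
            if ((k >> j) & 1u) {
                memcpy(&dst[j], in + m * sizeof(uint16_t), sizeof(uint16_t));
                ++m;        /* advance only when a lane is active */
            } else {
                dst[j] = 0; /* zeromask: inactive lanes become zero */
            }
        }
    }

With ``k = 0x81`` the first element in memory goes to lane 0 and the second to lane 7; this model touches only those two elements of the buffer.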
        	

_mm_mask_expandloadu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    const void* mem_addr
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 mem_addr

.. code-block:: C

    __m128i _mm_mask_expandloadu_epi16(__m128i src, __mmask8 k,
                                       const void* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m]
        		m := m + 16
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_expandloadu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    const void* mem_addr
:Param ETypes:
    MASK k, 
    UI8 mem_addr

.. code-block:: C

    __m128i _mm_maskz_expandloadu_epi8(__mmask16 k,
                                       const void* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m]
        		m := m + 8
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_expandloadu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    const void* mem_addr
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 mem_addr

.. code-block:: C

    __m128i _mm_mask_expandloadu_epi8(__m128i src, __mmask16 k,
                                      const void* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m]
        		m := m + 8
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_expand_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_maskz_expand_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[m+15:m]
        		m := m + 16
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_expand_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_mask_expand_epi16(__m128i src, __mmask8 k,
                                  __m128i a);

.. admonition:: Intel Description

    Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[m+15:m]
        		m := m + 16
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_expand_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_maskz_expand_epi8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[m+7:m]
        		m := m + 8
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_expand_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_mask_expand_epi8(__m128i src, __mmask16 k,
                                 __m128i a);

.. admonition:: Intel Description

    Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[m+7:m]
        		m := m + 8
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_compress_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_maskz_compress_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Contiguously store the active 16-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 16
        m := 0
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[m+size-1:m] := a[i+15:i]
        		m := m + size
        	FI
        ENDFOR
        dst[127:m] := 0
        dst[MAX:128] := 0
        	

_mm_mask_compress_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_mask_compress_epi16(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 16
        m := 0
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[m+size-1:m] := a[i+15:i]
        		m := m + size
        	FI
        ENDFOR
        dst[127:m] := src[127:m]
        dst[MAX:128] := 0
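
The pass-through step ``dst[127:m] := src[127:m]`` is easy to misread: the tail lanes come from the same lane positions in ``src``, not from the start of ``src``. The scalar sketch below illustrates this; ``compress_epi16_mask_model`` is a hypothetical name and the arrays stand in for the registers (lane 0 first).

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the write-masked compress, not the intrinsic. */
    static void compress_epi16_mask_model(uint16_t dst[8], const uint16_t src[8],
                                          uint8_t k, const uint16_t a[8])
    {
        size_t m = 0; /* next free lane in dst */

        assert(dst && src && a);
        for (size_t j = 0; j < 8; ++j) {
            if ((k >> j) & 1u)
                dst[m++] = a[j]; /* pack active lanes toward lane 0 */
        }
        for (; m < 8; ++m)
            dst[m] = src[m];     /* remaining lanes keep src at the same index */
    }

With ``k = 0x50`` two lanes are packed into lanes 0 and 1, and lanes 2 to 7 are taken from ``src[2]`` to ``src[7]``.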
        	

_mm_maskz_compress_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_maskz_compress_epi8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Contiguously store the active 8-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 8
        m := 0
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[m+size-1:m] := a[i+7:i]
        		m := m + size
        	FI
        ENDFOR
        dst[127:m] := 0
        dst[MAX:128] := 0
        	

_mm_mask_compress_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_mask_compress_epi8(__m128i src, __mmask16 k,
                                   __m128i a);

.. admonition:: Intel Description

    Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 8
        m := 0
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[m+size-1:m] := a[i+7:i]
        		m := m + size
        	FI
        ENDFOR
        dst[127:m] := src[127:m]
        dst[MAX:128] := 0
        	

_mm_mask_compressstoreu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI16 a

.. code-block:: C

    void _mm_mask_compressstoreu_epi16(void* base_addr,
                                       __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 16
        m := base_addr
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		MEM[m+size-1:m] := a[i+15:i]
        		m := m + size
        	FI
        ENDFOR
        	

_mm_mask_compressstoreu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX-512-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI8 a

.. code-block:: C

    void _mm_mask_compressstoreu_epi8(void* base_addr,
                                      __mmask16 k, __m128i a);

.. admonition:: Intel Description

    Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 8
        m := base_addr
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		MEM[m+size-1:m] := a[i+7:i]
        		m := m + size
        	FI
        ENDFOR
        	

Store
-----
ZMM
~~~
_mm512_mask_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    UI16 mem_addr, 
    MASK k, 
    UI16 a

.. code-block:: C

    void _mm512_mask_storeu_epi16(void* mem_addr, __mmask32 k,
                                  __m512i a);

.. admonition:: Intel Description

    Store packed 16-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		MEM[mem_addr+i+15:mem_addr+i] := a[i+15:i]
        	FI
        ENDFOR
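
A scalar sketch of the masked store above, assuming the 32 source lanes are held in an array. ``mask_storeu_epi16_model`` is a hypothetical name. Each active lane is written at its own offset, so memory belonging to inactive lanes is left unchanged.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model, not the intrinsic. */
    static void mask_storeu_epi16_model(void *mem_addr, uint32_t k,
                                        const uint16_t a[32])
    {
        unsigned char *out = (unsigned char *)mem_addr;

        assert(mem_addr && a);
        for (size_t j = 0; j < 32; ++j) {
            if ((k >> j) & 1u) {
                /* lane j always maps to byte offset 2*j; memcpy tolerates
                 * an unaligned mem_addr */
                memcpy(out + j * sizeof(uint16_t), &a[j], sizeof(uint16_t));
            }
        }
    }

With ``k = 0x80000001`` only the first and last 16-bit elements of the 64-byte destination are modified.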
        	

_mm512_mask_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask64 k, 
    __m512i a
:Param ETypes:
    UI8 mem_addr, 
    MASK k, 
    UI8 a

.. code-block:: C

    void _mm512_mask_storeu_epi8(void* mem_addr, __mmask64 k,
                                 __m512i a);

.. admonition:: Intel Description

    Store packed 8-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
        	FI
        ENDFOR
        	

_mm512_storeu_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512i a
:Param ETypes:
    UI16 mem_addr, 
    UI16 a

.. code-block:: C

    void _mm512_storeu_epi16(void* mem_addr, __m512i a);

.. admonition:: Intel Description

    Store 512 bits (composed of 32 packed 16-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_storeu_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512i a
:Param ETypes:
    UI8 mem_addr, 
    UI8 a

.. code-block:: C

    void _mm512_storeu_epi8(void* mem_addr, __m512i a);

.. admonition:: Intel Description

    Store 512 bits (composed of 64 packed 8-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_mask_cvtsepi16_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    SI8 base_addr, 
    MASK k, 
    SI16 a

.. code-block:: C

    void _mm512_mask_cvtsepi16_storeu_epi8(void* base_addr,
                                           __mmask32 k,
                                           __m512i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+15:i])
        	FI
        ENDFOR
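
To make the ``Saturate8`` step concrete, here is a scalar sketch of the loop above. The names ``saturate8`` and ``mask_cvtsepi16_storeu_epi8_model`` are hypothetical, and the 32 source lanes are assumed to be held in an ``int16_t`` array.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Clamp a signed 16-bit value to the signed 8-bit range. */
    static int8_t saturate8(int16_t v)
    {
        if (v > INT8_MAX)
            return INT8_MAX;
        if (v < INT8_MIN)
            return INT8_MIN;
        return (int8_t)v;
    }

    /* Hypothetical scalar model, not the intrinsic. */
    static void mask_cvtsepi16_storeu_epi8_model(void *base_addr, uint32_t k,
                                                 const int16_t a[32])
    {
        int8_t *out = (int8_t *)base_addr; /* byte stores need no alignment */

        assert(base_addr && a);
        for (size_t j = 0; j < 32; ++j) {
            if ((k >> j) & 1u)
                out[j] = saturate8(a[j]); /* lane j goes to byte offset j */
        }
    }

In this sketch 300 is stored as 127 and -300 as -128, while in-range values such as -5 are stored unchanged.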
        	

_mm512_mask_cvtusepi16_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI16 a

.. code-block:: C

    void _mm512_mask_cvtusepi16_storeu_epi8(void* base_addr,
                                            __mmask32 k,
                                            __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+15:i])
        	FI
        ENDFOR
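
For the unsigned variant, saturation only has an upper bound. A minimal scalar sketch under the same assumptions as the pseudo-code above (``ref_mask_cvtusepi16_storeu_epi8`` is a hypothetical name, and the lanes are passed as a plain array):

.. code-block:: C

    #include <stdint.h>

    /* Scalar model: values above 255 are clamped to 255, and a byte is
       written only when its mask bit is set. */
    static void ref_mask_cvtusepi16_storeu_epi8(uint8_t *base_addr, uint32_t k,
                                                const uint16_t a[32])
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1u) {
                base_addr[j] = (a[j] > UINT8_MAX) ? UINT8_MAX : (uint8_t)a[j];
            }
        }
    }
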
        	

_mm512_mask_cvtepi16_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI16 a

.. code-block:: C

    void _mm512_mask_cvtepi16_storeu_epi8(void* base_addr,
                                          __mmask32 k,
                                          __m512i a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+15:i])
        	FI
        ENDFOR
        	

_mm512_storeu_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512i a
:Param ETypes:
    UI64 mem_addr, 
    UI64 a

.. code-block:: C

    void _mm512_storeu_epi64(void* mem_addr, __m512i a);

.. admonition:: Intel Description

    Store 512 bits (composed of 8 packed 64-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_storeu_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512i a
:Param ETypes:
    UI32 mem_addr, 
    UI32 a

.. code-block:: C

    void _mm512_storeu_epi32(void* mem_addr, __m512i a);

.. admonition:: Intel Description

    Store 512 bits (composed of 16 packed 32-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_mask_storeu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI32 mem_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm512_mask_storeu_epi32(void* mem_addr, __mmask16 k,
                                  __m512i a);

.. admonition:: Intel Description

    Store packed 32-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
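
The masked store above writes lane ``j`` to byte offset ``4*j`` only when bit ``j`` of the mask is set. The following scalar sketch in portable C mirrors that loop; ``ref_mask_storeu_epi32`` is a hypothetical name, and ``memcpy`` is used so that no alignment of the destination is assumed.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Scalar model of a masked, unaligned store of sixteen 32-bit lanes. */
    static void ref_mask_storeu_epi32(void *mem_addr, uint16_t k,
                                      const uint32_t a[16])
    {
        unsigned char *dst = (unsigned char *)mem_addr;
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                /* Lane j always lands at byte offset 4*j. */
                memcpy(dst + 4 * j, &a[j], sizeof a[j]);
            }
        }
    }

With the mask ``0x8001`` only the first and last 32-bit slots of the destination change; the fourteen slots in between keep their previous contents.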
        	

_mm512_storeu_si512
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512i a
:Param ETypes:
    M512 mem_addr, 
    M512 a

.. code-block:: C

    void _mm512_storeu_si512(void* mem_addr, __m512i a);

.. admonition:: Intel Description

    Store 512 bits of integer data from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_mask_storeu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI64 mem_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm512_mask_storeu_epi64(void* mem_addr, __mmask8 k,
                                  __m512i a);

.. admonition:: Intel Description

    Store packed 64-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm512_stream_si512
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512i a
:Param ETypes:
    M512 mem_addr, 
    M512 a

.. code-block:: C

    void _mm512_stream_si512(void* mem_addr, __m512i a);

.. admonition:: Intel Description

    Store 512 bits of integer data from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 64-byte boundary; otherwise a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
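
Because the streaming stores require a 64-byte-aligned destination, callers often check or adjust a pointer before using them. The helpers below are a minimal sketch of such a check in portable C; ``is_aligned_64`` and ``align_up_64`` are hypothetical names, not part of ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Returns 1 when p lies on a 64-byte boundary, 0 otherwise. */
    static int is_aligned_64(const void *p)
    {
        return ((uintptr_t)p & 63u) == 0;
    }

    /* Rounds p up to the next 64-byte boundary (p itself if already
       aligned). Assumes a flat address space in which converting between
       pointers and integers is meaningful, as on x86-64. */
    static void *align_up_64(void *p)
    {
        uintptr_t addr = (uintptr_t)p;
        return (void *)((addr + 63u) & ~(uintptr_t)63u);
    }

A buffer that will be rounded up this way needs at least 63 spare bytes so that the aligned 64-byte block still fits inside it.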
        	

_mm512_stream_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm512_stream_pd(void* mem_addr, __m512d a);

.. admonition:: Intel Description

    Store 512 bits (composed of 8 packed double-precision (64-bit) floating-point elements) from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 64-byte boundary; otherwise a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_stream_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm512_stream_ps(void* mem_addr, __m512 a);

.. admonition:: Intel Description

    Store 512 bits (composed of 16 packed single-precision (32-bit) floating-point elements) from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 64-byte boundary; otherwise a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_mask_storeu_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 mem_addr, 
    MASK k, 
    FP64 a

.. code-block:: C

    void _mm512_mask_storeu_pd(void* mem_addr, __mmask8 k,
                               __m512d a);

.. admonition:: Intel Description

    Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm512_storeu_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm512_storeu_pd(void* mem_addr, __m512d a);

.. admonition:: Intel Description

    Store 512 bits (composed of 8 packed double-precision (64-bit) floating-point elements) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_mask_storeu_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 mem_addr, 
    MASK k, 
    FP32 a

.. code-block:: C

    void _mm512_mask_storeu_ps(void* mem_addr, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm512_storeu_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm512_storeu_ps(void* mem_addr, __m512 a);

.. admonition:: Intel Description

    Store 512 bits (composed of 16 packed single-precision (32-bit) floating-point elements) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_i32scatter_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m256i vindex, 
    __m512i a, 
    int scale
:Param ETypes:
    UI64 base_addr, 
    SI32 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm512_i32scatter_epi64(void* base_addr,
                                 __m256i vindex, __m512i a,
                                 int scale);

.. admonition:: Intel Description

    Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
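
The pseudo-code above computes addresses in bits, which is why it multiplies by 8; in byte terms each lane is written at ``base_addr + index * scale``. A scalar sketch of that address computation in portable C follows. ``ref_i32scatter_epi64`` is a hypothetical name and the vector operands are assumed to be passed as plain arrays.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Scalar model of a scatter of eight 64-bit lanes using signed
       32-bit indices. */
    static void ref_i32scatter_epi64(void *base_addr, const int32_t vindex[8],
                                     const uint64_t a[8], int scale)
    {
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        unsigned char *base = (unsigned char *)base_addr;
        for (int j = 0; j < 8; j++) {
            /* Widen the index to 64 bits before scaling, as SignExtend64
               does, so negative indices address memory below base_addr. */
            int64_t offset = (int64_t)vindex[j] * scale;
            memcpy(base + offset, &a[j], sizeof a[j]);
        }
    }

With ``scale`` set to 8 the indices act as element indices into an array of 64-bit values, so an index of -1 targets the element just before ``base_addr``.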
        	

_mm512_mask_i32scatter_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i vindex, 
    __m512i a, 
    int scale
:Param ETypes:
    UI64 base_addr, 
    MASK k, 
    SI32 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm512_mask_i32scatter_epi64(void* base_addr,
                                      __mmask8 k,
                                      __m256i vindex, __m512i a,
                                      int scale);

.. admonition:: Intel Description

    Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm512_i64scatter_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m512i vindex, 
    __m256i a, 
    int scale
:Param ETypes:
    UI32 base_addr, 
    SI64 vindex, 
    UI32 a, 
    IMM scale

.. code-block:: C

    void _mm512_i64scatter_epi32(void* base_addr,
                                 __m512i vindex, __m256i a,
                                 int scale);

.. admonition:: Intel Description

    Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	MEM[addr+31:addr] := a[i+31:i]
        ENDFOR
        	

_mm512_mask_i64scatter_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i vindex, 
    __m256i a, 
    int scale
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    SI64 vindex, 
    UI32 a, 
    IMM scale

.. code-block:: C

    void _mm512_mask_i64scatter_epi32(void* base_addr,
                                      __mmask8 k,
                                      __m512i vindex, __m256i a,
                                      int scale);

.. admonition:: Intel Description

    Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		MEM[addr+31:addr] := a[i+31:i]
        	FI
        ENDFOR
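
In the masked form above, a lane whose mask bit is clear performs no address computation and no store. The scalar sketch below models this for eight 32-bit lanes with 64-bit indices; ``ref_mask_i64scatter_epi32`` is a hypothetical name and the operands are assumed to be plain arrays.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Scalar model of a masked scatter: lane j is written at byte offset
       vindex[j] * scale from base_addr, but only when mask bit j is set. */
    static void ref_mask_i64scatter_epi32(void *base_addr, uint8_t k,
                                          const int64_t vindex[8],
                                          const uint32_t a[8], int scale)
    {
        unsigned char *base = (unsigned char *)base_addr;
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u) {
                memcpy(base + vindex[j] * scale, &a[j], sizeof a[j]);
            }
        }
    }

With the mask ``0x05`` only lanes 0 and 2 are stored, so the memory addressed by the other six indices is never touched.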
        	

_mm512_i64scatter_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m512i vindex, 
    __m512i a, 
    int scale
:Param ETypes:
    UI64 base_addr, 
    SI64 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm512_i64scatter_epi64(void* base_addr,
                                 __m512i vindex, __m512i a,
                                 int scale);

.. admonition:: Intel Description

    Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
        	

_mm512_mask_i64scatter_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i vindex, 
    __m512i a, 
    int scale
:Param ETypes:
    UI64 base_addr, 
    MASK k, 
    SI64 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm512_mask_i64scatter_epi64(void* base_addr,
                                      __mmask8 k,
                                      __m512i vindex, __m512i a,
                                      int scale);

.. admonition:: Intel Description

    Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm512_i32scatter_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m256i vindex, 
    __m512d a, 
    int scale
:Param ETypes:
    FP64 base_addr, 
    SI32 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm512_i32scatter_pd(void* base_addr, __m256i vindex,
                              __m512d a, int scale);

.. admonition:: Intel Description

    Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
        	

_mm512_mask_i32scatter_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i vindex, 
    __m512d a, 
    int scale
:Param ETypes:
    FP64 base_addr, 
    MASK k, 
    SI32 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm512_mask_i32scatter_pd(void* base_addr, __mmask8 k,
                                   __m256i vindex, __m512d a,
                                   int scale);

.. admonition:: Intel Description

    Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm512_i64scatter_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m512i vindex, 
    __m512d a, 
    int scale
:Param ETypes:
    FP64 base_addr, 
    SI64 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm512_i64scatter_pd(void* base_addr, __m512i vindex,
                              __m512d a, int scale);

.. admonition:: Intel Description

    Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
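
The role of ``scale`` is easiest to see by scattering the same data twice: once with element indices and a scale of 8, and once with byte offsets and a scale of 1. The scalar model below is a sketch under the assumption that operands are plain arrays; ``ref_i64scatter_pd`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Scalar model: lane j is written at byte offset vindex[j] * scale. */
    static void ref_i64scatter_pd(void *base_addr, const int64_t vindex[8],
                                  const double a[8], int scale)
    {
        unsigned char *base = (unsigned char *)base_addr;
        for (int j = 0; j < 8; j++) {
            memcpy(base + vindex[j] * scale, &a[j], sizeof a[j]);
        }
    }

Scattering with indices ``{7, 6, ..., 0}`` and a scale of 8 reverses the eight doubles; the same result is obtained with indices ``{56, 48, ..., 0}`` and a scale of 1.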
        	

_mm512_mask_i64scatter_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i vindex, 
    __m512d a, 
    int scale
:Param ETypes:
    FP64 base_addr, 
    MASK k, 
    SI64 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm512_mask_i64scatter_pd(void* base_addr, __mmask8 k,
                                   __m512i vindex, __m512d a,
                                   int scale);

.. admonition:: Intel Description

    Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm512_i64scatter_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m512i vindex, 
    __m256 a, 
    int scale
:Param ETypes:
    FP32 base_addr, 
    SI64 vindex, 
    FP32 a, 
    IMM scale

.. code-block:: C

    void _mm512_i64scatter_ps(void* base_addr, __m512i vindex,
                              __m256 a, int scale);

.. admonition:: Intel Description

    Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	MEM[addr+31:addr] := a[i+31:i]
        ENDFOR
        	

_mm512_mask_i64scatter_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i vindex, 
    __m256 a, 
    int scale
:Param ETypes:
    FP32 base_addr, 
    MASK k, 
    SI64 vindex, 
    FP32 a, 
    IMM scale

.. code-block:: C

    void _mm512_mask_i64scatter_ps(void* base_addr, __mmask8 k,
                                   __m512i vindex, __m256 a,
                                   int scale);

.. admonition:: Intel Description

    Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		MEM[addr+31:addr] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm512_mullox_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mullox_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Multiplies elements in packed 64-bit integer vectors "a" and "b" together, storing the lower 64 bits of the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] * b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
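
Keeping only the lower 64 bits of each product is the same as multiplying modulo 2^64, which is what unsigned 64-bit multiplication in C already does. A minimal scalar sketch of the loop above, with the hypothetical name ``ref_mullox_epi64`` and plain arrays standing in for the vector operands:

.. code-block:: C

    #include <stdint.h>

    /* Scalar model: each lane is the low 64 bits of a[j] * b[j].
       Unsigned overflow wraps modulo 2^64 in C, so no widening is
       needed. */
    static void ref_mullox_epi64(uint64_t dst[8], const uint64_t a[8],
                                 const uint64_t b[8])
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = a[j] * b[j];
        }
    }

The low 64 bits are identical whether the inputs are read as signed or unsigned, so the same model covers both interpretations.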
        	

_mm512_mask_mullox_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_mullox_epi64(__m512i src, __mmask8 k,
                                     __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiplies elements in packed 64-bit integer vectors "a" and "b" together, storing the lower 64 bits of the result in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] * b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
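
The writemask above selects, per lane, between the product and the corresponding lane of ``src``. A scalar sketch of that merge behaviour follows; ``ref_mask_mullox_epi64`` is a hypothetical name and the operands are assumed to be plain arrays of eight lanes.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of merge masking: a set mask bit selects the low 64
       bits of the product, a clear bit copies the lane from src. */
    static void ref_mask_mullox_epi64(uint64_t dst[8], const uint64_t src[8],
                                      uint8_t k, const uint64_t a[8],
                                      const uint64_t b[8])
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = ((k >> j) & 1u) ? a[j] * b[j] : src[j];
        }
    }

With the mask ``0xAA`` the odd-numbered lanes receive products and the even-numbered lanes are copied from ``src`` unchanged.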
        	

_mm512_mask_cvtepi32_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm512_mask_cvtepi32_storeu_epi8(void* base_addr,
                                          __mmask16 k,
                                          __m512i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+31:i])
        	FI
        ENDFOR
        	

_mm512_mask_cvtepi32_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm512_mask_cvtepi32_storeu_epi16(void* base_addr,
                                           __mmask16 k,
                                           __m512i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+31:i])
        	FI
        ENDFOR
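
Note that in the loop above the destination offset depends only on the lane number (``l := 16*j``), so masked-off lanes leave gaps rather than being compacted. The scalar sketch below makes that layout explicit; ``ref_mask_cvtepi32_storeu_epi16`` is a hypothetical name and the vector operand is assumed to be a plain array.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Scalar model: truncate each active 32-bit lane to its low 16 bits
       and store it in 16-bit slot j of the destination. */
    static void ref_mask_cvtepi32_storeu_epi16(void *base_addr, uint16_t k,
                                               const uint32_t a[16])
    {
        unsigned char *dst = (unsigned char *)base_addr;
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                uint16_t low = (uint16_t)(a[j] & 0xFFFFu);  /* Truncate16 */
                memcpy(dst + 2 * j, &low, sizeof low);      /* slot j */
            }
        }
    }

With the mask ``0x0005`` lanes 0 and 2 are written to slots 0 and 2, while slot 1 between them keeps its previous contents.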
        	

_mm512_mask_cvtepi64_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm512_mask_cvtepi64_storeu_epi8(void* base_addr,
                                          __mmask8 k,
                                          __m512i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+63:i])
        	FI
        ENDFOR
        	

_mm512_mask_cvtepi64_storeu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm512_mask_cvtepi64_storeu_epi32(void* base_addr,
                                           __mmask8 k,
                                           __m512i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		MEM[base_addr+l+31:base_addr+l] := Truncate32(a[i+63:i])
        	FI
        ENDFOR
        	

_mm512_mask_cvtepi64_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm512_mask_cvtepi64_storeu_epi16(void* base_addr,
                                           __mmask8 k,
                                           __m512i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+63:i])
        	FI
        ENDFOR
        	

_mm512_mask_cvtsepi32_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    SI8 base_addr, 
    MASK k, 
    SI32 a

.. code-block:: C

    void _mm512_mask_cvtsepi32_storeu_epi8(void* base_addr,
                                           __mmask16 k,
                                           __m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+31:i])
        	FI
        ENDFOR
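
The ``Saturate8`` step is what distinguishes this intrinsic from the truncating ``cvtepi`` forms. A minimal scalar sketch of signed saturation followed by the masked byte store is shown here; both function names are hypothetical and the vector is represented as a plain array.

```c
#include <stdint.h>

/* Clamp a signed 32-bit value into the int8_t range
   (the role of Saturate8 in the pseudo-code). */
int8_t saturate_i32_to_i8(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Hypothetical scalar model: lane j is converted and stored only when
   bit j of k is set. */
void model_mask_cvtsepi32_storeu_epi8(void *base_addr, uint16_t k,
                                      const int32_t a[16])
{
    int8_t *dst = (int8_t *)base_addr; /* byte stores need no alignment */
    for (int j = 0; j < 16; ++j)
        if ((k >> j) & 1u)
            dst[j] = saturate_i32_to_i8(a[j]);
}
```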
        	

_mm512_mask_cvtsepi32_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    SI16 base_addr, 
    MASK k, 
    SI32 a

.. code-block:: C

    void _mm512_mask_cvtsepi32_storeu_epi16(void* base_addr,
                                            __mmask16 k,
                                            __m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+31:i])
        	FI
        ENDFOR
        	

_mm512_mask_cvtsepi64_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    SI8 base_addr, 
    MASK k, 
    SI64 a

.. code-block:: C

    void _mm512_mask_cvtsepi64_storeu_epi8(void* base_addr,
                                           __mmask8 k,
                                           __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+63:i])
        	FI
        ENDFOR
        	

_mm512_mask_cvtsepi64_storeu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    SI32 base_addr, 
    MASK k, 
    SI64 a

.. code-block:: C

    void _mm512_mask_cvtsepi64_storeu_epi32(void* base_addr,
                                            __mmask8 k,
                                            __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		MEM[base_addr+l+31:base_addr+l] := Saturate32(a[i+63:i])
        	FI
        ENDFOR
        	

_mm512_mask_cvtsepi64_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    SI16 base_addr, 
    MASK k, 
    SI64 a

.. code-block:: C

    void _mm512_mask_cvtsepi64_storeu_epi16(void* base_addr,
                                            __mmask8 k,
                                            __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+63:i])
        	FI
        ENDFOR
        	

_mm512_mask_cvtusepi32_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm512_mask_cvtusepi32_storeu_epi8(void* base_addr,
                                            __mmask16 k,
                                            __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+31:i])
        	FI
        ENDFOR
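
Unsigned saturation and plain truncation give different results as soon as a source value exceeds the destination range. The sketch below places the two side by side and then models the masked store with the saturating variant; all three function names are assumptions introduced for illustration.

```c
#include <stdint.h>

/* Truncation keeps the low 8 bits (the behavior of the cvtepi forms). */
uint8_t truncate_u32_to_u8(uint32_t v)
{
    return (uint8_t)(v & 0xFFu);
}

/* Unsigned saturation clamps to 255 (the role of SaturateU8). */
uint8_t saturate_u32_to_u8(uint32_t v)
{
    return v > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)v;
}

/* Hypothetical scalar model of the masked saturating store. */
void model_mask_cvtusepi32_storeu_epi8(void *base_addr, uint16_t k,
                                       const uint32_t a[16])
{
    uint8_t *dst = (uint8_t *)base_addr;
    for (int j = 0; j < 16; ++j)
        if ((k >> j) & 1u)
            dst[j] = saturate_u32_to_u8(a[j]);
}
```

For an input of 256, truncation yields 0 while saturation yields 255, which is usually the preferable result when narrowing pixel or sample data.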
        	

_mm512_mask_cvtusepi32_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm512_mask_cvtusepi32_storeu_epi16(void* base_addr,
                                             __mmask16 k,
                                             __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+31:i])
        	FI
        ENDFOR
        	

_mm512_mask_cvtusepi64_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm512_mask_cvtusepi64_storeu_epi8(void* base_addr,
                                            __mmask8 k,
                                            __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+63:i])
        	FI
        ENDFOR
        	

_mm512_mask_cvtusepi64_storeu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm512_mask_cvtusepi64_storeu_epi32(void* base_addr,
                                             __mmask8 k,
                                             __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		MEM[base_addr+l+31:base_addr+l] := SaturateU32(a[i+63:i])
        	FI
        ENDFOR
        	

_mm512_mask_cvtusepi64_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm512_mask_cvtusepi64_storeu_epi16(void* base_addr,
                                             __mmask8 k,
                                             __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+63:i])
        	FI
        ENDFOR
        	

_mm512_mask_store_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 mem_addr, 
    MASK k, 
    FP64 a

.. code-block:: C

    void _mm512_mask_store_pd(void* mem_addr, __mmask8 k,
                              __m512d a);

.. admonition:: Intel Description

    Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k".
    "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
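
The alignment requirement applies to ``mem_addr`` itself, independent of which mask bits are set. The scalar sketch below models that by rejecting a misaligned address up front and then performing the per-lane masked store. The function name and the error-code convention are assumptions; the real instruction may raise an exception rather than return a value.

```c
#include <stdint.h>

/* Hypothetical scalar model. Returns 0 on success and -1 when mem_addr is
   not on a 64-byte boundary (standing in for the possible
   general-protection exception). */
int model_mask_store_pd(void *mem_addr, uint8_t k, const double a[8])
{
    if ((uintptr_t)mem_addr % 64u != 0)
        return -1;
    double *dst = (double *)mem_addr;
    for (int j = 0; j < 8; ++j)
        if ((k >> j) & 1u)
            dst[j] = a[j]; /* lanes with a clear mask bit keep old contents */
    return 0;
}
```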
        	

_mm512_store_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm512_store_pd(void* mem_addr, __m512d a);

.. admonition:: Intel Description

    Store 512 bits (composed of 8 packed double-precision (64-bit) floating-point elements) from "a" into memory.
    "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
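
The 64-byte alignment requirement is often easiest to satisfy when the buffer is allocated. A minimal sketch, assuming a C11 toolchain whose standard library provides ``aligned_alloc``; the helper names ``is_aligned_64`` and ``alloc_doubles_aligned_64`` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns nonzero when p sits on a 64-byte boundary. */
int is_aligned_64(const void *p)
{
    return ((uintptr_t)p & 63u) == 0;
}

/* Allocates room for n doubles on a 64-byte boundary. C11 aligned_alloc
   expects the size to be a multiple of the alignment, so the byte count is
   rounded up. Overflow of n * sizeof(double) is not checked in this sketch.
   Release the result with free(). */
double *alloc_doubles_aligned_64(size_t n)
{
    size_t bytes = n * sizeof(double);
    size_t rounded = (bytes + 63u) / 64u * 64u;
    if (rounded == 0)
        rounded = 64;
    return (double *)aligned_alloc(64, rounded);
}
```

A pointer returned by such a helper, or any address for which ``is_aligned_64`` holds, meets the precondition stated in the description.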
        	

_mm512_mask_store_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 mem_addr, 
    MASK k, 
    FP32 a

.. code-block:: C

    void _mm512_mask_store_ps(void* mem_addr, __mmask16 k,
                              __m512 a);

.. admonition:: Intel Description

    Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k".
    "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm512_store_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm512_store_ps(void* mem_addr, __m512 a);

.. admonition:: Intel Description

    Store 512 bits (composed of 16 packed single-precision (32-bit) floating-point elements) from "a" into memory.
    "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_mask_store_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI32 mem_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm512_mask_store_epi32(void* mem_addr, __mmask16 k,
                                 __m512i a);

.. admonition:: Intel Description

    Store packed 32-bit integers from "a" into memory using writemask "k".
    "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
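
A writemask is typically produced by an earlier comparison and then consumed by the store. The scalar sketch below builds a 16-bit mask from a per-lane predicate and applies it in a model of the masked store. Both function names are hypothetical, and the 64-byte alignment requirement is deliberately not modeled here.

```c
#include <stdint.h>

/* Builds a 16-bit lane mask: bit j is set when a[j] > threshold. */
uint16_t mask_greater_than(const int32_t a[16], int32_t threshold)
{
    uint16_t k = 0;
    for (int j = 0; j < 16; ++j)
        if (a[j] > threshold)
            k |= (uint16_t)(1u << j);
    return k;
}

/* Hypothetical scalar model of the masked store: only selected lanes are
   written, all other destination elements keep their previous value. */
void model_mask_store_epi32(int32_t *mem_addr, uint16_t k,
                            const int32_t a[16])
{
    for (int j = 0; j < 16; ++j)
        if ((k >> j) & 1u)
            mem_addr[j] = a[j];
}
```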
        	

_mm512_store_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512i a
:Param ETypes:
    UI32 mem_addr, 
    UI32 a

.. code-block:: C

    void _mm512_store_epi32(void* mem_addr, __m512i a);

.. admonition:: Intel Description

    Store 512 bits (composed of 16 packed 32-bit integers) from "a" into memory.
    "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_store_si512
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512i a
:Param ETypes:
    M512 mem_addr, 
    M512 a

.. code-block:: C

    void _mm512_store_si512(void* mem_addr, __m512i a);

.. admonition:: Intel Description

    Store 512 bits of integer data from "a" into memory.
    "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_mask_store_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI64 mem_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm512_mask_store_epi64(void* mem_addr, __mmask8 k,
                                 __m512i a);

.. admonition:: Intel Description

    Store packed 64-bit integers from "a" into memory using writemask "k".
    "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm512_store_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m512i a
:Param ETypes:
    UI64 mem_addr, 
    UI64 a

.. code-block:: C

    void _mm512_store_epi64(void* mem_addr, __m512i a);

.. admonition:: Intel Description

    Store 512 bits (composed of 8 packed 64-bit integers) from "a" into memory.
    "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_i32scatter_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m512i vindex, 
    __m512i a, 
    int scale
:Param ETypes:
    UI32 base_addr, 
    SI32 vindex, 
    UI32 a, 
    IMM scale

.. code-block:: C

    void _mm512_i32scatter_epi32(void* base_addr,
                                 __m512i vindex, __m512i a,
                                 int scale);

.. admonition:: Intel Description

    Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+31:addr] := a[i+31:i]
        ENDFOR
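
In byte terms, each lane is written to ``base_addr + vindex[j] * scale``. The trailing ``* 8`` in the pseudo-code appears to convert that byte offset into the bit addressing used by the ``MEM[addr+31:addr]`` notation. A scalar sketch under that reading, with a hypothetical function name and plain arrays standing in for the vectors:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 16-lane, 32-bit scatter. The offset is
   computed in bytes: sign-extended index times scale (1, 2, 4 or 8). */
void model_i32scatter_epi32(void *base_addr, const int32_t vindex[16],
                            const uint32_t a[16], int scale)
{
    unsigned char *base = (unsigned char *)base_addr;
    for (int j = 0; j < 16; ++j) {
        int64_t byte_off = (int64_t)vindex[j] * scale;
        memcpy(base + byte_off, &a[j], sizeof a[j]);
    }
}
```

With ``scale`` set to 4, the indices act as ordinary element indices into an array of 32-bit integers.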
        	

_mm512_mask_i32scatter_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m512i vindex, 
    __m512i a, 
    int scale
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    SI32 vindex, 
    UI32 a, 
    IMM scale

.. code-block:: C

    void _mm512_mask_i32scatter_epi32(void* base_addr,
                                      __mmask16 k,
                                      __m512i vindex, __m512i a,
                                      int scale);

.. admonition:: Intel Description

    Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+31:addr] := a[i+31:i]
        	FI
        ENDFOR
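
Two details of the masked form are easy to miss: indices are sign-extended, so negative offsets relative to ``base_addr`` are meaningful, and the address of an inactive lane is never computed. The scalar sketch below illustrates both. The function name is hypothetical, and the example assumes the caller guarantees that every active lane lands inside valid memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the masked 32-bit scatter. */
void model_mask_i32scatter_epi32(void *base_addr, uint16_t k,
                                 const int32_t vindex[16],
                                 const uint32_t a[16], int scale)
{
    unsigned char *base = (unsigned char *)base_addr;
    for (int j = 0; j < 16; ++j) {
        if (!((k >> j) & 1u))
            continue; /* inactive lane: its index is never dereferenced */
        ptrdiff_t byte_off = (ptrdiff_t)vindex[j] * scale; /* may be negative */
        memcpy(base + byte_off, &a[j], sizeof a[j]);
    }
}
```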
        	

_mm512_i32scatter_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m512i vindex, 
    __m512 a, 
    int scale
:Param ETypes:
    FP32 base_addr, 
    SI32 vindex, 
    FP32 a, 
    IMM scale

.. code-block:: C

    void _mm512_i32scatter_ps(void* base_addr, __m512i vindex,
                              __m512 a, int scale);

.. admonition:: Intel Description

    Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+31:addr] := a[i+31:i]
        ENDFOR
        	

_mm512_mask_i32scatter_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m512i vindex, 
    __m512 a, 
    int scale
:Param ETypes:
    FP32 base_addr, 
    MASK k, 
    SI32 vindex, 
    FP32 a, 
    IMM scale

.. code-block:: C

    void _mm512_mask_i32scatter_ps(void* base_addr, __mmask16 k,
                                   __m512i vindex, __m512 a,
                                   int scale);

.. admonition:: Intel Description

    Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+31:addr] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm512_i32loscatter_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m512i vindex, 
    __m512d a, 
    int scale
:Param ETypes:
    FP64 base_addr, 
    SI32 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm512_i32loscatter_pd(void* base_addr, __m512i vindex,
                                __m512d a, int scale);

.. admonition:: Intel Description

    Stores 8 packed double-precision (64-bit) floating-point elements in "a" to memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
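
As the loop bounds in the pseudo-code show, only the first eight 32-bit indices of ``vindex`` take part, one per 64-bit element of ``a``. The scalar sketch below mirrors that. The function name is an assumption, and ``vindex`` is passed as a 16-element array purely to show that the upper half is ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model: scatter 8 doubles using the low 8 of the
   16 available 32-bit indices. Offsets are in bytes (index * scale). */
void model_i32loscatter_pd(void *base_addr, const int32_t vindex[16],
                           const double a[8], int scale)
{
    unsigned char *base = (unsigned char *)base_addr;
    for (int j = 0; j < 8; ++j) { /* vindex[8..15] are not read */
        ptrdiff_t byte_off = (ptrdiff_t)vindex[j] * scale;
        memcpy(base + byte_off, &a[j], sizeof a[j]);
    }
}
```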
        	

_mm512_mask_i32loscatter_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i vindex, 
    __m512d a, 
    int scale
:Param ETypes:
    FP64 base_addr, 
    MASK k, 
    SI32 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm512_mask_i32loscatter_pd(void* base_addr,
                                     __mmask8 k, __m512i vindex,
                                     __m512d a, int scale);

.. admonition:: Intel Description

    Stores 8 packed double-precision (64-bit) floating-point elements in "a" to memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale". Only those elements whose corresponding mask bit is set in writemask "k" are written to memory.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm512_i32loscatter_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m512i vindex, 
    __m512i a, 
    int scale
:Param ETypes:
    UI64 base_addr, 
    SI32 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm512_i32loscatter_epi64(void* base_addr,
                                   __m512i vindex, __m512i a,
                                   int scale);

.. admonition:: Intel Description

    Stores 8 packed 64-bit integer elements located in "a" to memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
        	

_mm512_mask_i32loscatter_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m512i vindex, 
    __m512i a, 
    int scale
:Param ETypes:
    UI64 base_addr, 
    MASK k, 
    SI32 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm512_mask_i32loscatter_epi64(void* base_addr,
                                        __mmask8 k,
                                        __m512i vindex,
                                        __m512i a, int scale);

.. admonition:: Intel Description

    Stores 8 packed 64-bit integer elements located in "a" to memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale" using writemask "k" (elements whose corresponding mask bit is not set are not written to memory).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm512_store_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void * mem_addr, 
    __m512h a
:Param ETypes:
    FP16 mem_addr, 
    FP16 a

.. code-block:: C

    void _mm512_store_ph(void * mem_addr, __m512h a);

.. admonition:: Intel Description

    Store 512 bits (composed of 32 packed half-precision (16-bit) floating-point elements) from "a" into memory.
    "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
        	

_mm512_storeu_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    void * mem_addr, 
    __m512h a
:Param ETypes:
    FP16 mem_addr, 
    FP16 a

.. code-block:: C

    void _mm512_storeu_ph(void * mem_addr, __m512h a);

.. admonition:: Intel Description

    Store 512 bits (composed of 32 packed half-precision (16-bit) floating-point elements) from "a" into memory.
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+511:mem_addr] := a[511:0]
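
An unaligned 512-bit store behaves like a 64-byte copy to an arbitrary address. The sketch below models it with ``memcpy``. Since a half-precision type is not available in every C toolchain, the 32 lanes are represented by their raw 16-bit patterns; that representation and the function name are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model. a holds the raw IEEE 754 binary16 bit
   patterns of the 32 lanes; all 64 bytes are copied, at any alignment. */
void model_storeu_ph(void *mem_addr, const uint16_t a[32])
{
    memcpy(mem_addr, a, 32 * sizeof(uint16_t));
}
```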
        	

YMM
~~~
_mm256_mask_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI16 mem_addr, 
    MASK k, 
    UI16 a

.. code-block:: C

    void _mm256_mask_storeu_epi16(void* mem_addr, __mmask16 k,
                                  __m256i a);

.. admonition:: Intel Description

    Store packed 16-bit integers from "a" into memory using writemask "k".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		MEM[mem_addr+i+15:mem_addr+i] := a[i+15:i]
        	FI
        ENDFOR
        	

_mm256_mask_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask32 k, 
    __m256i a
:Param ETypes:
    UI8 mem_addr, 
    MASK k, 
    UI8 a

.. code-block:: C

    void _mm256_mask_storeu_epi8(void* mem_addr, __mmask32 k,
                                 __m256i a);

.. admonition:: Intel Description

    Store packed 8-bit integers from "a" into memory using writemask "k".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
        	FI
        ENDFOR
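
A byte-granular masked store is a common way to write the final partial chunk of a buffer without touching the bytes after it. The sketch below shows one way to build the 32-bit mask for a given number of remaining bytes, along with a scalar model of the store. Both names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mask with the low `remaining` bits set, capped at 32 lanes. Shifting a
   32-bit value by 32 is undefined in C, hence the explicit branch. */
uint32_t tail_mask32(size_t remaining)
{
    return remaining >= 32 ? UINT32_MAX
                           : (((uint32_t)1 << remaining) - 1u);
}

/* Hypothetical scalar model of the masked byte store. */
void model_mask_storeu_epi8(void *mem_addr, uint32_t k, const uint8_t a[32])
{
    uint8_t *dst = (uint8_t *)mem_addr;
    for (int j = 0; j < 32; ++j)
        if ((k >> j) & 1u)
            dst[j] = a[j];
}
```

Passing ``tail_mask32(n)`` as the mask writes exactly the first ``n`` bytes and leaves the rest of the destination unchanged.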
        	

_mm256_storeu_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m256i a
:Param ETypes:
    UI16 mem_addr, 
    UI16 a

.. code-block:: C

    void _mm256_storeu_epi16(void* mem_addr, __m256i a);

.. admonition:: Intel Description

    Store 256 bits (composed of 16 packed 16-bit integers) from "a" into memory.
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_storeu_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m256i a
:Param ETypes:
    UI8 mem_addr, 
    UI8 a

.. code-block:: C

    void _mm256_storeu_epi8(void* mem_addr, __m256i a);

.. admonition:: Intel Description

    Store 256 bits (composed of 32 packed 8-bit integers) from "a" into memory.
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_mask_cvtsepi16_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    SI8 base_addr, 
    MASK k, 
    SI16 a

.. code-block:: C

    void _mm256_mask_cvtsepi16_storeu_epi8(void* base_addr,
                                           __mmask16 k,
                                           __m256i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+15:i])
        	FI
        ENDFOR
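
The loop above can be sketched in portable scalar C as follows. This is a model of the pseudo-code, not the intrinsic; ``ref_saturate8`` and ``ref_mask_cvtsepi16_storeu_epi8`` are hypothetical names, and the vector is represented as a plain array of 16 lanes.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a signed 16-bit value to the signed 8-bit range. */
    static int8_t ref_saturate8(int16_t x)
    {
        if (x > INT8_MAX) return INT8_MAX;
        if (x < INT8_MIN) return INT8_MIN;
        return (int8_t)x;
    }

    /* Lane j is written to byte j of base_addr only when bit j of k is set;
       bytes belonging to inactive lanes are left as they were. */
    static void ref_mask_cvtsepi16_storeu_epi8(int8_t *base_addr, uint16_t k,
                                               const int16_t a[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                base_addr[j] = ref_saturate8(a[j]);
            }
        }
    }

With ``k = 0x000B`` only lanes 0, 1 and 3 are written, so an input of 300 in lane 0 is stored as 127 and -300 in lane 1 as -128.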
        	

_mm256_mask_cvtusepi16_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI16 a

.. code-block:: C

    void _mm256_mask_cvtusepi16_storeu_epi8(void* base_addr,
                                            __mmask16 k,
                                            __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+15:i])
        	FI
        ENDFOR
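
A minimal scalar sketch of the unsigned-saturation variant, assuming the vector is modelled as an array of 16 unsigned lanes. The names ``ref_saturate_u8`` and ``ref_mask_cvtusepi16_storeu_epi8`` are hypothetical and do not come from ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Clamp an unsigned 16-bit value to the range 0..255. */
    static uint8_t ref_saturate_u8(uint16_t x)
    {
        return (x > UINT8_MAX) ? (uint8_t)UINT8_MAX : (uint8_t)x;
    }

    /* Writes the saturated lane j to byte j of base_addr when bit j of k
       is set; other bytes are not touched. */
    static void ref_mask_cvtusepi16_storeu_epi8(uint8_t *base_addr, uint16_t k,
                                                const uint16_t a[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                base_addr[j] = ref_saturate_u8(a[j]);
            }
        }
    }

Any lane value above 255 is stored as 255, while values that already fit in 8 bits pass through unchanged.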
        	

_mm256_mask_cvtepi16_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI16 a

.. code-block:: C

    void _mm256_mask_cvtepi16_storeu_epi8(void* base_addr,
                                          __mmask16 k,
                                          __m256i a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+15:i])
        	FI
        ENDFOR
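
For contrast with the saturating variants, here is a scalar sketch of the truncating store. It assumes "Truncate8" keeps the low 8 bits of each lane; ``ref_mask_cvtepi16_storeu_epi8`` is a hypothetical name used only in this example.

.. code-block:: C

    #include <stdint.h>

    /* Writes the low byte of lane j to byte j of base_addr when bit j of k
       is set. The upper 8 bits of the lane are discarded, not clamped. */
    static void ref_mask_cvtepi16_storeu_epi8(uint8_t *base_addr, uint16_t k,
                                              const uint16_t a[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                base_addr[j] = (uint8_t)(a[j] & 0xFFu);
            }
        }
    }

A lane holding ``0x0180`` is stored as ``0x80`` here, whereas unsigned saturation of the same value would give ``0xFF``.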
        	

_mm256_mask_compressstoreu_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP64 base_addr, 
    MASK k, 
    FP64 a

.. code-block:: C

    void _mm256_mask_compressstoreu_pd(void* base_addr,
                                       __mmask8 k, __m256d a);

.. admonition:: Intel Description

    Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := base_addr
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		MEM[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
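
The compress-store loop can be modelled in scalar C as below. This is a sketch under the assumption that the vector is an array of four doubles; ``ref_mask_compressstoreu_pd`` is a hypothetical name, and its return value (the number of elements written) is an addition of this example, since the intrinsic returns ``void``.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Packs the selected lanes of a into consecutive slots at base_addr.
       Returns how many lanes were written (not part of the intrinsic). */
    static size_t ref_mask_compressstoreu_pd(double *base_addr, uint8_t k,
                                             const double a[4])
    {
        size_t m = 0; /* next free output slot, like "m" in the pseudo-code */
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                base_addr[m++] = a[j];
            }
        }
        return m;
    }

With ``a = {1.0, 2.0, 3.0, 4.0}`` and ``k = 0x0A``, lanes 1 and 3 are active, so the first two output slots receive 2.0 and 4.0 and nothing else is written.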
        	

_mm256_mask_compressstoreu_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP32 base_addr, 
    MASK k, 
    FP32 a

.. code-block:: C

    void _mm256_mask_compressstoreu_ps(void* base_addr,
                                       __mmask8 k, __m256 a);

.. admonition:: Intel Description

    Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := base_addr
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		MEM[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        	

_mm256_mask_store_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP64 mem_addr, 
    MASK k, 
    FP64 a

.. code-block:: C

    void _mm256_mask_store_pd(void* mem_addr, __mmask8 k,
                              __m256d a);

.. admonition:: Intel Description

    Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
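
To make the alignment requirement concrete, the sketch below models the masked store in scalar C and checks the 32-byte alignment explicitly. The function name ``ref_mask_store_pd`` is hypothetical, and returning ``-1`` is only this example's stand-in for the general-protection exception the hardware may raise.

.. code-block:: C

    #include <stdint.h>

    /* Returns 0 on success, or -1 when mem_addr is not 32-byte aligned.
       The -1 is an assumption of this sketch standing in for a fault. */
    static int ref_mask_store_pd(double *mem_addr, uint8_t k,
                                 const double a[4])
    {
        if (((uintptr_t)mem_addr % 32u) != 0) {
            return -1;
        }
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                mem_addr[j] = a[j]; /* inactive lanes are left untouched */
            }
        }
        return 0;
    }

A destination declared with ``_Alignas(32)`` passes the check, while the same buffer offset by one ``double`` (8 bytes) does not.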
        	

_mm256_mask_store_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP32 mem_addr, 
    MASK k, 
    FP32 a

.. code-block:: C

    void _mm256_mask_store_ps(void* mem_addr, __mmask8 k,
                              __m256 a);

.. admonition:: Intel Description

    Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm256_mask_store_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 mem_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm256_mask_store_epi32(void* mem_addr, __mmask8 k,
                                 __m256i a);

.. admonition:: Intel Description

    Store packed 32-bit integers from "a" into memory using writemask "k". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm256_mask_store_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI64 mem_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm256_mask_store_epi64(void* mem_addr, __mmask8 k,
                                 __m256i a);

.. admonition:: Intel Description

    Store packed 64-bit integers from "a" into memory using writemask "k". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm256_mask_storeu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 mem_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm256_mask_storeu_epi32(void* mem_addr, __mmask8 k,
                                  __m256i a);

.. admonition:: Intel Description

    Store packed 32-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
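
One common use of a masked unaligned store is writing the last few elements of an array without running past its end. The scalar sketch below illustrates that idea; ``ref_tail_mask8`` and ``ref_mask_storeu_epi32`` are hypothetical helper names, and the vector is modelled as an array of eight 32-bit lanes.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Mask with the low n bits set; n is clamped to 8 lanes. */
    static uint8_t ref_tail_mask8(size_t n)
    {
        return (n >= 8) ? (uint8_t)0xFFu : (uint8_t)((1u << n) - 1u);
    }

    /* Writes lane j at byte offset 4*j when bit j of k is set. memcpy is
       used so that mem_addr may have any alignment. */
    static void ref_mask_storeu_epi32(void *mem_addr, uint8_t k,
                                      const uint32_t a[8])
    {
        unsigned char *dst = (unsigned char *)mem_addr;
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u) {
                memcpy(dst + 4 * (size_t)j, &a[j], sizeof a[j]);
            }
        }
    }

Passing ``ref_tail_mask8(3)`` as the mask stores only lanes 0 to 2, so memory belonging to lanes 3 to 7 is never written.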
        	

_mm256_mask_storeu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI64 mem_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm256_mask_storeu_epi64(void* mem_addr, __mmask8 k,
                                  __m256i a);

.. admonition:: Intel Description

    Store packed 64-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm256_mask_storeu_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP64 mem_addr, 
    MASK k, 
    FP64 a

.. code-block:: C

    void _mm256_mask_storeu_pd(void* mem_addr, __mmask8 k,
                               __m256d a);

.. admonition:: Intel Description

    Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm256_mask_storeu_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP32 mem_addr, 
    MASK k, 
    FP32 a

.. code-block:: C

    void _mm256_mask_storeu_ps(void* mem_addr, __mmask8 k,
                               __m256 a);

.. admonition:: Intel Description

    Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm256_mask_compressstoreu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm256_mask_compressstoreu_epi32(void* base_addr,
                                          __mmask8 k,
                                          __m256i a);

.. admonition:: Intel Description

    Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := base_addr
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		MEM[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
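
Compress-store is often paired with a comparison that produces the mask, which turns it into a filter. The scalar sketch below shows that pairing under stated assumptions: ``ref_mask_greater`` and ``ref_mask_compressstoreu_epi32`` are hypothetical names, the vector is an array of eight lanes, and the returned count is an addition of this example.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Builds a mask with bit j set when a[j] is greater than threshold. */
    static uint8_t ref_mask_greater(const uint32_t a[8], uint32_t threshold)
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if (a[j] > threshold) {
                k |= (uint8_t)(1u << j);
            }
        }
        return k;
    }

    /* Packs the active lanes into consecutive slots at base_addr and
       returns how many were written (the intrinsic returns void). */
    static size_t ref_mask_compressstoreu_epi32(uint32_t *base_addr, uint8_t k,
                                                const uint32_t a[8])
    {
        size_t m = 0;
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u) {
                base_addr[m++] = a[j];
            }
        }
        return m;
    }

The number of elements written equals the number of set bits in the mask, so the caller can advance its output pointer by that amount before processing the next vector.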
        	

_mm256_mask_compressstoreu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI64 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm256_mask_compressstoreu_epi64(void* base_addr,
                                          __mmask8 k,
                                          __m256i a);

.. admonition:: Intel Description

    Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := base_addr
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		MEM[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        	

_mm256_i32scatter_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m256i vindex, 
    __m256i a, 
    const int scale
:Param ETypes:
    UI32 base_addr, 
    SI32 vindex, 
    UI32 a, 
    IMM scale

.. code-block:: C

    void _mm256_i32scatter_epi32(void* base_addr,
                                 __m256i vindex, __m256i a,
                                 const int scale);

.. admonition:: Intel Description

    Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+31:addr] := a[i+31:i]
        ENDFOR
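
The address computation above works in bit positions, which is why the scaled index is multiplied by 8. In byte terms, lane j is written at ``base_addr + vindex[j] * scale``. The following scalar sketch models that; ``ref_i32scatter_epi32`` is a hypothetical name and the vectors are plain arrays of eight lanes.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Writes a[j] at byte offset vindex[j] * scale from base_addr.
       scale is expected to be 1, 2, 4 or 8, as in the description. */
    static void ref_i32scatter_epi32(void *base_addr, const int32_t vindex[8],
                                     const uint32_t a[8], int scale)
    {
        unsigned char *base = (unsigned char *)base_addr;
        for (int j = 0; j < 8; j++) {
            ptrdiff_t byte_off = (ptrdiff_t)vindex[j] * scale;
            memcpy(base + byte_off, &a[j], sizeof a[j]);
        }
    }

With ``scale = 4`` the indices behave like ordinary element indices into an array of 32-bit integers; with ``scale = 1`` they would be raw byte offsets.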
        	

_mm256_mask_i32scatter_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i vindex, 
    __m256i a, 
    const int scale
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    SI32 vindex, 
    UI32 a, 
    IMM scale

.. code-block:: C

    void _mm256_mask_i32scatter_epi32(void* base_addr,
                                      __mmask8 k,
                                      __m256i vindex, __m256i a,
                                      const int scale);

.. admonition:: Intel Description

    Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+31:addr] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm256_i32scatter_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m128i vindex, 
    __m256i a, 
    const int scale
:Param ETypes:
    UI64 base_addr, 
    SI32 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm256_i32scatter_epi64(void* base_addr,
                                 __m128i vindex, __m256i a,
                                 const int scale);

.. admonition:: Intel Description

    Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
        	

_mm256_mask_i32scatter_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i vindex, 
    __m256i a, 
    const int scale
:Param ETypes:
    UI64 base_addr, 
    MASK k, 
    SI32 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm256_mask_i32scatter_epi64(void* base_addr,
                                      __mmask8 k,
                                      __m128i vindex, __m256i a,
                                      const int scale);

.. admonition:: Intel Description

    Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
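
Because each 32-bit index is sign-extended before scaling, negative indices address memory below "base_addr". The scalar sketch below illustrates this together with the mask; ``ref_mask_i32scatter_epi64`` is a hypothetical name, and the vectors are modelled as arrays of four lanes.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* For each lane j with bit j of k set, writes the 64-bit value a[j]
       at byte offset (sign-extended vindex[j]) * scale from base_addr. */
    static void ref_mask_i32scatter_epi64(void *base_addr, uint8_t k,
                                          const int32_t vindex[4],
                                          const uint64_t a[4], int scale)
    {
        unsigned char *base = (unsigned char *)base_addr;
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                int64_t byte_off = (int64_t)vindex[j] * scale;
                memcpy(base + byte_off, &a[j], sizeof a[j]);
            }
        }
    }

If the base pointer refers to the middle of an array, an index of -4 with ``scale = 8`` lands four 64-bit elements before it, and lanes whose mask bit is clear write nothing.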
        	

_mm256_i64scatter_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m256i vindex, 
    __m128i a, 
    const int scale
:Param ETypes:
    UI32 base_addr, 
    SI64 vindex, 
    UI32 a, 
    IMM scale

.. code-block:: C

    void _mm256_i64scatter_epi32(void* base_addr,
                                 __m256i vindex, __m128i a,
                                 const int scale);

.. admonition:: Intel Description

    Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	MEM[addr+31:addr] := a[i+31:i]
        ENDFOR
        	

_mm256_mask_i64scatter_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i vindex, 
    __m128i a, 
    const int scale
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    SI64 vindex, 
    UI32 a, 
    IMM scale

.. code-block:: C

    void _mm256_mask_i64scatter_epi32(void* base_addr,
                                      __mmask8 k,
                                      __m256i vindex, __m128i a,
                                      const int scale);

.. admonition:: Intel Description

    Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		MEM[addr+31:addr] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm256_i64scatter_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m256i vindex, 
    __m256i a, 
    const int scale
:Param ETypes:
    UI64 base_addr, 
    SI64 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm256_i64scatter_epi64(void* base_addr,
                                 __m256i vindex, __m256i a,
                                 const int scale);

.. admonition:: Intel Description

    Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
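
When several lanes carry the same index, the loop above writes them one after another, so in this sequential reading the highest-numbered lane determines the final value. The scalar sketch below demonstrates that behaviour of the pseudo-code model only; ``ref_i64scatter_epi64`` is a hypothetical name, and the ordering the hardware guarantees should be checked against Intel's instruction reference.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Writes a[j] at byte offset vindex[j] * scale from base_addr,
       visiting lanes in ascending order as the pseudo-code does. */
    static void ref_i64scatter_epi64(void *base_addr, const int64_t vindex[4],
                                     const uint64_t a[4], int scale)
    {
        unsigned char *base = (unsigned char *)base_addr;
        for (int j = 0; j < 4; j++) {
            int64_t byte_off = vindex[j] * scale;
            memcpy(base + byte_off, &a[j], sizeof a[j]);
        }
    }

With indices ``{2, 2, 0, 2}`` and values ``{1, 2, 3, 4}``, element 2 of the destination ends up holding 4 in this model, because lane 3 is written last.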
        	

_mm256_mask_i64scatter_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i vindex, 
    __m256i a, 
    const int scale
:Param ETypes:
    UI64 base_addr, 
    MASK k, 
    SI64 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm256_mask_i64scatter_epi64(void* base_addr,
                                      __mmask8 k,
                                      __m256i vindex, __m256i a,
                                      const int scale);

.. admonition:: Intel Description

    Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm256_i32scatter_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m128i vindex, 
    __m256d a, 
    const int scale
:Param ETypes:
    FP64 base_addr, 
    SI32 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm256_i32scatter_pd(void* base_addr, __m128i vindex,
                              __m256d a, const int scale);

.. admonition:: Intel Description

    Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
        	

_mm256_mask_i32scatter_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i vindex, 
    __m256d a, 
    const int scale
:Param ETypes:
    FP64 base_addr, 
    MASK k, 
    SI32 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm256_mask_i32scatter_pd(void* base_addr, __mmask8 k,
                                   __m128i vindex, __m256d a,
                                   const int scale);

.. admonition:: Intel Description

    Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm256_i32scatter_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m256i vindex, 
    __m256 a, 
    const int scale
:Param ETypes:
    FP32 base_addr, 
    SI32 vindex, 
    FP32 a, 
    IMM scale

.. code-block:: C

    void _mm256_i32scatter_ps(void* base_addr, __m256i vindex,
                              __m256 a, const int scale);

.. admonition:: Intel Description

    Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+31:addr] := a[i+31:i]
        ENDFOR
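
Since "scale" is a byte multiplier, it does not have to equal the element size. The sketch below uses ``scale = 8`` to write one ``float`` field in each element of an array of 8-byte structs. Everything named here (``ref_point``, ``ref_i32scatter_ps``, ``ref_scatter_y``) is hypothetical and exists only for this scalar illustration.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical 8-byte record; its size doubles as the scatter scale. */
    typedef struct { float x; float y; } ref_point;
    _Static_assert(sizeof(ref_point) == 8, "scale must be 1, 2, 4 or 8");

    /* Writes a[j] at byte offset vindex[j] * scale from base_addr. */
    static void ref_i32scatter_ps(void *base_addr, const int32_t vindex[8],
                                  const float a[8], int scale)
    {
        unsigned char *base = (unsigned char *)base_addr;
        for (int j = 0; j < 8; j++) {
            ptrdiff_t byte_off = (ptrdiff_t)vindex[j] * scale;
            memcpy(base + byte_off, &a[j], sizeof a[j]);
        }
    }

    /* Stores a[j] into pts[vindex[j]].y and leaves every x field alone. */
    static void ref_scatter_y(ref_point *pts, const int32_t vindex[8],
                              const float a[8])
    {
        unsigned char *base = (unsigned char *)pts + offsetof(ref_point, y);
        ref_i32scatter_ps(base, vindex, a, (int)sizeof(ref_point));
    }

Pointing the base at the first ``y`` field and stepping by the struct size is what confines the writes to that field.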
        	

_mm256_mask_i32scatter_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i vindex, 
    __m256 a, 
    const int scale
:Param ETypes:
    FP32 base_addr, 
    MASK k, 
    SI32 vindex, 
    FP32 a, 
    IMM scale

.. code-block:: C

    void _mm256_mask_i32scatter_ps(void* base_addr, __mmask8 k,
                                   __m256i vindex, __m256 a,
                                   const int scale);

.. admonition:: Intel Description

    Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+31:addr] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm256_i64scatter_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m256i vindex, 
    __m256d a, 
    const int scale
:Param ETypes:
    FP64 base_addr, 
    SI64 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm256_i64scatter_pd(void* base_addr, __m256i vindex,
                              __m256d a, const int scale);

.. admonition:: Intel Description

    Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
        	

_mm256_mask_i64scatter_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i vindex, 
    __m256d a, 
    const int scale
:Param ETypes:
    FP64 base_addr, 
    MASK k, 
    SI64 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm256_mask_i64scatter_pd(void* base_addr, __mmask8 k,
                                   __m256i vindex, __m256d a,
                                   const int scale);

.. admonition:: Intel Description

    Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm256_i64scatter_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m256i vindex, 
    __m128 a, 
    const int scale
:Param ETypes:
    FP32 base_addr, 
    SI64 vindex, 
    FP32 a, 
    IMM scale

.. code-block:: C

    void _mm256_i64scatter_ps(void* base_addr, __m256i vindex,
                              __m128 a, const int scale);

.. admonition:: Intel Description

    Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	MEM[addr+31:addr] := a[i+31:i]
        ENDFOR
        	

_mm256_mask_i64scatter_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i vindex, 
    __m128 a, 
    const int scale
:Param ETypes:
    FP32 base_addr, 
    MASK k, 
    SI64 vindex, 
    FP32 a, 
    IMM scale

.. code-block:: C

    void _mm256_mask_i64scatter_ps(void* base_addr, __mmask8 k,
                                   __m256i vindex, __m128 a,
                                   const int scale);

.. admonition:: Intel Description

    Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		MEM[addr+31:addr] := a[i+31:i]
        	FI
        ENDFOR
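
The masked scatter above can be modelled in scalar C. This is a minimal sketch rather than the intrinsic itself: "scatter_ps_i64_model" is a hypothetical name, the vectors are passed as plain arrays, and the "* 8" in the pseudo-code is assumed to convert a byte offset into the bit-addressed "MEM[]" notation, so the sketch computes offsets in bytes.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of a masked 4-element scatter with 64-bit
       indices. Byte offset of element j is vindex[j] * scale. */
    static void scatter_ps_i64_model(void *base_addr, uint8_t k,
                                     const int64_t vindex[4],
                                     const float a[4], int scale)
    {
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1) {  /* store only when mask bit j is set */
                char *addr = (char *)base_addr + vindex[j] * scale;
                memcpy(addr, &a[j], sizeof a[j]);
            }
        }
    }

With "scale" set to 4 and indices 0, 2, 4, 6, the model writes every other "float" of the destination array and leaves the slot of any masked-off element untouched.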
        	

_mm256_storeu_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m256i a
:Param ETypes:
    UI64 mem_addr, 
    UI64 a

.. code-block:: C

    void _mm256_storeu_epi64(void* mem_addr, __m256i a);

.. admonition:: Intel Description

    Store 256 bits (composed of 4 packed 64-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_storeu_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m256i a
:Param ETypes:
    UI32 mem_addr, 
    UI32 a

.. code-block:: C

    void _mm256_storeu_epi32(void* mem_addr, __m256i a);

.. admonition:: Intel Description

    Store 256 bits (composed of 8 packed 32-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_store_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m256i a
:Param ETypes:
    UI64 mem_addr, 
    UI64 a

.. code-block:: C

    void _mm256_store_epi64(void* mem_addr, __m256i a);

.. admonition:: Intel Description

    Store 256 bits (composed of 4 packed 64-bit integers) from "a" into memory. "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
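
The aligned store performs the same copy as "_mm256_storeu_epi64" and differs only in its alignment requirement. A caller that cannot guarantee 32-byte alignment might test the address first. The helper below is a hypothetical sketch ("is_aligned" is not part of any Intel header) and assumes the alignment is a power of two.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: nonzero when p lies on an alignment-byte
       boundary. alignment must be a nonzero power of two. */
    static int is_aligned(const void *p, size_t alignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        return ((uintptr_t)p & (alignment - 1)) == 0;
    }

When such a check fails for a 32-byte boundary, the unaligned "_mm256_storeu_epi64" variant is the safer choice.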
        	

_mm256_store_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m256i a
:Param ETypes:
    UI32 mem_addr, 
    UI32 a

.. code-block:: C

    void _mm256_store_epi32(void* mem_addr, __m256i a);

.. admonition:: Intel Description

    Store 256 bits (composed of 8 packed 32-bit integers) from "a" into memory. "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_mask_cvtepi32_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm256_mask_cvtepi32_storeu_epi8(void* base_addr,
                                          __mmask8 k,
                                          __m256i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+31:i])
        	FI
        ENDFOR
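
The truncating down-convert above keeps only the low byte of each active 32-bit element and packs the results at consecutive byte offsets. A minimal scalar sketch follows, assuming a hypothetical function name "cvtepi32_storeu_epi8_model" and plain arrays in place of vector registers.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model: truncate eight 32-bit elements to 8 bits
       and store element j at byte offset j when mask bit j is set. */
    static void cvtepi32_storeu_epi8_model(void *base_addr, uint8_t k,
                                           const uint32_t a[8])
    {
        uint8_t *dst = (uint8_t *)base_addr;
        assert(dst != NULL);
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                dst[j] = (uint8_t)(a[j] & 0xFFu);  /* Truncate8: low byte */
            }
        }
    }

Note that an inactive element still occupies its byte slot in the destination; the slot is simply left unmodified, which is what distinguishes this operation from a compressing store.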
        	

_mm256_mask_cvtepi32_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm256_mask_cvtepi32_storeu_epi16(void* base_addr,
                                           __mmask8 k,
                                           __m256i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+31:i])
        	FI
        ENDFOR
        	

_mm256_mask_cvtepi64_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm256_mask_cvtepi64_storeu_epi8(void* base_addr,
                                          __mmask8 k,
                                          __m256i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+63:i])
        	FI
        ENDFOR
        	

_mm256_mask_cvtepi64_storeu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm256_mask_cvtepi64_storeu_epi32(void* base_addr,
                                           __mmask8 k,
                                           __m256i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		MEM[base_addr+l+31:base_addr+l] := Truncate32(a[i+63:i])
        	FI
        ENDFOR
        	

_mm256_mask_cvtepi64_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm256_mask_cvtepi64_storeu_epi16(void* base_addr,
                                           __mmask8 k,
                                           __m256i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+63:i])
        	FI
        ENDFOR
        	

_mm256_mask_cvtsepi32_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    SI8 base_addr, 
    MASK k, 
    SI32 a

.. code-block:: C

    void _mm256_mask_cvtsepi32_storeu_epi8(void* base_addr,
                                           __mmask8 k,
                                           __m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+31:i])
        	FI
        ENDFOR
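
Signed saturation clamps each value to the range of the narrower signed type instead of discarding high bits. The sketch below models the pseudo-code above in scalar C; "saturate8_model" and "cvtsepi32_storeu_epi8_model" are hypothetical names, and the vector is passed as a plain array.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-in for Saturate8: clamp to [-128, 127]. */
    static int8_t saturate8_model(int32_t x)
    {
        if (x > INT8_MAX) return INT8_MAX;
        if (x < INT8_MIN) return INT8_MIN;
        return (int8_t)x;
    }

    /* Store the saturated element j at byte offset j when mask bit j is set. */
    static void cvtsepi32_storeu_epi8_model(void *base_addr, uint8_t k,
                                            const int32_t a[8])
    {
        int8_t *dst = (int8_t *)base_addr;
        assert(dst != NULL);
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                dst[j] = saturate8_model(a[j]);
            }
        }
    }

For example, inputs of 300 and -300 are stored as 127 and -128, whereas truncation would have stored their low bytes.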
        	

_mm256_mask_cvtsepi32_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    SI16 base_addr, 
    MASK k, 
    SI32 a

.. code-block:: C

    void _mm256_mask_cvtsepi32_storeu_epi16(void* base_addr,
                                            __mmask8 k,
                                            __m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+31:i])
        	FI
        ENDFOR
        	

_mm256_mask_cvtsepi64_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    SI8 base_addr, 
    MASK k, 
    SI64 a

.. code-block:: C

    void _mm256_mask_cvtsepi64_storeu_epi8(void* base_addr,
                                           __mmask8 k,
                                           __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+63:i])
        	FI
        ENDFOR
        	

_mm256_mask_cvtsepi64_storeu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    SI32 base_addr, 
    MASK k, 
    SI64 a

.. code-block:: C

    void _mm256_mask_cvtsepi64_storeu_epi32(void* base_addr,
                                            __mmask8 k,
                                            __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		MEM[base_addr+l+31:base_addr+l] := Saturate32(a[i+63:i])
        	FI
        ENDFOR
        	

_mm256_mask_cvtsepi64_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    SI16 base_addr, 
    MASK k, 
    SI64 a

.. code-block:: C

    void _mm256_mask_cvtsepi64_storeu_epi16(void* base_addr,
                                            __mmask8 k,
                                            __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+63:i])
        	FI
        ENDFOR
        	

_mm256_mask_cvtusepi32_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm256_mask_cvtusepi32_storeu_epi8(void* base_addr,
                                            __mmask8 k,
                                            __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+31:i])
        	FI
        ENDFOR
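
Unsigned saturation treats each source element as unsigned and clamps it to the maximum of the narrower type. A minimal scalar sketch of the pseudo-code above, with hypothetical names "saturate_u8_model" and "cvtusepi32_storeu_epi8_model" and a plain array standing in for the vector:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-in for SaturateU8: clamp to [0, 255]. */
    static uint8_t saturate_u8_model(uint32_t x)
    {
        return (uint8_t)(x > UINT8_MAX ? UINT8_MAX : x);
    }

    /* Store the saturated element j at byte offset j when mask bit j is set. */
    static void cvtusepi32_storeu_epi8_model(void *base_addr, uint8_t k,
                                             const uint32_t a[8])
    {
        uint8_t *dst = (uint8_t *)base_addr;
        assert(dst != NULL);
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                dst[j] = saturate_u8_model(a[j]);
            }
        }
    }

A bit pattern such as 0x80000000 is a large positive value under this interpretation and is stored as 255.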
        	

_mm256_mask_cvtusepi32_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm256_mask_cvtusepi32_storeu_epi16(void* base_addr,
                                             __mmask8 k,
                                             __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+31:i])
        	FI
        ENDFOR
        	

_mm256_mask_cvtusepi64_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm256_mask_cvtusepi64_storeu_epi8(void* base_addr,
                                            __mmask8 k,
                                            __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+63:i])
        	FI
        ENDFOR
        	

_mm256_mask_cvtusepi64_storeu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm256_mask_cvtusepi64_storeu_epi32(void* base_addr,
                                             __mmask8 k,
                                             __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		MEM[base_addr+l+31:base_addr+l] := SaturateU32(a[i+63:i])
        	FI
        ENDFOR
        	

_mm256_mask_cvtusepi64_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm256_mask_cvtusepi64_storeu_epi16(void* base_addr,
                                             __mmask8 k,
                                             __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+63:i])
        	FI
        ENDFOR
        	

_mm256_store_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void * mem_addr, 
    __m256h a
:Param ETypes:
    FP16 mem_addr, 
    FP16 a

.. code-block:: C

    void _mm256_store_ph(void * mem_addr, __m256h a);

.. admonition:: Intel Description

    Store 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from "a" into memory. "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_storeu_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void * mem_addr, 
    __m256h a
:Param ETypes:
    FP16 mem_addr, 
    FP16 a

.. code-block:: C

    void _mm256_storeu_ph(void * mem_addr, __m256h a);

.. admonition:: Intel Description

    Store 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

XMM
~~~
_mm_mask_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 mem_addr, 
    MASK k, 
    UI16 a

.. code-block:: C

    void _mm_mask_storeu_epi16(void* mem_addr, __mmask8 k,
                               __m128i a);

.. admonition:: Intel Description

    Store packed 16-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		MEM[mem_addr+i+15:mem_addr+i] := a[i+15:i]
        	FI
        ENDFOR
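
The masked unaligned store above writes each active 16-bit element to its own fixed slot and skips the rest. A scalar sketch, assuming a hypothetical name "mask_storeu_epi16_model" and a plain array for the vector; "memcpy" is used so that no alignment is assumed for the destination.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: write element j to byte offset 2*j when
       mask bit j is set; other destination bytes are left unchanged. */
    static void mask_storeu_epi16_model(void *mem_addr, uint8_t k,
                                        const uint16_t a[8])
    {
        unsigned char *dst = (unsigned char *)mem_addr;
        assert(dst != NULL);
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                memcpy(dst + 2 * j, &a[j], sizeof a[j]);
            }
        }
    }

With a mask of 0x81 only the first and last elements are written, and the destination may start at an odd address.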
        	

_mm_mask_storeu_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI8 mem_addr, 
    MASK k, 
    UI8 a

.. code-block:: C

    void _mm_mask_storeu_epi8(void* mem_addr, __mmask16 k,
                              __m128i a);

.. admonition:: Intel Description

    Store packed 8-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
        	FI
        ENDFOR
        	

_mm_storeu_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m128i a
:Param ETypes:
    UI16 mem_addr, 
    UI16 a

.. code-block:: C

    void _mm_storeu_epi16(void* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 128 bits (composed of 8 packed 16-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_storeu_epi8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m128i a
:Param ETypes:
    UI8 mem_addr, 
    UI8 a

.. code-block:: C

    void _mm_storeu_epi8(void* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 128 bits (composed of 16 packed 8-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_mask_cvtsepi16_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI8 base_addr, 
    MASK k, 
    SI16 a

.. code-block:: C

    void _mm_mask_cvtsepi16_storeu_epi8(void* base_addr,
                                        __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+15:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtusepi16_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI16 a

.. code-block:: C

    void _mm_mask_cvtusepi16_storeu_epi8(void* base_addr,
                                         __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+15:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtepi16_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI16 a

.. code-block:: C

    void _mm_mask_cvtepi16_storeu_epi8(void* base_addr,
                                       __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+15:i])
        	FI
        ENDFOR
        	

_mm_mask_compressstoreu_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 base_addr, 
    MASK k, 
    FP64 a

.. code-block:: C

    void _mm_mask_compressstoreu_pd(void* base_addr, __mmask8 k,
                                    __m128d a);

.. admonition:: Intel Description

    Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := base_addr
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		MEM[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        	

_mm_mask_compressstoreu_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 base_addr, 
    MASK k, 
    FP32 a

.. code-block:: C

    void _mm_mask_compressstoreu_ps(void* base_addr, __mmask8 k,
                                    __m128 a);

.. admonition:: Intel Description

    Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := base_addr
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		MEM[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
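
Unlike a plain masked store, the compressing store above advances the write position only for active elements, so the output is contiguous. The scalar sketch below follows the pseudo-code; "mask_compressstoreu_ps_model" is a hypothetical name, the vector is a plain array, and the returned element count is an addition for illustration (the intrinsic itself returns "void").

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: pack the active elements of a[] into
       consecutive slots at base_addr. Returns how many were written. */
    static int mask_compressstoreu_ps_model(void *base_addr, uint8_t k,
                                            const float a[4])
    {
        unsigned char *m = (unsigned char *)base_addr;
        int stored = 0;
        assert(m != NULL);
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1) {
                memcpy(m, &a[j], sizeof a[j]);
                m += sizeof a[j];  /* advance only after a store */
                stored++;
            }
        }
        return stored;
    }

With a mask of 0x0A, elements 1 and 3 land in the first two destination slots and the remaining slots are not touched.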
        	

_mm_mask_store_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    MASK k, 
    FP64 a

.. code-block:: C

    void _mm_mask_store_pd(void* mem_addr, __mmask8 k,
                           __m128d a);

.. admonition:: Intel Description

    Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm_mask_store_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    MASK k, 
    FP32 a

.. code-block:: C

    void _mm_mask_store_ps(void* mem_addr, __mmask8 k,
                           __m128 a);

.. admonition:: Intel Description

    Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm_mask_store_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 mem_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm_mask_store_epi32(void* mem_addr, __mmask8 k,
                              __m128i a);

.. admonition:: Intel Description

    Store packed 32-bit integers from "a" into memory using writemask "k". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm_mask_store_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 mem_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm_mask_store_epi64(void* mem_addr, __mmask8 k,
                              __m128i a);

.. admonition:: Intel Description

    Store packed 64-bit integers from "a" into memory using writemask "k". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm_mask_storeu_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 mem_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm_mask_storeu_epi32(void* mem_addr, __mmask8 k,
                               __m128i a);

.. admonition:: Intel Description

    Store packed 32-bit integers from "a" into memory using writemask "k".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm_mask_storeu_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 mem_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm_mask_storeu_epi64(void* mem_addr, __mmask8 k,
                               __m128i a);

.. admonition:: Intel Description

    Store packed 64-bit integers from "a" into memory using writemask "k".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm_mask_storeu_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    MASK k, 
    FP64 a

.. code-block:: C

    void _mm_mask_storeu_pd(void* mem_addr, __mmask8 k,
                            __m128d a);

.. admonition:: Intel Description

    Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm_mask_storeu_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    MASK k, 
    FP32 a

.. code-block:: C

    void _mm_mask_storeu_ps(void* mem_addr, __mmask8 k,
                            __m128 a);

.. admonition:: Intel Description

    Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm_mask_compressstoreu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm_mask_compressstoreu_epi32(void* base_addr,
                                       __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := base_addr
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		MEM[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        	
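A compress store differs from a plain masked store in that the selected lanes are packed together at the destination. The scalar sketch below illustrates this under stated assumptions: ``compressstoreu_epi32_model`` is a hypothetical name, the vector is a plain array, and the element count is returned only for illustration (the intrinsic itself returns ``void``).

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model: pack the active 32-bit lanes contiguously. */
    static int compressstoreu_epi32_model(int32_t *base_addr, uint8_t k,
                                          const int32_t a[4])
    {
        assert(base_addr != NULL);
        int n = 0;                         /* next free slot at the destination */
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1) {
                base_addr[n++] = a[j];     /* active lanes land back to back */
            }
        }
        return n;                          /* number of elements written */
    }

With ``k = 0xA`` lanes 1 and 3 are written to the first two destination slots, and the remaining memory is not modified.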

_mm_mask_compressstoreu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm_mask_compressstoreu_epi64(void* base_addr,
                                       __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := base_addr
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		MEM[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        	

_mm_i32scatter_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m128i vindex, 
    __m128i a, 
    const int scale
:Param ETypes:
    UI32 base_addr, 
    SI32 vindex, 
    UI32 a, 
    IMM scale

.. code-block:: C

    void _mm_i32scatter_epi32(void* base_addr, __m128i vindex,
                              __m128i a, const int scale);

.. admonition:: Intel Description

    Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+31:addr] := a[i+31:i]
        ENDFOR
        	
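The pseudo-code expresses ``MEM`` ranges in bits, which appears to be why the address carries a trailing ``* 8``; in byte terms each element lands at ``base_addr + vindex[j] * scale``. The scalar sketch below models that reading. ``i32scatter_epi32_model`` is a hypothetical name and the vectors are plain arrays, so this is an illustration of the addressing rather than the intrinsic.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of a 4-lane scatter with 32-bit indices. */
    static void i32scatter_epi32_model(void *base_addr, const int32_t vindex[4],
                                       const int32_t a[4], int scale)
    {
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        unsigned char *base = (unsigned char *)base_addr;
        for (int j = 0; j < 4; j++) {
            /* Sign-extend the 32-bit index to 64 bits, then scale to bytes. */
            int64_t offset = (int64_t)vindex[j] * scale;
            memcpy(base + offset, &a[j], sizeof a[j]);
        }
    }

Passing ``scale = 4`` makes the indices behave like ordinary ``int32_t`` array subscripts relative to ``base_addr``.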

_mm_mask_i32scatter_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i vindex, 
    __m128i a, 
    const int scale
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    SI32 vindex, 
    UI32 a, 
    IMM scale

.. code-block:: C

    void _mm_mask_i32scatter_epi32(void* base_addr, __mmask8 k,
                                   __m128i vindex, __m128i a,
                                   const int scale);

.. admonition:: Intel Description

    Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+31:addr] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm_i32scatter_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m128i vindex, 
    __m128i a, 
    const int scale
:Param ETypes:
    UI64 base_addr, 
    SI32 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm_i32scatter_epi64(void* base_addr, __m128i vindex,
                              __m128i a, const int scale);

.. admonition:: Intel Description

    Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
        	

_mm_mask_i32scatter_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i vindex, 
    __m128i a, 
    const int scale
:Param ETypes:
    UI64 base_addr, 
    MASK k, 
    SI32 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm_mask_i32scatter_epi64(void* base_addr, __mmask8 k,
                                   __m128i vindex, __m128i a,
                                   const int scale);

.. admonition:: Intel Description

    Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	
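In this form only two 64-bit elements are stored, so only the two lowest 32-bit indices of "vindex" take part, and each store is additionally gated by its mask bit. A scalar sketch under those assumptions follows; ``mask_i32scatter_epi64_model`` is a hypothetical name and the vectors are modelled as plain arrays.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: masked scatter of two 64-bit lanes. */
    static void mask_i32scatter_epi64_model(void *base_addr, uint8_t k,
                                            const int32_t vindex[4],
                                            const int64_t a[2], int scale)
    {
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        unsigned char *base = (unsigned char *)base_addr;
        for (int j = 0; j < 2; j++) {      /* vindex[2] and vindex[3] are unused */
            if ((k >> j) & 1) {
                int64_t offset = (int64_t)vindex[j] * scale;
                memcpy(base + offset, &a[j], sizeof a[j]);
            }
        }
    }

With ``k = 0x2`` only lane 1 is stored; the location addressed by ``vindex[0]`` is left unchanged.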

_mm_i64scatter_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m128i vindex, 
    __m128i a, 
    const int scale
:Param ETypes:
    UI32 base_addr, 
    SI64 vindex, 
    UI32 a, 
    IMM scale

.. code-block:: C

    void _mm_i64scatter_epi32(void* base_addr, __m128i vindex,
                              __m128i a, const int scale);

.. admonition:: Intel Description

    Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	MEM[addr+31:addr] := a[i+31:i]
        ENDFOR
        	
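Because a 128-bit "vindex" holds only two 64-bit indices, the loop above runs twice and stores only the two lowest 32-bit elements of "a". The scalar sketch below illustrates that behaviour; ``i64scatter_epi32_model`` is a hypothetical name and the vectors are modelled as plain arrays.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: two 64-bit indices scatter two 32-bit lanes. */
    static void i64scatter_epi32_model(void *base_addr, const int64_t vindex[2],
                                       const int32_t a[4], int scale)
    {
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        unsigned char *base = (unsigned char *)base_addr;
        for (int j = 0; j < 2; j++) {      /* a[2] and a[3] are never stored */
            int64_t offset = vindex[j] * scale;
            memcpy(base + offset, &a[j], sizeof a[j]);
        }
    }
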

_mm_mask_i64scatter_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i vindex, 
    __m128i a, 
    const int scale
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    SI64 vindex, 
    UI32 a, 
    IMM scale

.. code-block:: C

    void _mm_mask_i64scatter_epi32(void* base_addr, __mmask8 k,
                                   __m128i vindex, __m128i a,
                                   const int scale);

.. admonition:: Intel Description

    Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		MEM[addr+31:addr] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm_i64scatter_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m128i vindex, 
    __m128i a, 
    const int scale
:Param ETypes:
    UI64 base_addr, 
    SI64 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm_i64scatter_epi64(void* base_addr, __m128i vindex,
                              __m128i a, const int scale);

.. admonition:: Intel Description

    Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
        	

_mm_mask_i64scatter_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i vindex, 
    __m128i a, 
    const int scale
:Param ETypes:
    UI64 base_addr, 
    MASK k, 
    SI64 vindex, 
    UI64 a, 
    IMM scale

.. code-block:: C

    void _mm_mask_i64scatter_epi64(void* base_addr, __mmask8 k,
                                   __m128i vindex, __m128i a,
                                   const int scale);

.. admonition:: Intel Description

    Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm_i32scatter_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m128i vindex, 
    __m128d a, 
    const int scale
:Param ETypes:
    FP64 base_addr, 
    SI32 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm_i32scatter_pd(void* base_addr, __m128i vindex,
                           __m128d a, const int scale);

.. admonition:: Intel Description

    Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
        	

_mm_mask_i32scatter_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i vindex, 
    __m128d a, 
    const int scale
:Param ETypes:
    FP64 base_addr, 
    MASK k, 
    SI32 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm_mask_i32scatter_pd(void* base_addr, __mmask8 k,
                                __m128i vindex, __m128d a,
                                const int scale);

.. admonition:: Intel Description

    Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm_i32scatter_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m128i vindex, 
    __m128 a, 
    const int scale
:Param ETypes:
    FP32 base_addr, 
    SI32 vindex, 
    FP32 a, 
    IMM scale

.. code-block:: C

    void _mm_i32scatter_ps(void* base_addr, __m128i vindex,
                           __m128 a, const int scale);

.. admonition:: Intel Description

    Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	MEM[addr+31:addr] := a[i+31:i]
        ENDFOR
        	

_mm_mask_i32scatter_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i vindex, 
    __m128 a, 
    const int scale
:Param ETypes:
    FP32 base_addr, 
    MASK k, 
    SI32 vindex, 
    FP32 a, 
    IMM scale

.. code-block:: C

    void _mm_mask_i32scatter_ps(void* base_addr, __mmask8 k,
                                __m128i vindex, __m128 a,
                                const int scale);

.. admonition:: Intel Description

    Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		MEM[addr+31:addr] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm_i64scatter_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m128i vindex, 
    __m128d a, 
    const int scale
:Param ETypes:
    FP64 base_addr, 
    SI64 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm_i64scatter_pd(void* base_addr, __m128i vindex,
                           __m128d a, const int scale);

.. admonition:: Intel Description

    Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	MEM[addr+63:addr] := a[i+63:i]
        ENDFOR
        	

_mm_mask_i64scatter_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i vindex, 
    __m128d a, 
    const int scale
:Param ETypes:
    FP64 base_addr, 
    MASK k, 
    SI64 vindex, 
    FP64 a, 
    IMM scale

.. code-block:: C

    void _mm_mask_i64scatter_pd(void* base_addr, __mmask8 k,
                                __m128i vindex, __m128d a,
                                const int scale);

.. admonition:: Intel Description

    Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		MEM[addr+63:addr] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm_i64scatter_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __m128i vindex, 
    __m128 a, 
    const int scale
:Param ETypes:
    FP32 base_addr, 
    SI64 vindex, 
    FP32 a, 
    IMM scale

.. code-block:: C

    void _mm_i64scatter_ps(void* base_addr, __m128i vindex,
                           __m128 a, const int scale);

.. admonition:: Intel Description

    Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	MEM[addr+31:addr] := a[i+31:i]
        ENDFOR
        	

_mm_mask_i64scatter_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i vindex, 
    __m128 a, 
    const int scale
:Param ETypes:
    FP32 base_addr, 
    MASK k, 
    SI64 vindex, 
    FP32 a, 
    IMM scale

.. code-block:: C

    void _mm_mask_i64scatter_ps(void* base_addr, __mmask8 k,
                                __m128i vindex, __m128 a,
                                const int scale);

.. admonition:: Intel Description

    Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		MEM[addr+31:addr] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm_storeu_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m128i a
:Param ETypes:
    UI64 mem_addr, 
    UI64 a

.. code-block:: C

    void _mm_storeu_epi64(void* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 128 bits (composed of 2 packed 64-bit integers) from "a" into memory.
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	
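Semantically, the unaligned 128-bit store copies 16 bytes to an arbitrary address. The scalar sketch below models it with ``memcpy``; ``storeu_128_model`` is a hypothetical name and the vector is represented as two ``uint64_t`` values, so this illustrates the memory effect only.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of an unaligned 128-bit store. */
    static void storeu_128_model(void *mem_addr, const uint64_t a[2])
    {
        assert(mem_addr != NULL);
        /* memcpy places no alignment requirement on the destination. */
        memcpy(mem_addr, a, 16);
    }
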

_mm_storeu_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m128i a
:Param ETypes:
    UI32 mem_addr, 
    UI32 a

.. code-block:: C

    void _mm_storeu_epi32(void* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 128 bits (composed of 4 packed 32-bit integers) from "a" into memory.
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_store_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m128i a
:Param ETypes:
    UI64 mem_addr, 
    UI64 a

.. code-block:: C

    void _mm_store_epi64(void* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 128 bits (composed of 2 packed 64-bit integers) from "a" into memory.
    "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_store_epi32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m128i a
:Param ETypes:
    UI32 mem_addr, 
    UI32 a

.. code-block:: C

    void _mm_store_epi32(void* mem_addr, __m128i a);

.. admonition:: Intel Description

    Store 128 bits (composed of 4 packed 32-bit integers) from "a" into memory.
    "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_mask_cvtepi32_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm_mask_cvtepi32_storeu_epi8(void* base_addr,
                                       __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+31:i])
        	FI
        ENDFOR
        	
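Unlike the compress-store forms, the pseudo-code above writes each converted element at its own lane position (``l := 8*j``), so masked-off lanes leave gaps instead of being packed. A scalar sketch of the truncating conversion follows; ``mask_cvtepi32_storeu_epi8_model`` is a hypothetical name and the vector is modelled as a plain array.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model: truncate 32-bit lanes to 8 bits and store. */
    static void mask_cvtepi32_storeu_epi8_model(uint8_t *base_addr, uint8_t k,
                                                const int32_t a[4])
    {
        assert(base_addr != NULL);
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1) {
                /* Truncation keeps only the low 8 bits of the lane. */
                base_addr[j] = (uint8_t)((uint32_t)a[j] & 0xFFu);
            }
        }
    }
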

_mm_mask_cvtepi32_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm_mask_cvtepi32_storeu_epi16(void* base_addr,
                                        __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+31:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtepi64_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm_mask_cvtepi64_storeu_epi8(void* base_addr,
                                       __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+63:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtepi64_storeu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm_mask_cvtepi64_storeu_epi32(void* base_addr,
                                        __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		MEM[base_addr+l+31:base_addr+l] := Truncate32(a[i+63:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtepi64_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm_mask_cvtepi64_storeu_epi16(void* base_addr,
                                        __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+63:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtsepi32_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI8 base_addr, 
    MASK k, 
    SI32 a

.. code-block:: C

    void _mm_mask_cvtsepi32_storeu_epi8(void* base_addr,
                                        __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+31:i])
        	FI
        ENDFOR
        	
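Signed saturation differs from truncation in that out-of-range values clamp to the nearest representable value. The following scalar sketch models that behaviour under stated assumptions: ``model_saturate8`` and ``model_mask_cvtsepi32_storeu_epi8`` are hypothetical names, and the vector is passed as four ``int32_t`` lanes.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Saturate8: clamp a signed 32-bit value into the int8_t range. */
    int8_t model_saturate8(int32_t x)
    {
        if (x > INT8_MAX) return INT8_MAX;
        if (x < INT8_MIN) return INT8_MIN;
        return (int8_t)x;
    }

    /* Hypothetical scalar model of the masked saturating store. */
    void model_mask_cvtsepi32_storeu_epi8(void *base_addr, uint8_t k,
                                          const int32_t a[4])
    {
        int8_t *out = (int8_t *)base_addr;  /* byte stores need no alignment */
        assert(base_addr != NULL);
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                out[j] = model_saturate8(a[j]);
            }
        }
    }

For example, lanes holding 300 and -300 are stored as 127 and -128, while a lane whose mask bit is clear leaves its byte in memory untouched.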

_mm_mask_cvtsepi32_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI16 base_addr, 
    MASK k, 
    SI32 a

.. code-block:: C

    void _mm_mask_cvtsepi32_storeu_epi16(void* base_addr,
                                         __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+31:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtsepi64_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI8 base_addr, 
    MASK k, 
    SI64 a

.. code-block:: C

    void _mm_mask_cvtsepi64_storeu_epi8(void* base_addr,
                                        __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+63:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtsepi64_storeu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI32 base_addr, 
    MASK k, 
    SI64 a

.. code-block:: C

    void _mm_mask_cvtsepi64_storeu_epi32(void* base_addr,
                                         __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		MEM[base_addr+l+31:base_addr+l] := Saturate32(a[i+63:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtsepi64_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI16 base_addr, 
    MASK k, 
    SI64 a

.. code-block:: C

    void _mm_mask_cvtsepi64_storeu_epi16(void* base_addr,
                                         __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+63:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtusepi32_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm_mask_cvtusepi32_storeu_epi8(void* base_addr,
                                         __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+31:i])
        	FI
        ENDFOR
        	
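Unsigned saturation clamps only at the upper bound. As a hedged illustration, the scalar sketch below uses hypothetical helpers (``model_saturate_u8``, ``model_mask_cvtusepi32_storeu_epi8``) and passes the vector as four ``uint32_t`` lanes.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* SaturateU8: values above 255 clamp to 255. */
    uint8_t model_saturate_u8(uint32_t x)
    {
        return (x > UINT8_MAX) ? (uint8_t)UINT8_MAX : (uint8_t)x;
    }

    /* Hypothetical scalar model of the masked unsigned-saturating store. */
    void model_mask_cvtusepi32_storeu_epi8(void *base_addr, uint8_t k,
                                           const uint32_t a[4])
    {
        uint8_t *out = (uint8_t *)base_addr;
        assert(base_addr != NULL);
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                out[j] = model_saturate_u8(a[j]);
            }
        }
    }

The difference from plain truncation shows up at ``0x100``: truncating to 8 bits would yield ``0x00``, whereas this model stores ``0xFF``.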

_mm_mask_cvtusepi32_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI32 a

.. code-block:: C

    void _mm_mask_cvtusepi32_storeu_epi16(void* base_addr,
                                          __mmask8 k,
                                          __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+31:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtusepi64_storeu_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm_mask_cvtusepi64_storeu_epi8(void* base_addr,
                                         __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+63:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtusepi64_storeu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm_mask_cvtusepi64_storeu_epi32(void* base_addr,
                                          __mmask8 k,
                                          __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		MEM[base_addr+l+31:base_addr+l] := SaturateU32(a[i+63:i])
        	FI
        ENDFOR
        	

_mm_mask_cvtusepi64_storeu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void* base_addr, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 base_addr, 
    MASK k, 
    UI64 a

.. code-block:: C

    void _mm_mask_cvtusepi64_storeu_epi16(void* base_addr,
                                          __mmask8 k,
                                          __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+63:i])
        	FI
        ENDFOR
        	

_mm_mask_store_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    double* mem_addr, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    MASK k, 
    FP64 a

.. code-block:: C

    void _mm_mask_store_sd(double* mem_addr, __mmask8 k,
                           __m128d a);

.. admonition:: Intel Description

    Store the lower double-precision (64-bit) floating-point element from "a" into memory using writemask "k".
    "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	MEM[mem_addr+63:mem_addr] := a[63:0]
        FI
        	
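Only bit 0 of the mask matters for this scalar store. A minimal sketch of that behaviour, assuming the vector is passed as two ``double`` lanes and using the hypothetical name ``model_mask_store_sd``:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model: a[0] is the lower lane of the 128-bit vector. */
    void model_mask_store_sd(double *mem_addr, uint8_t k, const double a[2])
    {
        assert(mem_addr != NULL);
        if (k & 1u) {           /* only k[0] is consulted */
            *mem_addr = a[0];   /* the upper lane a[1] is never stored */
        }
    }

Calling the model with ``k = 0xFE`` leaves memory unchanged, because bit 0 is clear even though every other bit is set.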

_mm_mask_store_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    float* mem_addr, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    MASK k, 
    FP32 a

.. code-block:: C

    void _mm_mask_store_ss(float* mem_addr, __mmask8 k,
                           __m128 a);

.. admonition:: Intel Description

    Store the lower single-precision (32-bit) floating-point element from "a" into memory using writemask "k".
    "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	MEM[mem_addr+31:mem_addr] := a[31:0]
        FI
        	

_mm_store_ph
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void * mem_addr, 
    __m128h a
:Param ETypes:
    FP16 mem_addr, 
    FP16 a

.. code-block:: C

    void _mm_store_ph(void * mem_addr, __m128h a);

.. admonition:: Intel Description

    Store 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from "a" into memory.
    "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_storeu_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void * mem_addr, 
    __m128h a
:Param ETypes:
    FP16 mem_addr, 
    FP16 a

.. code-block:: C

    void _mm_storeu_ph(void * mem_addr, __m128h a);

.. admonition:: Intel Description

    Store 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from "a" into memory.
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+127:mem_addr] := a[127:0]
        	

_mm_store_sh
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void * mem_addr, 
    __m128h a
:Param ETypes:
    FP16 mem_addr, 
    FP16 a

.. code-block:: C

    void _mm_store_sh(void * mem_addr, __m128h a);

.. admonition:: Intel Description

    Store the lower half-precision (16-bit) floating-point element from "a" into memory.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr].fp16[0] := a.fp16[0]
        	

_mm_mask_store_sh
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    void * mem_addr, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    FP16 mem_addr, 
    MASK k, 
    FP16 a

.. code-block:: C

    void _mm_mask_store_sh(void* mem_addr, __mmask8 k,
                           __m128h a);

.. admonition:: Intel Description

    Store the lower half-precision (16-bit) floating-point element from "a" into memory using writemask "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	MEM[mem_addr].fp16[0] := a.fp16[0]
        FI
        	

Other
~~~~~
_store_mask32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-Other
:Return Type: void
:Param Types:
    __mmask32* mem_addr, 
    __mmask32 a
:Param ETypes:
    MASK mem_addr, 
    MASK a

.. code-block:: C

    void _store_mask32(__mmask32* mem_addr, __mmask32 a);

.. admonition:: Intel Description

    Store 32-bit mask from "a" into memory.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+31:mem_addr] := a[31:0]
        	

_store_mask64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-Other
:Return Type: void
:Param Types:
    __mmask64* mem_addr, 
    __mmask64 a
:Param ETypes:
    MASK mem_addr, 
    MASK a

.. code-block:: C

    void _store_mask64(__mmask64* mem_addr, __mmask64 a);

.. admonition:: Intel Description

    Store 64-bit mask from "a" into memory.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+63:mem_addr] := a[63:0]
        	

_store_mask8
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-Other
:Return Type: void
:Param Types:
    __mmask8* mem_addr, 
    __mmask8 a
:Param ETypes:
    MASK mem_addr, 
    MASK a

.. code-block:: C

    void _store_mask8(__mmask8* mem_addr, __mmask8 a);

.. admonition:: Intel Description

    Store 8-bit mask from "a" into memory.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+7:mem_addr] := a[7:0]
        	

_store_mask16
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Store
:Header: immintrin.h
:Searchable: AVX-512-Store-Other
:Return Type: void
:Param Types:
    __mmask16* mem_addr, 
    __mmask16 a
:Param ETypes:
    MASK mem_addr, 
    MASK a

.. code-block:: C

    void _store_mask16(__mmask16* mem_addr, __mmask16 a);

.. admonition:: Intel Description

    Store 16-bit mask from "a" into memory.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+15:mem_addr] := a[15:0]
        	

Load
----
ZMM
~~~
_mm512_mask_loadu_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    void const* mem_addr
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 mem_addr

.. code-block:: C

    __m512i _mm512_mask_loadu_epi16(__m512i src, __mmask32 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load packed 16-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
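Merge masking (writemask) can be sketched in scalar C as follows. The helper name ``model_mask_loadu_epi16`` and the array-based representation of the 512-bit vectors are assumptions made for illustration only.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: 32 lanes of 16 bits, one mask bit per lane. */
    void model_mask_loadu_epi16(uint16_t dst[32], const uint16_t src[32],
                                uint32_t k, const void *mem_addr)
    {
        const unsigned char *in = (const unsigned char *)mem_addr;
        assert(mem_addr != NULL);
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1u) {
                /* memcpy keeps the read valid for an unaligned mem_addr */
                memcpy(&dst[j], in + 2 * j, sizeof dst[j]);
            } else {
                dst[j] = src[j];  /* inactive lane: pass src through */
            }
        }
    }

Following the pseudo-code, the model reads memory only for lanes whose mask bit is set; all other lanes are taken from ``src``.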

_mm512_maskz_loadu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI16 mem_addr

.. code-block:: C

    __m512i _mm512_maskz_loadu_epi16(__mmask32 k,
                                     void const* mem_addr);

.. admonition:: Intel Description

    Load packed 16-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_loadu_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    void const* mem_addr
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 mem_addr

.. code-block:: C

    __m512i _mm512_mask_loadu_epi8(__m512i src, __mmask64 k,
                                   void const* mem_addr);

.. admonition:: Intel Description

    Load packed 8-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_loadu_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI8 mem_addr

.. code-block:: C

    __m512i _mm512_maskz_loadu_epi8(__mmask64 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load packed 8-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
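Zero masking with a 64-bit mask can be modelled the same way, with inactive lanes cleared instead of merged. In this hedged sketch ``model_maskz_loadu_epi8`` is a hypothetical helper and the 512-bit result is an array of 64 bytes.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model: 64 byte lanes, one bit of a 64-bit mask per lane. */
    void model_maskz_loadu_epi8(uint8_t dst[64], uint64_t k, const void *mem_addr)
    {
        const uint8_t *in = (const uint8_t *)mem_addr;
        assert(mem_addr != NULL);
        for (int j = 0; j < 64; j++) {
            /* k is 64 bits wide, so shifting by up to 63 is well defined */
            dst[j] = ((k >> j) & 1u) ? in[j] : 0;
        }
    }

When building such a mask in C, a 64-bit constant such as ``UINT64_C(1) << 63`` is needed for the top lane; shifting a plain ``int`` by 63 is undefined behaviour.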

_mm512_loadu_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI16 mem_addr

.. code-block:: C

    __m512i _mm512_loadu_epi16(void const* mem_addr);

.. admonition:: Intel Description

    Load 512-bits (composed of 32 packed 16-bit integers) from memory into "dst".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
        	

_mm512_loadu_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI8 mem_addr

.. code-block:: C

    __m512i _mm512_loadu_epi8(void const* mem_addr);

.. admonition:: Intel Description

    Load 512-bits (composed of 64 packed 8-bit integers) from memory into "dst".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
        	

_mm512_loadu_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI64 mem_addr

.. code-block:: C

    __m512i _mm512_loadu_epi64(void const* mem_addr);

.. admonition:: Intel Description

    Load 512-bits (composed of 8 packed 64-bit integers) from memory into "dst".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
        	

_mm512_loadu_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI32 mem_addr

.. code-block:: C

    __m512i _mm512_loadu_epi32(void const* mem_addr);

.. admonition:: Intel Description

    Load 512-bits (composed of 16 packed 32-bit integers) from memory into "dst".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
        	

_mm512_i32gather_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m256i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    SI32 vindex, 
    FP64 base_addr, 
    IMM scale

.. code-block:: C

    __m512d _mm512_i32gather_pd(__m256i vindex,
                                void const* base_addr,
                                int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:512] := 0
        	
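The gather address computation can be sketched in scalar C. This is an illustrative model, not the intrinsic: ``model_i32gather_pd`` is a hypothetical name, and the byte-offset reading of the pseudo-code (where the trailing ``* 8`` appears to convert a byte offset into the bit offset used by ``MEM[...]``) is an assumption.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: eight signed 32-bit indices select eight doubles. */
    void model_i32gather_pd(double dst[8], const int32_t vindex[8],
                            const void *base_addr, int scale)
    {
        const unsigned char *base = (const unsigned char *)base_addr;
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        for (int j = 0; j < 8; j++) {
            /* Sign-extend the index, then scale it to a byte offset. */
            ptrdiff_t offset = (ptrdiff_t)vindex[j] * scale;
            memcpy(&dst[j], base + offset, sizeof dst[j]);
        }
    }

Because the indices are sign-extended, negative values address elements before ``base_addr``; with ``scale = 8`` an index of ``-1`` selects the ``double`` immediately preceding the base.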

_mm512_mask_i32gather_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m256i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI32 vindex, 
    FP64 base_addr, 
    IMM scale

.. code-block:: C

    __m512d _mm512_mask_i32gather_pd(__m512d src, __mmask8 k,
                                     __m256i vindex,
                                     void const* base_addr,
                                     int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
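In the masked form the address is computed only inside the ``IF k[j]`` branch. The scalar sketch below mirrors that structure; ``model_mask_i32gather_pd`` is a hypothetical helper and the vectors are represented as plain arrays.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of the merge-masked gather. */
    void model_mask_i32gather_pd(double dst[8], const double src[8], uint8_t k,
                                 const int32_t vindex[8], const void *base_addr,
                                 int scale)
    {
        const unsigned char *base = (const unsigned char *)base_addr;
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u) {
                ptrdiff_t offset = (ptrdiff_t)vindex[j] * scale;
                memcpy(&dst[j], base + offset, sizeof dst[j]);
            } else {
                dst[j] = src[j];  /* vindex[j] is never used for an inactive lane */
            }
        }
    }

In this model an inactive lane may therefore hold an index that would be out of range if dereferenced, since only active lanes touch memory.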

_mm512_i64gather_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    SI64 vindex, 
    FP64 base_addr, 
    IMM scale

.. code-block:: C

    __m512d _mm512_i64gather_pd(__m512i vindex,
                                void const* base_addr,
                                int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_i64gather_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI64 vindex, 
    FP64 base_addr, 
    IMM scale

.. code-block:: C

    __m512d _mm512_mask_i64gather_pd(__m512d src, __mmask8 k,
                                     __m512i vindex,
                                     void const* base_addr,
                                     int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_i64gather_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    SI64 vindex, 
    FP32 base_addr, 
    IMM scale

.. code-block:: C

    __m256 _mm512_i64gather_ps(__m512i vindex,
                               void const* base_addr,
                               int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	dst[i+31:i] := MEM[addr+31:addr]
        ENDFOR
        dst[MAX:256] := 0
        	
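Eight 64-bit indices fill a 512-bit vector but select only eight 32-bit elements, which is why the result is 256 bits wide. A minimal scalar sketch, assuming array-based vectors and the hypothetical name ``model_i64gather_ps``:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: eight 64-bit indices select eight floats. */
    void model_i64gather_ps(float dst[8], const int64_t vindex[8],
                            const void *base_addr, int scale)
    {
        const unsigned char *base = (const unsigned char *)base_addr;
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        for (int j = 0; j < 8; j++) {
            /* The index is already 64 bits wide; scale it to a byte offset. */
            ptrdiff_t offset = (ptrdiff_t)(vindex[j] * scale);
            memcpy(&dst[j], base + offset, sizeof dst[j]);
        }
    }

With ``scale = 1`` the indices act as raw byte offsets, so index values of 0, 4, 8 and so on step through consecutive ``float`` elements.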

_mm512_mask_i64gather_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI64 vindex, 
    FP32 base_addr, 
    IMM scale

.. code-block:: C

    __m256 _mm512_mask_i64gather_ps(__m256 src, __mmask8 k,
                                    __m512i vindex,
                                    void const* base_addr,
                                    int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_load_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m512d _mm512_maskz_load_pd(__mmask8 k,
                                 void const* mem_addr);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
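
The zeromask rule in the pseudo-code above can be restated as a portable scalar sketch. The helper name ``maskz_load_pd_model`` is hypothetical: it only models lane selection, does not call the intrinsic, and does not reproduce the 64-byte alignment requirement.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the zero-masked load; not the intrinsic. */
    static void maskz_load_pd_model(double dst[8], uint8_t k,
                                    const double *mem_addr)
    {
        for (size_t j = 0; j < 8; ++j) {
            /* Bit j of k selects lane j. Unselected lanes become 0.0
               and their memory is never read by this model. */
            if ((k >> j) & 1u)
                dst[j] = mem_addr[j];
            else
                dst[j] = 0.0;
        }
    }

With ``k = 0x05`` only lanes 0 and 2 are taken from memory; every other lane of ``dst`` is zero.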
        	

_mm512_maskz_load_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m512 _mm512_maskz_load_ps(__mmask16 k,
                                void const* mem_addr);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_load_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m512i _mm512_maskz_load_epi32(__mmask16 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_load_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m512i _mm512_maskz_load_epi64(__mmask8 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_loadu_si512
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI64 mem_addr

.. code-block:: C

    __m512i _mm512_loadu_si512(void const* mem_addr);

.. admonition:: Intel Description

    Load 512 bits of integer data from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
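
As a rough illustration of what "does not need to be aligned" means, the sketch below copies 64 bytes from an arbitrary address. The type ``vec512_model`` and the function ``loadu_si512_model`` are hypothetical stand-ins for ``__m512i`` and the intrinsic; the model relies on ``memcpy``, which accepts any source alignment.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical 64-byte container standing in for __m512i. */
    typedef struct {
        uint8_t bytes[64];
    } vec512_model;

    /* Hypothetical scalar model of an unaligned 512-bit load. */
    static vec512_model loadu_si512_model(const void *mem_addr)
    {
        vec512_model dst;
        /* memcpy places no alignment requirement on mem_addr. */
        memcpy(dst.bytes, mem_addr, sizeof dst.bytes);
        return dst;
    }

Passing ``buf + 1`` for a byte buffer ``buf`` is valid here, whereas the aligned load variants require an address that is a multiple of 64.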
        	

_mm512_mask_loadu_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m512i _mm512_mask_loadu_epi32(__m512i src, __mmask16 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
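
The writemask (merge) behaviour differs from the zeromask one only in what unselected lanes receive. A minimal scalar sketch, assuming a hypothetical helper ``mask_loadu_epi32_model`` that is not part of any Intel header:

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of the merge-masked unaligned load. */
    static void mask_loadu_epi32_model(int32_t dst[16], const int32_t src[16],
                                       uint16_t k, const void *mem_addr)
    {
        const unsigned char *p = (const unsigned char *)mem_addr;
        for (int j = 0; j < 16; ++j) {
            if ((k >> j) & 1u) {
                /* Selected lane: read 4 bytes; memcpy tolerates any alignment. */
                memcpy(&dst[j], p + 4 * j, sizeof dst[j]);
            } else {
                /* Unselected lane: keep the value from src. */
                dst[j] = src[j];
            }
        }
    }

With ``k = 0x8001`` lanes 0 and 15 come from memory and lanes 1 through 14 are copied from ``src``.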
        	

_mm512_maskz_loadu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m512i _mm512_maskz_loadu_epi32(__mmask16 k,
                                     void const* mem_addr);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_loadu_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m512i _mm512_mask_loadu_epi64(__m512i src, __mmask8 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_loadu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m512i _mm512_maskz_loadu_epi64(__mmask8 k,
                                     void const* mem_addr);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_stream_load_si512
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    void const* mem_addr
:Param ETypes:
    M512 mem_addr

.. code-block:: C

    __m512i _mm512_stream_load_si512(void const* mem_addr);

.. admonition:: Intel Description

    Load 512 bits of integer data from memory into "dst" using a non-temporal memory hint. "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
        	

_mm512_loadu_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    void const* mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m512d _mm512_loadu_pd(void const* mem_addr);

.. admonition:: Intel Description

    Load 512 bits (composed of 8 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
        	

_mm512_mask_loadu_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m512d _mm512_mask_loadu_pd(__m512d src, __mmask8 k,
                                 void const* mem_addr);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_loadu_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m512d _mm512_maskz_loadu_pd(__mmask8 k,
                                  void const* mem_addr);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_loadu_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    void const* mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m512 _mm512_loadu_ps(void const* mem_addr);

.. admonition:: Intel Description

    Load 512 bits (composed of 16 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
        	

_mm512_mask_loadu_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m512 _mm512_mask_loadu_ps(__m512 src, __mmask16 k,
                                void const* mem_addr);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_loadu_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m512 _mm512_maskz_loadu_ps(__mmask16 k,
                                 void const* mem_addr);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
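
One common use of a zero-masked unaligned load is reading the final, partial chunk of an array whose length is not a multiple of 16. The sketch below is a portable scalar model of that idea; ``tail_mask16`` and ``maskz_loadu_ps_model`` are hypothetical names introduced for illustration and do not call the intrinsic.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: mask with the low `remaining` bits set (capped at 16). */
    static uint16_t tail_mask16(size_t remaining)
    {
        if (remaining >= 16)
            return (uint16_t)0xFFFFu;
        return (uint16_t)((1u << remaining) - 1u);
    }

    /* Hypothetical scalar model of the zero-masked unaligned load. */
    static void maskz_loadu_ps_model(float dst[16], uint16_t k,
                                     const float *mem_addr)
    {
        for (size_t j = 0; j < 16; ++j) {
            /* Only selected lanes are read; the rest are zeroed. */
            dst[j] = ((k >> j) & 1u) ? mem_addr[j] : 0.0f;
        }
    }

For a tail of three floats, ``tail_mask16(3)`` yields ``0x0007``, so the model reads exactly three elements and fills the remaining thirteen lanes with zero.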
        	

_mm512_i32gather_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    SI32 vindex, 
    UI64 base_addr, 
    IMM scale

.. code-block:: C

    __m512i _mm512_i32gather_epi64(__m256i vindex,
                                   void const* base_addr,
                                   int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:512] := 0
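
The address computation above can be sketched in portable C. Here ``scale`` is treated as a byte multiplier; the trailing ``* 8`` in the pseudo-code is read as a byte-to-bit conversion for the bit-indexed ``MEM[]`` notation, which is an interpretation rather than something the description states. The name ``i32gather_epi64_model`` is hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of a gather with 32-bit indices. */
    static void i32gather_epi64_model(int64_t dst[8], const int32_t vindex[8],
                                      const void *base_addr, int scale)
    {
        const unsigned char *base = (const unsigned char *)base_addr;
        /* The description allows only these scale factors. */
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        for (int j = 0; j < 8; ++j) {
            /* Widen the signed 32-bit index before scaling to a byte offset. */
            ptrdiff_t offset = (ptrdiff_t)vindex[j] * (ptrdiff_t)scale;
            memcpy(&dst[j], base + offset, sizeof dst[j]);
        }
    }

Because the index is sign-extended, a negative index addresses memory below ``base_addr``. With ``scale = 8`` the indices act as plain element indices into an array of 64-bit integers.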
        	

_mm512_mask_i32gather_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m256i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI32 vindex, 
    UI64 base_addr, 
    IMM scale

.. code-block:: C

    __m512i _mm512_mask_i32gather_epi64(__m512i src, __mmask8 k,
                                        __m256i vindex,
                                        void const* base_addr,
                                        int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_i64gather_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    SI64 vindex, 
    UI32 base_addr, 
    IMM scale

.. code-block:: C

    __m256i _mm512_i64gather_epi32(__m512i vindex,
                                   void const* base_addr,
                                   int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	dst[i+31:i] := MEM[addr+31:addr]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_i64gather_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI64 vindex, 
    UI32 base_addr, 
    IMM scale

.. code-block:: C

    __m256i _mm512_mask_i64gather_epi32(__m256i src, __mmask8 k,
                                        __m512i vindex,
                                        void const* base_addr,
                                        int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
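
A scalar sketch of the masked gather may help show why the result is only 256 bits wide: eight 64-bit indices produce eight 32-bit elements. The helper ``mask_i64gather_epi32_model`` is hypothetical and treats ``scale`` as a byte multiplier, as an assumption about the pseudo-code's bit-indexed notation.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of a masked gather with 64-bit indices. */
    static void mask_i64gather_epi32_model(int32_t dst[8], const int32_t src[8],
                                           uint8_t k, const int64_t vindex[8],
                                           const void *base_addr, int scale)
    {
        const unsigned char *base = (const unsigned char *)base_addr;
        for (int j = 0; j < 8; ++j) {
            if ((k >> j) & 1u) {
                /* Selected lane: 64-bit index times scale gives a byte offset. */
                ptrdiff_t offset = (ptrdiff_t)(vindex[j] * scale);
                memcpy(&dst[j], base + offset, sizeof dst[j]);
            } else {
                /* Unselected lane: copy from src; its index is never used. */
                dst[j] = src[j];
            }
        }
    }

In this model the indices of unselected lanes are ignored entirely, so they may hold values that would be out of range if dereferenced.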
        	

_mm512_i64gather_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    SI64 vindex, 
    UI64 base_addr, 
    IMM scale

.. code-block:: C

    __m512i _mm512_i64gather_epi64(__m512i vindex,
                                   void const* base_addr,
                                   int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_i64gather_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 vindex, 
    UI64 base_addr, 
    IMM scale

.. code-block:: C

    __m512i _mm512_mask_i64gather_epi64(__m512i src, __mmask8 k,
                                        __m512i vindex,
                                        void const* base_addr,
                                        int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_i32gather_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    SI32 vindex, 
    FP32 base_addr, 
    IMM scale

.. code-block:: C

    __m512 _mm512_i32gather_ps(__m512i vindex,
                               void const* base_addr,
                               int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+31:i] := MEM[addr+31:addr]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_i32gather_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI32 vindex, 
    FP32 base_addr, 
    IMM scale

.. code-block:: C

    __m512 _mm512_mask_i32gather_ps(__m512 src, __mmask16 k,
                                    __m512i vindex,
                                    void const* base_addr,
                                    int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_load_pd
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    void const* mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m512d _mm512_load_pd(void const* mem_addr);

.. admonition:: Intel Description

    Load 512 bits (composed of 8 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
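
Since the aligned load requires a 64-byte boundary, callers typically need a way to obtain and check such addresses. The sketch below shows one possible approach using C11 ``aligned_alloc``; the helper names ``is_aligned_64`` and ``alloc_aligned_doubles`` are hypothetical, and the pointer-to-integer check assumes the usual flat address model.

.. code-block:: C

    #include <stdint.h>
    #include <stdlib.h>

    /* Returns nonzero when p sits on a 64-byte boundary. */
    static int is_aligned_64(const void *p)
    {
        return ((uintptr_t)p % 64u) == 0;
    }

    /* Hypothetical helper: allocate `count` doubles on a 64-byte boundary.
       C11 aligned_alloc expects the size to be a multiple of the alignment,
       so the byte count is rounded up. Release the result with free(). */
    static double *alloc_aligned_doubles(size_t count)
    {
        size_t bytes = count * sizeof(double);
        size_t rounded = (bytes + 63u) / 64u * 64u;
        return (double *)aligned_alloc(64, rounded);
    }

A buffer obtained this way satisfies the alignment requirement at its start and at every multiple of eight doubles; an address offset by a single double does not.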
        	

_mm512_mask_load_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m512d _mm512_mask_load_pd(__m512d src, __mmask8 k,
                                void const* mem_addr);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_load_ps
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    void const* mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m512 _mm512_load_ps(void const* mem_addr);

.. admonition:: Intel Description

    Load 512 bits (composed of 16 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
        	

_mm512_mask_load_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m512 _mm512_mask_load_ps(__m512 src, __mmask16 k,
                               void const* mem_addr);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_load_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI32 mem_addr

.. code-block:: C

    __m512i _mm512_load_epi32(void const* mem_addr);

.. admonition:: Intel Description

    Load 512 bits (composed of 16 packed 32-bit integers) from memory into "dst". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
        	

_mm512_load_si512
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    void const* mem_addr
:Param ETypes:
    M512 mem_addr

.. code-block:: C

    __m512i _mm512_load_si512(void const* mem_addr);

.. admonition:: Intel Description

    Load 512 bits of integer data from memory into "dst". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
        	

_mm512_mask_load_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m512i _mm512_mask_load_epi32(__m512i src, __mmask16 k,
                                   void const* mem_addr);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_load_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI64 mem_addr

.. code-block:: C

    __m512i _mm512_load_epi64(void const* mem_addr);

.. admonition:: Intel Description

    Load 512 bits (composed of 8 packed 64-bit integers) from memory into "dst". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
        	

_mm512_mask_load_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m512i _mm512_mask_load_epi64(__m512i src, __mmask8 k,
                                   void const* mem_addr);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
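
The writemask behaviour in the pseudo-code above can be modelled in plain C, which may help when checking expected results by hand. This is only a scalar sketch of the lane selection, not the intrinsic itself; ``model_mask_load_epi64`` is a hypothetical name, and the model does not reproduce the alignment check.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a writemask load of eight 64-bit lanes.
   Lane j is read from memory only when bit j of k is set;
   otherwise it keeps the value from src. */
static void model_mask_load_epi64(uint64_t dst[8], const uint64_t src[8],
                                  uint8_t k, const void *mem_addr)
{
    const unsigned char *p = (const unsigned char *)mem_addr;
    for (int j = 0; j < 8; j++) {
        if ((k >> j) & 1u)
            memcpy(&dst[j], p + sizeof(uint64_t) * j, sizeof dst[j]);
        else
            dst[j] = src[j];
    }
}
```

With ``k = 0x05``, lanes 0 and 2 come from memory and every other lane is copied from ``src``.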
        	

_mm512_i32gather_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    SI32 vindex, 
    UI32 base_addr, 
    IMM scale

.. code-block:: C

    __m512i _mm512_i32gather_epi32(__m512i vindex,
                                   void const* base_addr,
                                   int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+31:i] := MEM[addr+31:addr]
        ENDFOR
        dst[MAX:512] := 0
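
The address computation above can be sketched in scalar C. The trailing ``* 8`` in the pseudo-code appears to convert a byte offset into the bit offsets used by the ``MEM[...]`` ranges; the model below works directly in bytes. ``model_i32gather_epi32`` is a hypothetical name, and the sketch assumes every computed address lies inside the caller's buffer.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a 16-lane gather of 32-bit integers.
   Each lane reads 4 bytes at base_addr + index * scale, with the
   index sign-extended to 64 bits and scale given in bytes. */
static void model_i32gather_epi32(int32_t dst[16], const int32_t vindex[16],
                                  const void *base_addr, int scale)
{
    const unsigned char *base = (const unsigned char *)base_addr;
    for (int j = 0; j < 16; j++) {
        int64_t offset = (int64_t)vindex[j] * scale;
        memcpy(&dst[j], base + offset, sizeof dst[j]);
    }
}
```

Passing ``scale = 4`` makes each index behave like an ordinary subscript into an ``int32_t`` array.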
        	

_mm512_mask_i32gather_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 vindex, 
    UI32 base_addr, 
    IMM scale

.. code-block:: C

    __m512i _mm512_mask_i32gather_epi32(__m512i src,
                                        __mmask16 k,
                                        __m512i vindex,
                                        void const* base_addr,
                                        int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_i32logather_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    SI32 vindex, 
    UI64 base_addr, 
    IMM scale

.. code-block:: C

    __m512i _mm512_i32logather_epi64(__m512i vindex,
                                     void const* base_addr,
                                     int scale);

.. admonition:: Intel Description

    Loads 8 64-bit integer elements from memory starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" and stores them in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:512] := 0
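
The "lower half of vindex" wording can be illustrated with a scalar model: of the sixteen packed 32-bit indices, only the first eight are consulted, and each selects one 64-bit element. ``model_i32logather_epi64`` is a hypothetical name; this is a sketch of the pseudo-code above and not the intrinsic's implementation.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model: only the lower 8 of the 16 packed 32-bit indices
   are used, and each one selects a 64-bit element at
   base_addr + index * scale (scale in bytes). */
static void model_i32logather_epi64(int64_t dst[8], const int32_t vindex[16],
                                    const void *base_addr, int scale)
{
    const unsigned char *base = (const unsigned char *)base_addr;
    for (int j = 0; j < 8; j++) {
        int64_t offset = (int64_t)vindex[j] * scale;
        memcpy(&dst[j], base + offset, sizeof dst[j]);
    }
}
```

The upper eight indices never influence the result, so they may hold arbitrary values.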
        	

_mm512_mask_i32logather_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI32 vindex, 
    UI64 base_addr, 
    IMM scale

.. code-block:: C

    __m512i _mm512_mask_i32logather_epi64(__m512i src,
                                          __mmask8 k,
                                          __m512i vindex,
                                          void const* base_addr,
                                          int scale);

.. admonition:: Intel Description

    Loads 8 64-bit integer elements from memory starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_i32logather_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    SI32 vindex, 
    FP64 base_addr, 
    IMM scale

.. code-block:: C

    __m512d _mm512_i32logather_pd(__m512i vindex,
                                  void const* base_addr,
                                  int scale);

.. admonition:: Intel Description

    Loads 8 double-precision (64-bit) floating-point elements from memory starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" and stores them in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_i32logather_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512i vindex, 
    void const* base_addr, 
    int scale
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI32 vindex, 
    FP64 base_addr, 
    IMM scale

.. code-block:: C

    __m512d _mm512_mask_i32logather_pd(__m512d src, __mmask8 k,
                                       __m512i vindex,
                                       void const* base_addr,
                                       int scale);

.. admonition:: Intel Description

    Loads 8 double-precision (64-bit) floating-point elements from memory starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_load_ph
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    void const* mem_addr
:Param ETypes:
    FP16 mem_addr

.. code-block:: C

    __m512h _mm512_load_ph(void const* mem_addr);

.. admonition:: Intel Description

    Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
        	

_mm512_loadu_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    void const* mem_addr
:Param ETypes:
    FP16 mem_addr

.. code-block:: C

    __m512h _mm512_loadu_ph(void const* mem_addr);

.. admonition:: Intel Description

    Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := MEM[mem_addr+511:mem_addr]
        dst[MAX:512] := 0
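
An unaligned 512-bit load amounts to copying 64 bytes from an arbitrary address. The sketch below models that in portable C, treating each half-precision lane as a raw 16-bit pattern so that no half-precision type is needed; ``model_loadu_ph`` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of an unaligned 512-bit load: copy 64 bytes into
   32 lanes of raw 16-bit half-precision bit patterns. memcpy places
   no alignment requirement on mem_addr. */
static void model_loadu_ph(uint16_t dst[32], const void *mem_addr)
{
    memcpy(dst, mem_addr, 32 * sizeof(uint16_t));
}
```

The source pointer may be odd, for example one byte past the start of a ``char`` buffer, and the copy is still well defined.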
        	

YMM
~~~
_mm256_mask_loadu_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 mem_addr

.. code-block:: C

    __m256i _mm256_mask_loadu_epi16(__m256i src, __mmask16 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load packed 16-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_loadu_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI16 mem_addr

.. code-block:: C

    __m256i _mm256_maskz_loadu_epi16(__mmask16 k,
                                     void const* mem_addr);

.. admonition:: Intel Description

    Load packed 16-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
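
Zeromasking differs from writemasking only in what an unselected lane receives. A scalar sketch of the pseudo-code above, with the hypothetical name ``model_maskz_loadu_epi16``:

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a zeromask load of sixteen 16-bit lanes.
   Lane j is read from memory when bit j of k is set and is
   zero otherwise; there is no src operand. */
static void model_maskz_loadu_epi16(uint16_t dst[16], uint16_t k,
                                    const void *mem_addr)
{
    const unsigned char *p = (const unsigned char *)mem_addr;
    for (int j = 0; j < 16; j++) {
        dst[j] = 0;
        if ((k >> j) & 1u)
            memcpy(&dst[j], p + sizeof(uint16_t) * j, sizeof dst[j]);
    }
}
```

With ``k = 0x8001`` only the first and last lanes are loaded, and the fourteen lanes between them are zero.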
        	

_mm256_mask_loadu_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    void const* mem_addr
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 mem_addr

.. code-block:: C

    __m256i _mm256_mask_loadu_epi8(__m256i src, __mmask32 k,
                                   void const* mem_addr);

.. admonition:: Intel Description

    Load packed 8-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_loadu_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI8 mem_addr

.. code-block:: C

    __m256i _mm256_maskz_loadu_epi8(__mmask32 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load packed 8-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_loadu_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI16 mem_addr

.. code-block:: C

    __m256i _mm256_loadu_epi16(void const* mem_addr);

.. admonition:: Intel Description

    Load 256-bits (composed of 16 packed 16-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

_mm256_loadu_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI8 mem_addr

.. code-block:: C

    __m256i _mm256_loadu_epi8(void const* mem_addr);

.. admonition:: Intel Description

    Load 256-bits (composed of 32 packed 8-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

_mm256_mask_expandloadu_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m256d _mm256_mask_expandloadu_pd(__m256d src, __mmask8 k,
                                       void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
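
The expand-load above reads memory contiguously while writing lanes sparsely: the counter ``m`` advances only for set mask bits. A scalar sketch of that behaviour follows; ``model_mask_expandloadu_pd`` is a hypothetical name, and its return value (the number of elements read) is this sketch's own addition.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a writemask expand-load of four doubles.
   Memory is consumed contiguously: the m-th set bit of k receives
   the m-th double at mem_addr. Returns how many doubles were read. */
static int model_mask_expandloadu_pd(double dst[4], const double src[4],
                                     uint8_t k, const void *mem_addr)
{
    const unsigned char *p = (const unsigned char *)mem_addr;
    int m = 0;
    for (int j = 0; j < 4; j++) {
        if ((k >> j) & 1u) {
            memcpy(&dst[j], p + sizeof(double) * m, sizeof dst[j]);
            m++;
        } else {
            dst[j] = src[j];
        }
    }
    return m;
}
```

With ``k = 0x0A`` the first two doubles in memory land in lanes 1 and 3, while lanes 0 and 2 keep their ``src`` values.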
        	

_mm256_maskz_expandloadu_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m256d _mm256_maskz_expandloadu_pd(__mmask8 k,
                                        void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_expandloadu_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m256 _mm256_mask_expandloadu_ps(__m256 src, __mmask8 k,
                                      void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_expandloadu_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m256 _mm256_maskz_expandloadu_ps(__mmask8 k,
                                       void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
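
Because ``m`` in the pseudo-code above advances by 32 bits per set mask bit, the amount of memory an 8-lane expand-load consumes follows from the population count of ``k``. A small sketch, assuming a 4-byte ``float``; ``expandload_ps_bytes`` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes consumed by an 8-lane, 32-bit
   expand-load, i.e. one float for every set bit in k. */
static size_t expandload_ps_bytes(uint8_t k)
{
    size_t set_bits = 0;
    for (int j = 0; j < 8; j++)
        set_bits += (k >> j) & 1u;
    return set_bits * sizeof(float);
}
```

This can be used to advance a source pointer when decoding a packed stream one mask at a time.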
        	

_mm256_mmask_i32gather_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m128i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI32 vindex, 
    FP64 base_addr, 
    IMM scale

.. code-block:: C

    __m256d _mm256_mmask_i32gather_pd(__m256d src, __mmask8 k,
                                      __m128i vindex,
                                      void const* base_addr,
                                      const int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mmask_i32gather_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI32 vindex, 
    FP32 base_addr, 
    IMM scale

.. code-block:: C

    __m256 _mm256_mmask_i32gather_ps(__m256 src, __mmask8 k,
                                     __m256i vindex,
                                     void const* base_addr,
                                     const int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mmask_i64gather_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI64 vindex, 
    FP64 base_addr, 
    IMM scale

.. code-block:: C

    __m256d _mm256_mmask_i64gather_pd(__m256d src, __mmask8 k,
                                      __m256i vindex,
                                      void const* base_addr,
                                      const int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mmask_i64gather_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m256i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI64 vindex, 
    FP32 base_addr, 
    IMM scale

.. code-block:: C

    __m128 _mm256_mmask_i64gather_ps(__m128 src, __mmask8 k,
                                     __m256i vindex,
                                     void const* base_addr,
                                     const int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
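
With 64-bit indices and 32-bit elements, four lanes are gathered and the result fits in 128 bits. The scalar sketch below models the masked gather above and uses negative indices relative to a base pointer in the middle of an array to show that indices are signed. ``model_mmask_i64gather_ps`` is a hypothetical name, and the sketch assumes every selected address is valid.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a writemask gather of four floats through
   64-bit signed indices; scale is a byte multiplier (1, 2, 4 or 8).
   Masked-off lanes are copied from src and read no memory. */
static void model_mmask_i64gather_ps(float dst[4], const float src[4],
                                     uint8_t k, const int64_t vindex[4],
                                     const void *base_addr, int scale)
{
    const unsigned char *base = (const unsigned char *)base_addr;
    for (int j = 0; j < 4; j++) {
        if ((k >> j) & 1u)
            memcpy(&dst[j], base + vindex[j] * scale, sizeof dst[j]);
        else
            dst[j] = src[j];
    }
}
```

Calling it with ``base_addr = &table[4]``, ``scale = 4`` and indices ``{-4, -1, 0, 3}`` selects ``table[0]``, ``table[3]``, ``table[4]`` and ``table[7]`` for whichever lanes the mask enables.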
        	

_mm256_mask_load_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m256d _mm256_mask_load_pd(__m256d src, __mmask8 k,
                                void const* mem_addr);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_load_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m256d _mm256_maskz_load_pd(__mmask8 k,
                                 void const* mem_addr);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_load_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m256 _mm256_mask_load_ps(__m256 src, __mmask8 k,
                               void const* mem_addr);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_load_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m256 _mm256_maskz_load_ps(__mmask8 k,
                                void const* mem_addr);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_load_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m256i _mm256_mask_load_epi32(__m256i src, __mmask8 k,
                                   void const* mem_addr);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_load_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m256i _mm256_maskz_load_epi32(__mmask8 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_load_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m256i _mm256_mask_load_epi64(__m256i src, __mmask8 k,
                                   void const* mem_addr);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_load_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m256i _mm256_maskz_load_epi64(__mmask8 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
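
The aligned ``load`` forms above require "mem_addr" to sit on a 32-byte boundary. As a sketch, a caller could check or establish that precondition with plain pointer arithmetic; both helper names below are assumptions made for this example.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: nonzero when p lies on a 32-byte boundary. */
    int is_aligned_32(const void *p)
    {
        return ((uintptr_t)p & 31u) == 0;
    }

    /* Hypothetical helper: rounds p up to the next 32-byte boundary
       (returns p unchanged when it is already aligned). */
    void *align_up_32(void *p)
    {
        uintptr_t a = ((uintptr_t)p + 31u) & ~(uintptr_t)31u;
        return (void *)a;
    }

A caller might use the ``loadu`` form instead whenever ``is_aligned_32`` reports 0.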
        	

_mm256_mask_loadu_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m256i _mm256_mask_loadu_epi32(__m256i src, __mmask8 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_loadu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m256i _mm256_maskz_loadu_epi32(__mmask8 k,
                                     void const* mem_addr);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
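
One plausible use of the zeromask unaligned load is a loop remainder shorter than a full vector: a mask with the low ``n`` bits set selects only the elements that exist. The sketch below models that in scalar C; ``tail_mask8`` and ``model_maskz_loadu_epi32`` are hypothetical names, not Intel APIs.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: mask with the low n bits set, saturating at 8 lanes. */
    uint8_t tail_mask8(unsigned n)
    {
        return (n >= 8) ? (uint8_t)0xFF : (uint8_t)((1u << n) - 1u);
    }

    /* Hypothetical scalar model of the zeromask unaligned load. Memory is read
       byte-wise with memcpy, so mem_addr needs no particular alignment, and
       this model never reads a lane whose mask bit is clear. */
    void model_maskz_loadu_epi32(int32_t dst[8], uint8_t k, const void *mem_addr)
    {
        const unsigned char *p = (const unsigned char *)mem_addr;
        for (int j = 0; j < 8; j++) {
            int32_t v = 0;
            if ((k >> j) & 1)
                memcpy(&v, p + 4 * j, sizeof v);
            dst[j] = v;
        }
    }

Calling the model with ``tail_mask8(3)`` fills lanes 0 to 2 from memory and leaves lanes 3 to 7 at zero.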
        	

_mm256_mask_loadu_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m256i _mm256_mask_loadu_epi64(__m256i src, __mmask8 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_loadu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m256i _mm256_maskz_loadu_epi64(__mmask8 k,
                                     void const* mem_addr);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_loadu_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m256d _mm256_mask_loadu_pd(__m256d src, __mmask8 k,
                                 void const* mem_addr);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_loadu_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m256d _mm256_maskz_loadu_pd(__mmask8 k,
                                  void const* mem_addr);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_loadu_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m256 _mm256_mask_loadu_ps(__m256 src, __mmask8 k,
                                void const* mem_addr);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_loadu_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m256 _mm256_maskz_loadu_ps(__mmask8 k,
                                 void const* mem_addr);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_expandloadu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m256i _mm256_mask_expandloadu_epi32(__m256i src,
                                          __mmask8 k,
                                          void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_expandloadu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m256i _mm256_maskz_expandloadu_epi32(
        __mmask8 k, void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
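
Unlike a plain masked load, an expand-load reads its source elements from consecutive memory positions and spreads them over the active lanes. The scalar sketch below follows the writemask pseudo-code; the function name is hypothetical, and its counter ``m`` counts elements where the pseudo-code counts bits.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of the writemask expand-load. Active lanes are
       filled, in ascending lane order, from consecutive 32-bit values at
       mem_addr; inactive lanes keep src[j]. Returns the number of elements
       consumed from memory. */
    int model_mask_expandloadu_epi32(int32_t dst[8], const int32_t src[8],
                                     uint8_t k, const void *mem_addr)
    {
        const unsigned char *p = (const unsigned char *)mem_addr;
        int m = 0;                       /* elements consumed so far */
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                memcpy(&dst[j], p + 4 * m, sizeof dst[j]);
                m++;
            } else {
                dst[j] = src[j];
            }
        }
        return m;
    }

With ``k = 0xA4`` (bits 2, 5 and 7 set), the first three values in memory land in lanes 2, 5 and 7, and the model returns 3.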
        	

_mm256_mask_expandloadu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m256i _mm256_mask_expandloadu_epi64(__m256i src,
                                          __mmask8 k,
                                          void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_expandloadu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m256i _mm256_maskz_expandloadu_epi64(
        __mmask8 k, void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
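
With 64-bit elements the pseudo-code loop covers only four lanes, so only mask bits 0 to 3 take part even though "k" is an ``__mmask8``. A scalar sketch of the zeromask form, under the assumption that the hypothetical ``model_maskz_expandloadu_epi64`` mirrors that loop:

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of the zeromask 64-bit expand-load. Only mask
       bits 0..3 are consulted; returns the number of 64-bit elements read. */
    int model_maskz_expandloadu_epi64(int64_t dst[4], uint8_t k,
                                      const void *mem_addr)
    {
        const unsigned char *p = (const unsigned char *)mem_addr;
        int m = 0;
        for (int j = 0; j < 4; j++) {
            dst[j] = 0;
            if ((k >> j) & 1) {
                memcpy(&dst[j], p + 8 * m, sizeof dst[j]);
                m++;
            }
        }
        return m;
    }

Passing ``k = 0xFA`` behaves like ``k = 0x0A`` in this model: lanes 1 and 3 are filled and two elements are consumed.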
        	

_mm256_mmask_i32gather_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 vindex, 
    UI32 base_addr, 
    IMM scale

.. code-block:: C

    __m256i _mm256_mmask_i32gather_epi32(__m256i src,
                                         __mmask8 k,
                                         __m256i vindex,
                                         void const* base_addr,
                                         const int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
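
The address computation in the gather pseudo-code can be sketched in scalar C. The trailing ``* 8`` there appears to convert a byte offset into the bit-granular addressing used by ``MEM[...]``; the sketch below works in bytes directly. The function name is hypothetical and not an Intel API.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of the masked 32-bit gather with 32-bit indices.
       Each active lane reads 4 bytes at base_addr + vindex[j] * scale, where the
       index is sign-extended and scale is a byte multiplier (1, 2, 4 or 8).
       Inactive lanes keep src[j]. */
    void model_mmask_i32gather_epi32(int32_t dst[8], const int32_t src[8],
                                     uint8_t k, const int32_t vindex[8],
                                     const void *base_addr, int scale)
    {
        const unsigned char *base = (const unsigned char *)base_addr;
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                int64_t byte_off = (int64_t)vindex[j] * scale;
                memcpy(&dst[j], base + byte_off, sizeof dst[j]);
            } else {
                dst[j] = src[j];
            }
        }
    }

With ``scale = 4`` the indices act as ordinary ``int32_t`` array subscripts relative to "base_addr", and a negative index addresses an element before it.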
        	

_mm256_mmask_i32gather_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI32 vindex, 
    UI64 base_addr,
    IMM scale

.. code-block:: C

    __m256i _mm256_mmask_i32gather_epi64(__m256i src,
                                         __mmask8 k,
                                         __m128i vindex,
                                         void const* base_addr,
                                         const int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mmask_i64gather_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI64 vindex, 
    UI32 base_addr, 
    IMM scale

.. code-block:: C

    __m128i _mm256_mmask_i64gather_epi32(__m128i src,
                                         __mmask8 k,
                                         __m256i vindex,
                                         void const* base_addr,
                                         const int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
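
Because "scale" is limited to 1, 2, 4 or 8, a stride that is not one of those values has to be folded into the indices. The sketch below illustrates this by gathering one field from an array of records using byte offsets and a scale of 1; ``struct point``, ``gather_y`` and the model function are all hypothetical names introduced for the example.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical record type used only for this illustration. */
    struct point { int32_t id; int32_t x; int32_t y; };

    /* Hypothetical scalar model of the masked 32-bit gather with 64-bit indices:
       four lanes, each reading 4 bytes at base_addr + vindex[j] * scale. */
    void model_mmask_i64gather_epi32(int32_t dst[4], const int32_t src[4],
                                     uint8_t k, const int64_t vindex[4],
                                     const void *base_addr, int scale)
    {
        const unsigned char *base = (const unsigned char *)base_addr;
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1)
                memcpy(&dst[j], base + vindex[j] * scale, sizeof dst[j]);
            else
                dst[j] = src[j];
        }
    }

    /* Gathers the y field of four consecutive records. The record stride
       (sizeof(struct point)) is folded into the indices as byte offsets,
       so scale is 1. */
    void gather_y(int32_t dst[4], const struct point pts[4])
    {
        const int32_t zero[4] = {0, 0, 0, 0};
        int64_t off[4];
        for (int j = 0; j < 4; j++)
            off[j] = (int64_t)(j * sizeof(struct point)
                               + offsetof(struct point, y));
        model_mmask_i64gather_epi32(dst, zero, 0x0F, off, pts, 1);
    }

The four 32-bit results occupy 128 bits, which matches the ``__m128i`` return type of the intrinsic above.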
        	

_mm256_mmask_i64gather_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 vindex, 
    UI64 base_addr, 
    IMM scale

.. code-block:: C

    __m256i _mm256_mmask_i64gather_epi64(__m256i src,
                                         __mmask8 k,
                                         __m256i vindex,
                                         void const* base_addr,
                                         const int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_loadu_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI64 mem_addr

.. code-block:: C

    __m256i _mm256_loadu_epi64(void const* mem_addr);

.. admonition:: Intel Description

    Load 256 bits (composed of 4 packed 64-bit integers) from memory into "dst".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

_mm256_loadu_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI32 mem_addr

.. code-block:: C

    __m256i _mm256_loadu_epi32(void const* mem_addr);

.. admonition:: Intel Description

    Load 256 bits (composed of 8 packed 32-bit integers) from memory into "dst".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

_mm256_load_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI64 mem_addr

.. code-block:: C

    __m256i _mm256_load_epi64(void const* mem_addr);

.. admonition:: Intel Description

    Load 256 bits (composed of 4 packed 64-bit integers) from memory into "dst".
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

_mm256_load_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI32 mem_addr

.. code-block:: C

    __m256i _mm256_load_epi32(void const* mem_addr);

.. admonition:: Intel Description

    Load 256 bits (composed of 8 packed 32-bit integers) from memory into "dst".
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
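
The ``epi32`` and ``epi64`` forms of the unmasked loads above share the same pseudo-code: 256 bits are copied unchanged, and the element width only describes how the lanes are interpreted afterwards. A scalar sketch, assuming a hypothetical ``model_m256i`` union as a stand-in for the register:

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical stand-in for a 256-bit register, viewable as either
       8 x 32-bit or 4 x 64-bit lanes. */
    typedef union {
        uint32_t u32[8];
        uint64_t u64[4];
    } model_m256i;

    /* Hypothetical scalar model of the unmasked load: copy 32 bytes verbatim. */
    model_m256i model_load_256(const void *mem_addr)
    {
        model_m256i r;
        memcpy(&r, mem_addr, sizeof r);
        return r;
    }

Reading ``r.u32`` or ``r.u64`` afterwards views the same 32 bytes at two lane widths.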
        	

_mm256_load_ph
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    void const* mem_addr
:Param ETypes:
    FP16 mem_addr

.. code-block:: C

    __m256h _mm256_load_ph(void const* mem_addr);

.. admonition:: Intel Description

    Load 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into "dst".
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

_mm256_loadu_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    void const* mem_addr
:Param ETypes:
    FP16 mem_addr

.. code-block:: C

    __m256h _mm256_loadu_ph(void const* mem_addr);

.. admonition:: Intel Description

    Load 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into "dst".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_mask_loadu_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 mem_addr

.. code-block:: C

    __m128i _mm_mask_loadu_epi16(__m128i src, __mmask8 k,
                                 void const* mem_addr);

.. admonition:: Intel Description

    Load packed 16-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_loadu_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI16 mem_addr

.. code-block:: C

    __m128i _mm_maskz_loadu_epi16(__mmask8 k,
                                  void const* mem_addr);

.. admonition:: Intel Description

    Load packed 16-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_loadu_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 mem_addr

.. code-block:: C

    __m128i _mm_mask_loadu_epi8(__m128i src, __mmask16 k,
                                void const* mem_addr);

.. admonition:: Intel Description

    Load packed 8-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_loadu_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI8 mem_addr

.. code-block:: C

    __m128i _mm_maskz_loadu_epi8(__mmask16 k,
                                 void const* mem_addr);

.. admonition:: Intel Description

    Load packed 8-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
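
For byte elements a 128-bit vector holds 16 lanes, which is why these forms take an ``__mmask16`` rather than an ``__mmask8``. A scalar sketch of the zeromask form, with a hypothetical helper that loads a short prefix into a zero-padded 16-byte block:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zeromask byte load: 16 lanes, one mask
       bit per byte; inactive lanes are set to 0 and not read by this model. */
    void model_maskz_loadu_epi8(uint8_t dst[16], uint16_t k,
                                const void *mem_addr)
    {
        const uint8_t *p = (const uint8_t *)mem_addr;
        for (int j = 0; j < 16; j++)
            dst[j] = ((k >> j) & 1) ? p[j] : 0;
    }

    /* Hypothetical helper: copies the first n bytes of s (saturating at 16)
       into a zero-padded 16-byte block. */
    void load_prefix16(uint8_t dst[16], const char *s, unsigned n)
    {
        uint16_t k = (n >= 16) ? (uint16_t)0xFFFF
                               : (uint16_t)((1u << n) - 1u);
        model_maskz_loadu_epi8(dst, k, s);
    }

Calling ``load_prefix16`` with a 5-character string and ``n = 5`` leaves bytes 5 to 15 of the block at zero.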
        	

_mm_loadu_epi16
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI16 mem_addr

.. code-block:: C

    __m128i _mm_loadu_epi16(void const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits (composed of 8 packed 16-bit integers) from memory into "dst".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        dst[MAX:128] := 0
        	

_mm_loadu_epi8
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI8 mem_addr

.. code-block:: C

    __m128i _mm_loadu_epi8(void const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits (composed of 16 packed 8-bit integers) from memory into "dst".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        dst[MAX:128] := 0
        	

_mm_mask_expandloadu_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_mask_expandloadu_pd(__m128d src, __mmask8 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_expandloadu_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_maskz_expandloadu_pd(__mmask8 k,
                                     void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_expandloadu_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_mask_expandloadu_ps(__m128 src, __mmask8 k,
                                   void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
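
As a sketch of the writemask variant, the scalar model below merges inactive lanes from ``src`` and also reports how many elements were consumed from memory. The name ``model_mask_expandloadu_ps`` and the returned count are assumptions added for illustration; the intrinsic itself returns only the vector.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_expandloadu_ps (not the intrinsic).
     * Returns how many floats were consumed from mem, i.e. the number of
     * set bits among the low four bits of k. */
    static size_t model_mask_expandloadu_ps(const float src[4], uint8_t k,
                                            const float *mem, float dst[4])
    {
        assert(src != NULL && dst != NULL);
        size_t m = 0;
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                dst[j] = mem[m++];         /* active lane: next memory element */
            } else {
                dst[j] = src[j];           /* inactive lane: keep src */
            }
        }
        return m;
    }

The count is useful when decoding a packed stream: a caller can advance its read pointer by the returned number of elements before the next load.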
        	

_mm_maskz_expandloadu_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_maskz_expandloadu_ps(__mmask8 k,
                                    void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mmask_i32gather_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI32 vindex, 
    FP64 base_addr, 
    IMM scale

.. code-block:: C

    __m128d _mm_mmask_i32gather_pd(__m128d src, __mmask8 k,
                                   __m128i vindex,
                                   void const* base_addr,
                                   const int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
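
The address arithmetic is easier to follow in byte terms. The scalar sketch below computes each element address as ``base_addr + index * scale`` in bytes; the trailing ``* 8`` in the pseudo-code appears to convert that byte offset into the bit offset used by the ``MEM[...]`` notation. The name ``model_mmask_i32gather_pd`` and the array-based signature are assumptions for illustration.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm_mmask_i32gather_pd (not the intrinsic).
     * Byte address of element j: base_addr + (int64_t)vindex[j] * scale. */
    static void model_mmask_i32gather_pd(const double src[2], uint8_t k,
                                         const int32_t vindex[4],
                                         const void *base_addr, int scale,
                                         double dst[2])
    {
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        const unsigned char *base = (const unsigned char *)base_addr;
        for (int j = 0; j < 2; j++) {      /* only the low two indices are used */
            if ((k >> j) & 1u) {
                int64_t offset = (int64_t)vindex[j] * scale;  /* sign-extended */
                memcpy(&dst[j], base + offset, sizeof(double));
            } else {
                dst[j] = src[j];
            }
        }
    }

With ``scale`` set to 8, the indices act as ordinary array subscripts into a table of doubles.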
        	

_mm_mmask_i32gather_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI32 vindex, 
    FP32 base_addr, 
    IMM scale

.. code-block:: C

    __m128 _mm_mmask_i32gather_ps(__m128 src, __mmask8 k,
                                  __m128i vindex,
                                  void const* base_addr,
                                  const int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mmask_i64gather_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI64 vindex, 
    FP64 base_addr, 
    IMM scale

.. code-block:: C

    __m128d _mm_mmask_i64gather_pd(__m128d src, __mmask8 k,
                                   __m128i vindex,
                                   void const* base_addr,
                                   const int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mmask_i64gather_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI64 vindex, 
    FP32 base_addr, 
    IMM scale

.. code-block:: C

    __m128 _mm_mmask_i64gather_ps(__m128 src, __mmask8 k,
                                  __m128i vindex,
                                  void const* base_addr,
                                  const int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:64] := 0
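
Because a 128-bit ``vindex`` holds only two 64-bit indices, this form gathers two floats and clears everything above bit 63 of ``dst``. A minimal scalar sketch of that behaviour follows; the name ``model_mmask_i64gather_ps`` and the array-based signature are assumptions for illustration.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm_mmask_i64gather_ps (not the intrinsic).
     * Two lanes are gathered or merged; the upper two lanes are zeroed. */
    static void model_mmask_i64gather_ps(const float src[4], uint8_t k,
                                         const int64_t vindex[2],
                                         const void *base_addr, int scale,
                                         float dst[4])
    {
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        const unsigned char *base = (const unsigned char *)base_addr;
        for (int j = 0; j < 2; j++) {
            if ((k >> j) & 1u) {
                /* indices are signed, so negative offsets are allowed */
                memcpy(&dst[j], base + vindex[j] * scale, sizeof(float));
            } else {
                dst[j] = src[j];
            }
        }
        dst[2] = 0.0f;                     /* dst[MAX:64] := 0 */
        dst[3] = 0.0f;
    }

Note that lanes 2 and 3 of ``src`` never reach the result, regardless of the mask.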
        	

_mm_mask_load_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_mask_load_pd(__m128d src, __mmask8 k,
                             void const* mem_addr);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
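
The 16-byte alignment requirement can be checked before an aligned load is attempted. The sketch below pairs such a check with a scalar model of the masked load; ``is_aligned_16`` and ``model_mask_load_pd`` are hypothetical helper names, and the ``assert`` merely stands in for the exception the real instruction may raise.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Returns nonzero when p lies on a 16-byte boundary. */
    static int is_aligned_16(const void *p)
    {
        return ((uintptr_t)p & 15u) == 0;
    }

    /* Hypothetical scalar model of _mm_mask_load_pd (not the intrinsic).
     * The assert stands in for the alignment requirement; the real
     * instruction may raise a general-protection exception instead. */
    static void model_mask_load_pd(const double src[2], uint8_t k,
                                   const double *mem_addr, double dst[2])
    {
        assert(is_aligned_16(mem_addr));
        for (int j = 0; j < 2; j++) {
            dst[j] = ((k >> j) & 1u) ? mem_addr[j] : src[j];
        }
    }

One way to obtain a suitable buffer in C11 is to declare it with ``_Alignas(16)``; when alignment cannot be guaranteed, the ``loadu`` forms are the safer choice.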
        	

_mm_maskz_load_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_maskz_load_pd(__mmask8 k, void const* mem_addr);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_load_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_mask_load_ps(__m128 src, __mmask8 k,
                            void const* mem_addr);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_load_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_maskz_load_ps(__mmask8 k, void const* mem_addr);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_load_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m128i _mm_mask_load_epi32(__m128i src, __mmask8 k,
                                void const* mem_addr);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_load_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m128i _mm_maskz_load_epi32(__mmask8 k,
                                 void const* mem_addr);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_load_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m128i _mm_mask_load_epi64(__m128i src, __mmask8 k,
                                void const* mem_addr);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_load_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m128i _mm_maskz_load_epi64(__mmask8 k,
                                 void const* mem_addr);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_loadu_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m128i _mm_mask_loadu_epi32(__m128i src, __mmask8 k,
                                 void const* mem_addr);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_loadu_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m128i _mm_maskz_loadu_epi32(__mmask8 k,
                                  void const* mem_addr);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
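
One plausible use of a zero-masked unaligned load is the final, partial iteration of a loop over an array whose length is not a multiple of four. The sketch below builds a mask for the remaining elements and applies it in a scalar model of the load; ``tail_mask`` and ``model_maskz_loadu_epi32`` are hypothetical helper names, not part of any Intel header.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: mask with the low n bits set, for n in 0..4. */
    static uint8_t tail_mask(size_t n)
    {
        assert(n <= 4);
        return (uint8_t)((1u << n) - 1u);
    }

    /* Hypothetical scalar model of _mm_maskz_loadu_epi32 (not the intrinsic).
     * Lanes whose mask bit is clear are zeroed and their memory is not read. */
    static void model_maskz_loadu_epi32(uint8_t k, const int32_t *mem_addr,
                                        int32_t dst[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = ((k >> j) & 1u) ? mem_addr[j] : 0;
        }
    }

With three elements left, ``tail_mask(3)`` yields ``0x7``, so lanes 0 to 2 are loaded and lane 3 is zeroed without the model touching the element past the end of the array.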
        	

_mm_mask_loadu_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m128i _mm_mask_loadu_epi64(__m128i src, __mmask8 k,
                                 void const* mem_addr);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
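
To illustrate what "does not need to be aligned" permits, the scalar sketch below reads each 64-bit lane with ``memcpy``, so the source address may sit at any byte offset. The name ``model_mask_loadu_epi64`` and the array-based signature are assumptions for illustration.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm_mask_loadu_epi64 (not the intrinsic).
     * memcpy is used so that mem_addr may have any alignment. */
    static void model_mask_loadu_epi64(const uint64_t src[2], uint8_t k,
                                       const void *mem_addr, uint64_t dst[2])
    {
        assert(src != NULL && dst != NULL);
        const unsigned char *p = (const unsigned char *)mem_addr;
        for (int j = 0; j < 2; j++) {
            if ((k >> j) & 1u) {
                memcpy(&dst[j], p + 8 * j, sizeof(uint64_t));
            } else {
                dst[j] = src[j];
            }
        }
    }

A typical case is a field that starts at an odd offset inside a byte buffer, such as a payload that follows a one-byte header.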
        	

_mm_maskz_loadu_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m128i _mm_maskz_loadu_epi64(__mmask8 k,
                                  void const* mem_addr);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_loadu_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_mask_loadu_pd(__m128d src, __mmask8 k,
                              void const* mem_addr);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_loadu_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_maskz_loadu_pd(__mmask8 k,
                               void const* mem_addr);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_loadu_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_mask_loadu_ps(__m128 src, __mmask8 k,
                             void const* mem_addr);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_loadu_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_maskz_loadu_ps(__mmask8 k, void const* mem_addr);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_expandloadu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m128i _mm_mask_expandloadu_epi32(__m128i src, __mmask8 k,
                                       void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_expandloadu_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI32 mem_addr

.. code-block:: C

    __m128i _mm_maskz_expandloadu_epi32(__mmask8 k,
                                        void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_expandloadu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m128i _mm_mask_expandloadu_epi64(__m128i src, __mmask8 k,
                                       void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_expandloadu_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    UI64 mem_addr

.. code-block:: C

    __m128i _mm_maskz_expandloadu_epi64(__mmask8 k,
                                        void const* mem_addr);

.. admonition:: Intel Description

    Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mmask_i32gather_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 vindex, 
    UI32 base_addr, 
    IMM scale

.. code-block:: C

    __m128i _mm_mmask_i32gather_epi32(__m128i src, __mmask8 k,
                                      __m128i vindex,
                                      void const* base_addr,
                                      const int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
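
Because ``scale`` is independent of the element width, a gather can step over records larger than the element being read. The sketch below uses a scalar model to pull one 32-bit field out of an array of 8-byte records. The record type ``struct pair`` and the helpers ``model_mmask_i32gather_epi32`` and ``gather_vals`` are hypothetical, and the example assumes ``sizeof(struct pair)`` is 8, which the scale check enforces at run time.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical record type used only for this illustration. */
    struct pair {
        int32_t id;
        int32_t val;
    };

    /* Hypothetical scalar model of _mm_mmask_i32gather_epi32 (not the intrinsic). */
    static void model_mmask_i32gather_epi32(const int32_t src[4], uint8_t k,
                                            const int32_t vindex[4],
                                            const void *base_addr, int scale,
                                            int32_t dst[4])
    {
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        const unsigned char *base = (const unsigned char *)base_addr;
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                int64_t offset = (int64_t)vindex[j] * scale;  /* byte offset */
                memcpy(&dst[j], base + offset, sizeof(int32_t));
            } else {
                dst[j] = src[j];
            }
        }
    }

    /* Gathers the val field of the four records selected by idx. */
    static void gather_vals(const struct pair *recs, const int32_t idx[4],
                            int32_t out[4])
    {
        const int32_t zero[4] = {0, 0, 0, 0};
        /* base points at recs[0].val; each index then steps one record */
        const unsigned char *base =
            (const unsigned char *)recs + offsetof(struct pair, val);
        model_mmask_i32gather_epi32(zero, 0xF, idx, base,
                                    (int)sizeof(struct pair), out);
    }

Pointing ``base_addr`` at the field of the first record and setting ``scale`` to the record size lets the indices remain plain record numbers.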
        	

_mm_mmask_i32gather_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI32 vindex, 
    UI64 base_addr, 
    IMM scale

.. code-block:: C

    __m128i _mm_mmask_i32gather_epi64(__m128i src, __mmask8 k,
                                      __m128i vindex,
                                      void const* base_addr,
                                      const int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*32
        	IF k[j]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mmask_i64gather_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI64 vindex, 
    UI32 base_addr, 
    IMM scale

.. code-block:: C

    __m128i _mm_mmask_i64gather_epi32(__m128i src, __mmask8 k,
                                      __m128i vindex,
                                      void const* base_addr,
                                      const int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:64] := 0
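
The pseudo-code above can be mirrored by a portable scalar model. The sketch below is not the intrinsic itself: ``model_mmask_i64gather_epi32`` and its array-based signature are hypothetical names chosen for illustration, and the address is computed in bytes (the extra ``* 8`` in the pseudo-code appears to convert the byte offset into a bit offset).

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm_mmask_i64gather_epi32 (not the intrinsic).
     * dst has four 32-bit lanes; only lanes 0 and 1 are gathered, and lanes
     * 2..3 are zeroed, mirroring dst[MAX:64] := 0 in the pseudo-code. */
    static void model_mmask_i64gather_epi32(int32_t dst[4], const int32_t src[4],
                                            uint8_t k, const int64_t vindex[2],
                                            const void *base_addr, int scale)
    {
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        for (int j = 0; j < 2; j++) {
            if ((k >> j) & 1) {
                /* Byte address: base + index * scale. */
                const char *addr = (const char *)base_addr + vindex[j] * scale;
                memcpy(&dst[j], addr, sizeof dst[j]);
            } else {
                /* Mask bit clear: keep the element from src. */
                dst[j] = src[j];
            }
        }
        dst[2] = 0;
        dst[3] = 0;
    }

With a table of ``int32_t`` values and ``scale`` set to 4, each index selects one table entry; a clear mask bit leaves the corresponding ``src`` element in place.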
        	

_mm_mmask_i64gather_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i vindex, 
    void const* base_addr, 
    const int scale
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 vindex, 
    UI64 base_addr, 
    IMM scale

.. code-block:: C

    __m128i _mm_mmask_i64gather_epi64(__m128i src, __mmask8 k,
                                      __m128i vindex,
                                      void const* base_addr,
                                      const int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*64
        	IF k[j]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_loadu_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI64 mem_addr

.. code-block:: C

    __m128i _mm_loadu_epi64(void const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits (composed of 2 packed 64-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        dst[MAX:128] := 0
        	

_mm_loadu_epi32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI32 mem_addr

.. code-block:: C

    __m128i _mm_loadu_epi32(void const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits (composed of 4 packed 32-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        dst[MAX:128] := 0
        	

_mm_load_epi64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI64 mem_addr

.. code-block:: C

    __m128i _mm_load_epi64(void const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits (composed of 2 packed 64-bit integers) from memory into "dst". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        dst[MAX:128] := 0
        	

_mm_load_epi32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    void const* mem_addr
:Param ETypes:
    UI32 mem_addr

.. code-block:: C

    __m128i _mm_load_epi32(void const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits (composed of 4 packed 32-bit integers) from memory into "dst". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        dst[MAX:128] := 0
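
The difference between the aligned and unaligned loads is only the address requirement. The following scalar sketch illustrates that; ``vec128_model``, ``is_aligned_16``, ``model_loadu_epi32`` and ``model_load_epi32`` are hypothetical names, and the ``assert`` merely stands in for the general-protection exception that a misaligned address may raise.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical 128-bit value with four 32-bit lanes (stand-in for __m128i). */
    typedef struct { uint32_t lane[4]; } vec128_model;

    /* Returns 1 when p lies on a 16-byte boundary. */
    static int is_aligned_16(const void *p)
    {
        return ((uintptr_t)p & 15u) == 0;
    }

    /* Models the unaligned load: any address is acceptable. */
    static vec128_model model_loadu_epi32(const void *mem_addr)
    {
        vec128_model v;
        memcpy(v.lane, mem_addr, sizeof v.lane);
        return v;
    }

    /* Models the aligned load: the assert stands in for the possible
     * general-protection exception on a misaligned address. */
    static vec128_model model_load_epi32(const void *mem_addr)
    {
        assert(is_aligned_16(mem_addr));
        return model_loadu_epi32(mem_addr);
    }

When the alignment of a buffer is not known, the unaligned form is the safe choice; the aligned form is appropriate only when the buffer is known to start on a 16-byte boundary.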
        	

_mm_mask_load_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    const double* mem_addr
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_mask_load_sd(__m128d src, __mmask8 k,
                             const double* mem_addr);

.. admonition:: Intel Description

    Load a double-precision (64-bit) floating-point element from memory into the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and set the upper element of "dst" to zero. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := MEM[mem_addr+63:mem_addr]
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[MAX:64] := 0
        	

_mm_maskz_load_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    const double* mem_addr
:Param ETypes:
    MASK k, 
    FP64 mem_addr

.. code-block:: C

    __m128d _mm_maskz_load_sd(__mmask8 k,
                              const double* mem_addr);

.. admonition:: Intel Description

    Load a double-precision (64-bit) floating-point element from memory into the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and set the upper element of "dst" to zero. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := MEM[mem_addr+63:mem_addr]
        ELSE
        	dst[63:0] := 0
        FI
        dst[MAX:64] := 0
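
The writemask and zeromask variants of the scalar load differ only in what the lower element receives when mask bit 0 is clear. A minimal scalar sketch of both, assuming a hypothetical two-lane struct ``vec128d_model`` in place of ``__m128d``:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-in for __m128d: two 64-bit floating-point lanes. */
    typedef struct { double lane[2]; } vec128d_model;

    /* Writemask form: lane 0 comes from memory if k[0] is set, else from src. */
    static vec128d_model model_mask_load_sd(vec128d_model src, uint8_t k,
                                            const double *mem_addr)
    {
        vec128d_model dst;
        assert(!(k & 1) || mem_addr != NULL);
        dst.lane[0] = (k & 1) ? *mem_addr : src.lane[0];
        dst.lane[1] = 0.0;  /* upper element is always zeroed */
        return dst;
    }

    /* Zeromask form: lane 0 comes from memory if k[0] is set, else is zero. */
    static vec128d_model model_maskz_load_sd(uint8_t k, const double *mem_addr)
    {
        vec128d_model dst;
        assert(!(k & 1) || mem_addr != NULL);
        dst.lane[0] = (k & 1) ? *mem_addr : 0.0;
        dst.lane[1] = 0.0;  /* upper element is always zeroed */
        return dst;
    }

In both forms only bit 0 of ``k`` is consulted, and the upper element of the result is zero regardless of the mask.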
        	

_mm_mask_load_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    const float* mem_addr
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_mask_load_ss(__m128 src, __mmask8 k,
                            const float* mem_addr);

.. admonition:: Intel Description

    Load a single-precision (32-bit) floating-point element from memory into the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and set the upper elements of "dst" to zero. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := MEM[mem_addr+31:mem_addr]
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[MAX:32] := 0
        	

_mm_maskz_load_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    const float* mem_addr
:Param ETypes:
    MASK k, 
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_maskz_load_ss(__mmask8 k, const float* mem_addr);

.. admonition:: Intel Description

    Load a single-precision (32-bit) floating-point element from memory into the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and set the upper elements of "dst" to zero. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := MEM[mem_addr+31:mem_addr]
        ELSE
        	dst[31:0] := 0
        FI
        dst[MAX:32] := 0
        	

_mm_load_ph
^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    void const* mem_addr
:Param ETypes:
    FP16 mem_addr

.. code-block:: C

    __m128h _mm_load_ph(void const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        dst[MAX:128] := 0
        	

_mm_loadu_ph
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    void const* mem_addr
:Param ETypes:
    FP16 mem_addr

.. code-block:: C

    __m128h _mm_loadu_ph(void const* mem_addr);

.. admonition:: Intel Description

    Load 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[mem_addr+127:mem_addr]
        dst[MAX:128] := 0
        	

_mm_load_sh
^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    void const* mem_addr
:Param ETypes:
    FP16 mem_addr

.. code-block:: C

    __m128h _mm_load_sh(void const* mem_addr);

.. admonition:: Intel Description

    Load a half-precision (16-bit) floating-point element from memory into the lower element of "dst", and zero the upper elements.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := MEM[mem_addr].fp16[0]
        dst[MAX:16] := 0
        	

_mm_mask_load_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 mem_addr

.. code-block:: C

    __m128h _mm_mask_load_sh(__m128h src, __mmask8 k,
                             void const* mem_addr);

.. admonition:: Intel Description

    Load a half-precision (16-bit) floating-point element from memory into the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and set the upper elements of "dst" to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := MEM[mem_addr].fp16[0]
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[MAX:16] := 0
        	

_mm_maskz_load_sh
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    void const* mem_addr
:Param ETypes:
    MASK k, 
    FP16 mem_addr

.. code-block:: C

    __m128h _mm_maskz_load_sh(__mmask8 k, void const* mem_addr);

.. admonition:: Intel Description

    Load a half-precision (16-bit) floating-point element from memory into the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and set the upper elements of "dst" to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := MEM[mem_addr].fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[MAX:16] := 0
        	

Other
~~~~~
_load_mask32
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-Other
:Return Type: __mmask32
:Param Types:
    __mmask32* mem_addr
:Param ETypes:
    MASK mem_addr

.. code-block:: C

    __mmask32 _load_mask32(__mmask32* mem_addr);

.. admonition:: Intel Description

    Load 32-bit mask from memory into "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[31:0] := MEM[mem_addr+31:mem_addr]
        	

_load_mask64
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-Other
:Return Type: __mmask64
:Param Types:
    __mmask64* mem_addr
:Param ETypes:
    MASK mem_addr

.. code-block:: C

    __mmask64 _load_mask64(__mmask64* mem_addr);

.. admonition:: Intel Description

    Load 64-bit mask from memory into "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[63:0] := MEM[mem_addr+63:mem_addr]
        	

_load_mask8
^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-Other
:Return Type: __mmask8
:Param Types:
    __mmask8* mem_addr
:Param ETypes:
    MASK mem_addr

.. code-block:: C

    __mmask8 _load_mask8(__mmask8* mem_addr);

.. admonition:: Intel Description

    Load 8-bit mask from memory into "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[7:0] := MEM[mem_addr+7:mem_addr]
        	

_load_mask16
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Load
:Header: immintrin.h
:Searchable: AVX-512-Load-Other
:Return Type: __mmask16
:Param Types:
    __mmask16* mem_addr
:Param ETypes:
    MASK mem_addr

.. code-block:: C

    __mmask16 _load_mask16(__mmask16* mem_addr);

.. admonition:: Intel Description

    Load 16-bit mask from memory into "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[15:0] := MEM[mem_addr+15:mem_addr]
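
As a rough illustration of how a loaded mask is interpreted, the sketch below reads 16 mask bits from memory and tests individual bits. ``model_load_mask16`` and ``mask_lane_active`` are hypothetical helper names, not part of the intrinsics API; the convention that bit ``j`` controls element ``j`` follows the ``k[j]`` tests in the masked pseudo-code elsewhere in this reference.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar stand-in for _load_mask16: read 16 mask bits. */
    static uint16_t model_load_mask16(const void *mem_addr)
    {
        uint16_t k;
        assert(mem_addr != NULL);
        memcpy(&k, mem_addr, sizeof k);
        return k;
    }

    /* Returns 1 when bit j of the mask is set, i.e. element j is active. */
    static int mask_lane_active(uint16_t k, int j)
    {
        assert(j >= 0 && j < 16);
        return (k >> j) & 1;
    }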
        	

Elementary Math Functions
-------------------------
ZMM
~~~
_mm512_mask_rcp14_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_rcp14_pd(__m512d src, __mmask8 k,
                                 __m512d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_rcp14_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_maskz_rcp14_pd(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
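
The per-element merge performed by the writemask and zeromask forms can be sketched in scalar C. This is a model of the pseudo-code only: the function names are hypothetical, plain arrays stand in for ``__m512d``, and an exact division stands in for the hardware approximation.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define LANES 8  /* eight 64-bit elements in a 512-bit vector */

    /* Writemask form: inactive elements are copied from src. */
    static void model_mask_rcp_pd(double dst[LANES], const double src[LANES],
                                  uint8_t k, const double a[LANES])
    {
        assert(dst != NULL && src != NULL && a != NULL);
        for (int j = 0; j < LANES; j++)
            dst[j] = ((k >> j) & 1) ? 1.0 / a[j] : src[j];
    }

    /* Zeromask form: inactive elements are set to zero. */
    static void model_maskz_rcp_pd(double dst[LANES], uint8_t k,
                                   const double a[LANES])
    {
        assert(dst != NULL && a != NULL);
        for (int j = 0; j < LANES; j++)
            dst[j] = ((k >> j) & 1) ? 1.0 / a[j] : 0.0;
    }

With ``k = 0x05`` only elements 0 and 2 are computed; the remaining six elements come from ``src`` in the writemask form and are zero in the zeromask form.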
        	

_mm512_rcp14_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_rcp14_pd(__m512d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := (1.0 / a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
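
Because the result is only accurate to a relative error below 2^-14, callers that need more precision commonly refine it. One standard technique (not described in the source, shown here as an assumption about typical usage) is a Newton-Raphson step, ``x1 = x0 * (2 - a * x0)``, which in exact arithmetic squares the relative error. The scalar sketch below uses hypothetical helper names and operates on a single ``double`` rather than a vector.

.. code-block:: C

    #include <assert.h>

    /* One Newton-Raphson step for the reciprocal of a, starting from x0. */
    static double refine_reciprocal(double a, double x0)
    {
        assert(a != 0.0);
        return x0 * (2.0 - a * x0);
    }

    /* Relative error of x as an estimate of 1/a, computed as |a*x - 1|. */
    static double reciprocal_rel_error(double a, double x)
    {
        double e = a * x - 1.0;
        return e < 0.0 ? -e : e;
    }

Starting from an estimate with relative error 2^-15, one step leaves an error of roughly 2^-30, so a single refinement takes a 14-bit approximation to about 28 bits.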
        	

_mm512_mask_rcp14_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_rcp14_ps(__m512 src, __mmask16 k,
                                __m512 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_rcp14_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_rcp14_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rcp14_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_rcp14_ps(__m512 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := (1.0 / a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_rsqrt14_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_rsqrt14_pd(__m512d src, __mmask8 k,
                                   __m512d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_rsqrt14_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_maskz_rsqrt14_pd(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rsqrt14_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_rsqrt14_pd(__m512d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
        ENDFOR
        dst[MAX:512] := 0
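
An approximate reciprocal square root with relative error below 2^-14 is often refined before use. A common approach (an assumption about typical usage, not something the source prescribes) is one Newton-Raphson step, ``y1 = y0 * (1.5 - 0.5 * a * y0 * y0)``, which in exact arithmetic turns a relative error ``e`` into about ``1.5 * e * e``. The helper names below are hypothetical and the sketch is scalar.

.. code-block:: C

    #include <assert.h>

    /* One Newton-Raphson step for 1/sqrt(a), starting from the estimate y0. */
    static double refine_rsqrt(double a, double y0)
    {
        assert(a > 0.0);
        return y0 * (1.5 - 0.5 * a * y0 * y0);
    }

    /* Relative error of an estimate against a known exact value. */
    static double relative_error(double estimate, double exact)
    {
        double d = (estimate - exact) / exact;
        return d < 0.0 ? -d : d;
    }

For ``a = 4.0`` the exact result is 0.5; an estimate that is off by a factor of ``1 + 2^-15`` is brought to within about ``1.5 * 2^-30`` of the exact value after one step.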
        	

_mm512_mask_rsqrt14_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_rsqrt14_ps(__m512 src, __mmask16 k,
                                  __m512 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_rsqrt14_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_rsqrt14_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rsqrt14_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_rsqrt14_ps(__m512 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sqrt_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_sqrt_pd(__m512d src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SQRT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sqrt_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_sqrt_round_pd(__m512d src, __mmask8 k,
                                      __m512d a, int rounding);

.. admonition:: Intel Description

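    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be "_MM_FROUND_CUR_DIRECTION" (use the mode in MXCSR.RC) or one of "_MM_FROUND_TO_NEAREST_INT", "_MM_FROUND_TO_NEG_INF", "_MM_FROUND_TO_POS_INF", or "_MM_FROUND_TO_ZERO", each combined with "_MM_FROUND_NO_EXC" to suppress exceptions.

.. admonition:: Intel Implementation Pseudo-Code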

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SQRT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sqrt_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_maskz_sqrt_pd(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SQRT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sqrt_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_sqrt_round_pd(__mmask8 k, __m512d a,
                                       int rounding);

.. admonition:: Intel Description

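    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be "_MM_FROUND_CUR_DIRECTION" (use the mode in MXCSR.RC) or one of "_MM_FROUND_TO_NEAREST_INT", "_MM_FROUND_TO_NEG_INF", "_MM_FROUND_TO_POS_INF", or "_MM_FROUND_TO_ZERO", each combined with "_MM_FROUND_NO_EXC" to suppress exceptions.

.. admonition:: Intel Implementation Pseudo-Code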

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SQRT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sqrt_pd
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_sqrt_pd(__m512d a);

.. admonition:: Intel Description

    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := SQRT(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sqrt_round_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_sqrt_round_pd(__m512d a, int rounding);

.. admonition:: Intel Description

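    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be "_MM_FROUND_CUR_DIRECTION" (use the mode in MXCSR.RC) or one of "_MM_FROUND_TO_NEAREST_INT", "_MM_FROUND_TO_NEG_INF", "_MM_FROUND_TO_POS_INF", or "_MM_FROUND_TO_ZERO", each combined with "_MM_FROUND_NO_EXC" to suppress exceptions.

.. admonition:: Intel Implementation Pseudo-Code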

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := SQRT(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sqrt_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_sqrt_ps(__m512 src, __mmask16 k,
                               __m512 a);

.. admonition:: Intel Description

    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SQRT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
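
The writemask selection in the pseudocode above is independent of the square root itself, so it can be modelled as a separate per-lane blend. The following is a minimal scalar sketch under that assumption: ``sketch_mask_merge_ps`` is a hypothetical helper, ``uint16_t`` stands in for ``__mmask16``, and ``computed`` holds per-lane results produced by whatever operation is being masked.

.. code-block:: C

    #include <stdint.h>

    /* Merge 16 precomputed lane results into dst under a writemask. */
    void sketch_mask_merge_ps(float dst[16], const float src[16],
                              uint16_t k, const float computed[16])
    {
        for (int j = 0; j < 16; ++j) {
            /* Bit j of k set: take the computed lane; clear: keep src. */
            dst[j] = ((k >> j) & 1u) ? computed[j] : src[j];
        }
    }

With ``k = 0x0005`` only lanes 0 and 2 receive computed values; every other lane of ``dst`` is a copy of the matching lane of ``src``.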
        	

_mm512_mask_sqrt_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_sqrt_round_ps(__m512 src, __mmask16 k,
                                     __m512 a, int rounding);

.. admonition:: Intel Description

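    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, all with exceptions suppressed, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.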

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SQRT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sqrt_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_sqrt_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SQRT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
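
A zeromask forces every unselected lane to zero, which makes it a natural fit for processing the last, partial chunk of an array. The sketch below shows one way to build such a mask and a scalar model of the zeroing step. Both function names are hypothetical, ``uint16_t`` stands in for ``__mmask16``, and the tail-handling use case is an illustration rather than something the description above prescribes.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Mask with the low `remaining` bits set, capped at 16 lanes. */
    uint16_t sketch_tail_mask16(size_t remaining)
    {
        if (remaining >= 16)
            return 0xFFFFu;
        return (uint16_t)((1u << remaining) - 1u);
    }

    /* Zeromask selection: computed lane where the bit is set, else 0. */
    void sketch_maskz_select_ps(float dst[16], uint16_t k,
                                const float computed[16])
    {
        for (int j = 0; j < 16; ++j) {
            dst[j] = ((k >> j) & 1u) ? computed[j] : 0.0f;
        }
    }

For example, ``sketch_tail_mask16(5)`` yields ``0x001F``, so lanes 0 through 4 keep their computed values and lanes 5 through 15 are zeroed.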
        	

_mm512_maskz_sqrt_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_sqrt_round_ps(__mmask16 k, __m512 a,
                                      int rounding);

.. admonition:: Intel Description

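    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, all with exceptions suppressed, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.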

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SQRT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sqrt_ps
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_sqrt_ps(__m512 a);

.. admonition:: Intel Description

    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := SQRT(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sqrt_round_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_sqrt_round_ps(__m512 a, int rounding);

.. admonition:: Intel Description

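    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, all with exceptions suppressed, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.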

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := SQRT(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rsqrt_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_rsqrt_ph(__m512h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        ENDFOR
        dst[MAX:512] := 0
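
The stated error bound of 1.5*2^-12 can be put in context with one Newton-Raphson step, a standard way to tighten a reciprocal square root estimate. The sketch below is a scalar illustration in double precision, not the instruction's implementation: the function names are hypothetical, and the "worst-case" estimate is constructed by hand from the bound.

.. code-block:: C

    /* One Newton-Raphson step for y ~ 1/sqrt(x):
       y' = y * (1.5 - 0.5 * x * y * y). */
    double sketch_refine_rsqrt(double x, double y)
    {
        return y * (1.5 - 0.5 * x * y * y);
    }

    /* Relative error of approx against exact. */
    double sketch_rel_err(double approx, double exact)
    {
        double d = (approx - exact) / exact;
        return d < 0.0 ? -d : d;
    }

For ``x = 4`` the exact result is 0.5. Starting from an estimate that is high by the full bound, ``0.5 * (1.0 + 1.5 / 4096.0)``, one step reduces the relative error from about 3.7e-4 to roughly 2e-7. Half precision carries an 11-bit significand, so the documented bound is already close to its rounding error; refinement mainly pays off when the result is widened to a larger format.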
        	

_mm512_mask_rsqrt_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_rsqrt_ph(__m512h src, __mmask32 k,
                                 __m512h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_rsqrt_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_maskz_rsqrt_ph(__mmask32 k, __m512h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sqrt_ph
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_sqrt_ph(__m512h a);

.. admonition:: Intel Description

    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	dst.fp16[i] := SQRT(a.fp16[i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sqrt_round_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    const int rounding
:Param ETypes:
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_sqrt_round_ph(__m512h a, const int rounding);

.. admonition:: Intel Description

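    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, all with exceptions suppressed, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.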

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	dst.fp16[i] := SQRT(a.fp16[i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sqrt_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_sqrt_ph(__m512h src, __mmask32 k,
                                __m512h a);

.. admonition:: Intel Description

    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := SQRT(a.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sqrt_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_sqrt_round_ph(__m512h src, __mmask32 k,
                                      __m512h a,
                                      const int rounding);

.. admonition:: Intel Description

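    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, all with exceptions suppressed, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.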

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := SQRT(a.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sqrt_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_maskz_sqrt_ph(__mmask32 k, __m512h a);

.. admonition:: Intel Description

    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := SQRT(a.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sqrt_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_sqrt_round_ph(__mmask32 k, __m512h a,
                                       const int rounding);

.. admonition:: Intel Description

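    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, all with exceptions suppressed, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.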

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := SQRT(a.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_rcp_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_rcp_ph(__m512h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	dst.fp16[i] := (1.0 / a.fp16[i])
        ENDFOR
        dst[MAX:512] := 0
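
When the approximate reciprocal is fed into wider arithmetic, a single Newton-Raphson step is a common way to square away most of the error. The following scalar sketch works in double precision and uses a hypothetical helper name; it illustrates the refinement formula only and makes no claim about how the hardware computes its estimate.

.. code-block:: C

    /* One Newton-Raphson step for y ~ 1/x: y' = y * (2 - x * y).
       If y has relative error e, y' has relative error of about e*e. */
    double sketch_refine_rcp(double x, double y)
    {
        return y * (2.0 - x * y);
    }

Starting from an estimate of 1/3 that is high by the documented bound of 1.5*2^-12 (about 3.7e-4), one step leaves a relative error of about 1.3e-7.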
        	

_mm512_mask_rcp_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_rcp_ph(__m512h src, __mmask32 k,
                               __m512h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := (1.0 / a.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
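
The pseudocode above translates almost line for line into portable C. This is a scalar model under stated assumptions, not the intrinsic: ``sketch_mask_rcp_ph`` is a hypothetical name, ``float`` stands in for an FP16 lane, ``uint32_t`` stands in for ``__mmask32``, and an exact division replaces the hardware approximation.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of a 32-lane masked reciprocal. */
    void sketch_mask_rcp_ph(float dst[32], const float src[32],
                            uint32_t k, const float a[32])
    {
        for (int i = 0; i < 32; ++i) {
            /* Bit i of k set: reciprocal of a[i]; clear: copy src[i]. */
            dst[i] = ((k >> i) & 1u) ? (1.0f / a[i]) : src[i];
        }
    }

A model like this can serve as a reference when unit-testing vectorized code, provided the comparison allows for the documented approximation error.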
        	

_mm512_maskz_rcp_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_maskz_rcp_ph(__mmask32 k, __m512h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := (1.0 / a.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

YMM
~~~
_mm256_mask_sqrt_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_mask_sqrt_pd(__m256d src, __mmask8 k,
                                __m256d a);

.. admonition:: Intel Description

    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SQRT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
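
The loop above runs over only four lanes, so only bits 0 through 3 of the 8-bit mask are consulted. The sketch below makes that explicit; both helper names are hypothetical and ``uint8_t`` stands in for ``__mmask8``.

.. code-block:: C

    #include <stdint.h>

    /* Number of lanes for a vector width and element width, both in bits. */
    unsigned sketch_lane_count(unsigned vector_bits, unsigned element_bits)
    {
        return vector_bits / element_bits;
    }

    /* Keep only the mask bits that correspond to real lanes. */
    uint8_t sketch_effective_mask8(uint8_t k, unsigned lanes)
    {
        if (lanes >= 8)
            return k;
        return (uint8_t)(k & ((1u << lanes) - 1u));
    }

A 256-bit vector of 64-bit elements has four lanes, so a mask of ``0xF5`` behaves like ``0x05``. The same reasoning gives two lanes for the 128-bit double-precision forms.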
        	

_mm256_maskz_sqrt_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_maskz_sqrt_pd(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SQRT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_sqrt_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_sqrt_ps(__m256 src, __mmask8 k,
                               __m256 a);

.. admonition:: Intel Description

    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SQRT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sqrt_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_sqrt_ps(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SQRT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rsqrt_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_rsqrt_ph(__m256h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_rsqrt_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256h _mm256_mask_rsqrt_ph(__m256h src, __mmask16 k,
                                 __m256h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_rsqrt_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256h _mm256_maskz_rsqrt_ph(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sqrt_ph
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_sqrt_ph(__m256h a);

.. admonition:: Intel Description

    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[i] := SQRT(a.fp16[i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_sqrt_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256h _mm256_mask_sqrt_ph(__m256h src, __mmask16 k,
                                __m256h a);

.. admonition:: Intel Description

    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := SQRT(a.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sqrt_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256h _mm256_maskz_sqrt_ph(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := SQRT(a.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rcp_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_rcp_ph(__m256h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[i] := (1.0 / a.fp16[i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_rcp_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256h _mm256_mask_rcp_ph(__m256h src, __mmask16 k,
                               __m256h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := (1.0 / a.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_rcp_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256h _mm256_maskz_rcp_ph(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := (1.0 / a.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_mask_sqrt_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_mask_sqrt_pd(__m128d src, __mmask8 k,
                             __m128d a);

.. admonition:: Intel Description

    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SQRT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sqrt_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_maskz_sqrt_pd(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SQRT(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_sqrt_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_mask_sqrt_ps(__m128 src, __mmask8 k, __m128 a);

.. admonition:: Intel Description

    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SQRT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sqrt_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_maskz_sqrt_ps(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SQRT(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_rcp14_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_rcp14_sd(__m128d src, __mmask8 k,
                              __m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (1.0 / b[63:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
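
The scalar forms touch only lane 0 and bit 0 of the mask. A minimal portable sketch, assuming a hypothetical ``model_mask_rcp_sd`` helper on ``double[2]`` arrays and an exact division in place of the hardware approximation:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: only bit 0 of k matters. Lane 0 is
       1.0 / b[0] or src[0]; lane 1 always comes from a. The exact division
       stands in for the approximation, whose relative error is below 2^-14. */
    static void model_mask_rcp_sd(double dst[2], const double src[2],
                                  uint8_t k, const double a[2],
                                  const double b[2])
    {
        dst[0] = (k & 1u) ? 1.0 / b[0] : src[0];
        dst[1] = a[1];
    }

Bits 1 to 7 of ``k`` have no effect in this model, so ``k = 0xFE`` behaves like ``k = 0``.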
        	

_mm_maskz_rcp14_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_rcp14_sd(__mmask8 k, __m128d a,
                               __m128d b);

.. admonition:: Intel Description

    Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (1.0 / b[63:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_rcp14_sd
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_rcp14_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (1.0 / b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
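
When about 14 bits of accuracy are not enough, a common follow-up is one Newton-Raphson step, which roughly squares the relative error. The sketch below is a portable illustration of that general technique, not something the description above prescribes. ``refine_rcp`` is a hypothetical helper, and ``x0`` stands for whatever estimate was read from lane 0.

.. code-block:: C

    /* One Newton-Raphson step for 1/b, starting from an estimate x0.
       If x0 = (1/b) * (1 + e), the result is (1/b) * (1 - e*e). */
    static double refine_rcp(double b, double x0)
    {
        return x0 * (2.0 - b * x0);
    }

For an estimate with relative error below 2^-14, the refined value has relative error below about 2^-28, before floating-point rounding is taken into account.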
        	

_mm_mask_rcp14_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_rcp14_ss(__m128 src, __mmask8 k, __m128 a,
                             __m128 b);

.. admonition:: Intel Description

    Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (1.0 / b[31:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_rcp14_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_rcp14_ss(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (1.0 / b[31:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_rcp14_ss
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_rcp14_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (1.0 / b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_rsqrt14_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_rsqrt14_sd(__m128d src, __mmask8 k,
                                __m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (1.0 / SQRT(b[63:0]))
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_rsqrt14_sd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_rsqrt14_sd(__mmask8 k, __m128d a,
                                 __m128d b);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (1.0 / SQRT(b[63:0]))
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_rsqrt14_sd
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_rsqrt14_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (1.0 / SQRT(b[63:0]))
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
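
An estimate of 1/sqrt(b) that is accurate to about 14 bits can be tightened with one Newton-Raphson step. This is a general numerical technique rather than part of the intrinsic. In the sketch below, ``refine_rsqrt`` is a hypothetical helper and ``y0`` stands for the value read from lane 0 of the result.

.. code-block:: C

    /* One Newton-Raphson step for 1/sqrt(b), starting from an estimate y0.
       The step needs only multiplies and a subtract, no division or sqrt. */
    static double refine_rsqrt(double b, double y0)
    {
        return y0 * (1.5 - 0.5 * b * y0 * y0);
    }

If ``y0`` has relative error ``e``, the refined value has relative error of roughly ``1.5 * e * e``.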
        	

_mm_mask_rsqrt14_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_rsqrt14_ss(__m128 src, __mmask8 k, __m128 a,
                               __m128 b);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (1.0 / SQRT(b[31:0]))
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_rsqrt14_ss
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_rsqrt14_ss(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (1.0 / SQRT(b[31:0]))
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_rsqrt14_ss
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_rsqrt14_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (1.0 / SQRT(b[31:0]))
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_sqrt_round_sd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask_sqrt_round_sd(__m128d src, __mmask8 k,
                                   __m128d a, __m128d b,
                                   int rounding);

.. admonition:: Intel Description

    Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
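    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` to round to nearest, ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` to round down, ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` to round up, ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` to truncate, each suppressing exceptions, or ``_MM_FROUND_CUR_DIRECTION`` to use MXCSR.RC.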

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := SQRT(b[63:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_sqrt_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_sqrt_sd(__m128d src, __mmask8 k, __m128d a,
                             __m128d b);

.. admonition:: Intel Description

    Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := SQRT(b[63:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_sqrt_round_sd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_maskz_sqrt_round_sd(__mmask8 k, __m128d a,
                                    __m128d b, int rounding);

.. admonition:: Intel Description

    Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
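    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` to round to nearest, ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` to round down, ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` to round up, ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` to truncate, each suppressing exceptions, or ``_MM_FROUND_CUR_DIRECTION`` to use MXCSR.RC.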

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := SQRT(b[63:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_sqrt_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_sqrt_sd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := SQRT(b[63:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_sqrt_round_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_sqrt_round_sd(__m128d a, __m128d b,
                              int rounding);

.. admonition:: Intel Description

    Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
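    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` to round to nearest, ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` to round down, ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` to round up, ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` to truncate, each suppressing exceptions, or ``_MM_FROUND_CUR_DIRECTION`` to use MXCSR.RC.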

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := SQRT(b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
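
The ``rounding`` argument is an immediate built from the ``_MM_FROUND_*`` macros. Intel documents five accepted values: each of the four rounding directions combined with ``_MM_FROUND_NO_EXC``, or ``_MM_FROUND_CUR_DIRECTION`` on its own. The sketch below checks a value against that list in portable C. The ``R_*`` names are hypothetical mirrors of the macro values as they appear in GCC and Clang headers, which is an assumption worth verifying against your own ``immintrin.h``.

.. code-block:: C

    #include <stdbool.h>

    /* Assumed mirrors of the _MM_FROUND_* macro values; check your headers. */
    enum {
        R_TO_NEAREST_INT = 0x00,
        R_TO_NEG_INF     = 0x01,
        R_TO_POS_INF     = 0x02,
        R_TO_ZERO        = 0x03,
        R_CUR_DIRECTION  = 0x04,
        R_NO_EXC         = 0x08
    };

    /* True for CUR_DIRECTION alone, or a rounding direction OR-ed with
       NO_EXC. Any other bit pattern is rejected. */
    static bool is_documented_rounding(int r)
    {
        if (r == R_CUR_DIRECTION) {
            return true;
        }
        return (r & R_NO_EXC) != 0 && (r & ~(R_NO_EXC | R_TO_ZERO)) == 0;
    }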
        	

_mm_mask_sqrt_round_ss
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask_sqrt_round_ss(__m128 src, __mmask8 k,
                                  __m128 a, __m128 b,
                                  int rounding);

.. admonition:: Intel Description

    Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
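    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` to round to nearest, ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` to round down, ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` to round up, ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` to truncate, each suppressing exceptions, or ``_MM_FROUND_CUR_DIRECTION`` to use MXCSR.RC.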

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := SQRT(b[31:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_sqrt_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_sqrt_ss(__m128 src, __mmask8 k, __m128 a,
                            __m128 b);

.. admonition:: Intel Description

    Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := SQRT(b[31:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_sqrt_round_ss
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_maskz_sqrt_round_ss(__mmask8 k, __m128 a,
                                   __m128 b, int rounding);

.. admonition:: Intel Description

    Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
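    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` to round to nearest, ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` to round down, ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` to round up, ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` to truncate, each suppressing exceptions, or ``_MM_FROUND_CUR_DIRECTION`` to use MXCSR.RC.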

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := SQRT(b[31:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_sqrt_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_sqrt_ss(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := SQRT(b[31:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_sqrt_round_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_sqrt_round_ss(__m128 a, __m128 b, int rounding);

.. admonition:: Intel Description

    Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
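    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` to round to nearest, ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` to round down, ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` to round up, ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` to truncate, each suppressing exceptions, or ``_MM_FROUND_CUR_DIRECTION`` to use MXCSR.RC.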

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := SQRT(b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_rsqrt_ph
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_rsqrt_ph(__m128h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        ENDFOR
        dst[MAX:128] := 0
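
The bound 1.5*2^-12 is a relative error, so the permitted absolute deviation scales with the magnitude of the exact result. A small portable check, assuming a hypothetical ``within_rel_err`` helper and ``double`` stand-ins for the half-precision values:

.. code-block:: C

    #include <stdbool.h>

    /* True when |approx - exact| < bound * |exact|. */
    static bool within_rel_err(double approx, double exact, double bound)
    {
        double diff = approx - exact;
        double mag = exact;
        if (diff < 0.0) {
            diff = -diff;
        }
        if (mag < 0.0) {
            mag = -mag;
        }
        return diff < bound * mag;
    }

Passing ``1.5 / 4096.0`` as ``bound`` expresses the documented limit for one lane.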
        	

_mm_mask_rsqrt_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128h _mm_mask_rsqrt_ph(__m128h src, __mmask8 k,
                              __m128h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_rsqrt_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128h _mm_maskz_rsqrt_ph(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_sqrt_ph
^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_sqrt_ph(__m128h a);

.. admonition:: Intel Description

    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	dst.fp16[i] := SQRT(a.fp16[i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_sqrt_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128h _mm_mask_sqrt_ph(__m128h src, __mmask8 k,
                             __m128h a);

.. admonition:: Intel Description

    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := SQRT(a.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sqrt_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128h _mm_maskz_sqrt_ph(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := SQRT(a.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rcp_ph
^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_rcp_ph(__m128h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	dst.fp16[i] := (1.0 / a.fp16[i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_rcp_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128h _mm_mask_rcp_ph(__m128h src, __mmask8 k, __m128h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := (1.0 / a.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_rcp_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128h _mm_maskz_rcp_ph(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := (1.0 / a.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
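
With eight half-precision lanes, every bit of the ``__mmask8`` selects a lane. The sketch below models the zeromask form in portable C. It uses ``float`` as a stand-in for FP16, so it does not reproduce half-precision rounding or the approximation error, and ``model_maskz_rcp8`` is a hypothetical helper.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model over 8 lanes: lane i is 1/a[i] when bit i
       of k is set and 0 otherwise. float stands in for the FP16 type. */
    static void model_maskz_rcp8(float dst[8], uint8_t k, const float a[8])
    {
        for (int i = 0; i < 8; i++) {
            dst[i] = ((k >> i) & 1u) ? 1.0f / a[i] : 0.0f;
        }
    }

With ``k = 0x81``, only lanes 0 and 7 are computed and the six lanes in between are zeroed.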
        	

_mm_rsqrt_sh
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_rsqrt_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (1.0 / SQRT(b.fp16[0]))
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_rsqrt_sh
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_rsqrt_sh(__m128h src, __mmask8 k,
                              __m128h a, __m128h b);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (1.0 / SQRT(b.fp16[0]))
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_rsqrt_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_rsqrt_sh(__mmask8 k, __m128h a,
                               __m128h b);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (1.0 / SQRT(b.fp16[0]))
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_sqrt_sh
^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_sqrt_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compute the square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := SQRT(b.fp16[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_sqrt_round_sh
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_sqrt_round_sh(__m128h a, __m128h b,
                              const int rounding);

.. admonition:: Intel Description

    Compute the square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
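    Rounding is done according to the "rounding" parameter, for example _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC (round to nearest, suppress exceptions) or _MM_FROUND_CUR_DIRECTION (use the rounding mode in MXCSR.RC).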

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := SQRT(b.fp16[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_sqrt_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_sqrt_sh(__m128h src, __mmask8 k, __m128h a,
                             __m128h b);

.. admonition:: Intel Description

    Compute the square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := SQRT(b.fp16[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_sqrt_round_sh
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_sqrt_round_sh(__m128h src, __mmask8 k,
                                   __m128h a, __m128h b,
                                   const int rounding);

.. admonition:: Intel Description

    Compute the square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
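    Rounding is done according to the "rounding" parameter, for example _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC (round to nearest, suppress exceptions) or _MM_FROUND_CUR_DIRECTION (use the rounding mode in MXCSR.RC).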

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := SQRT(b.fp16[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_sqrt_sh
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_sqrt_sh(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Compute the square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := SQRT(b.fp16[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_sqrt_round_sh
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_sqrt_round_sh(__mmask8 k, __m128h a,
                                    __m128h b,
                                    const int rounding);

.. admonition:: Intel Description

    Compute the square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
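    Rounding is done according to the "rounding" parameter, for example _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC (round to nearest, suppress exceptions) or _MM_FROUND_CUR_DIRECTION (use the rounding mode in MXCSR.RC).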

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := SQRT(b.fp16[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_rcp_sh
^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_rcp_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (1.0 / b.fp16[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_rcp_sh
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_rcp_sh(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (1.0 / b.fp16[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_rcp_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX-512-Elementary Math Functions-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_rcp_sh(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (1.0 / b.fp16[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
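
In the scalar ("_sh") forms, the pseudo-code consults only mask bit 0 and computes lane 0 from "b", while the upper seven lanes are always copied from "a". The sketch below models that data flow in portable C. It is illustrative only: float stands in for FP16, an exact division replaces the approximate reciprocal, and the model_* names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Write-mask model of the scalar form: lane 0 is computed from b when
       mask bit 0 is set, otherwise it is copied from src. */
    void model_mask_rcp_sh(float dst[8], const float src[8], uint8_t k,
                           const float a[8], const float b[8])
    {
        dst[0] = (k & 1) ? 1.0f / b[0] : src[0];  /* only bit 0 is consulted */
        for (int i = 1; i < 8; i++)
            dst[i] = a[i];                        /* upper 7 lanes come from a */
    }

    /* Zero-mask model: lane 0 becomes zero when mask bit 0 is clear. */
    void model_maskz_rcp_sh(float dst[8], uint8_t k,
                            const float a[8], const float b[8])
    {
        dst[0] = (k & 1) ? 1.0f / b[0] : 0.0f;
        for (int i = 1; i < 8; i++)
            dst[i] = a[i];
    }

Mask bits 1 through 7 have no effect in this model, so a mask of 0xFE behaves the same as a mask of 0x00.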
        	

Arithmetic
----------
ZMM
~~~
_mm512_abs_epi8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m512i _mm512_abs_epi8(__m512i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := ABS(a[i+7:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_abs_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a

.. code-block:: C

    __m512i _mm512_mask_abs_epi8(__m512i src, __mmask64 k,
                                 __m512i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := ABS(a[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_abs_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    __m512i _mm512_maskz_abs_epi8(__mmask64 k, __m512i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := ABS(a[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
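
The phrase "store the unsigned results" matters for the most negative input: the absolute value of -128 does not fit in a signed 8-bit lane, so the result is the bit pattern 0x80, which reads as 128 when treated as unsigned. A scalar sketch of one lane and of the zero-mask form follows, assuming uint64_t as a stand-in for __mmask64; the model_* names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* |x| for one signed 8-bit lane, returned as the raw 8-bit result.
       Widening to int first keeps the negation of -128 well defined. */
    uint8_t model_abs_epi8_lane(int8_t x)
    {
        int wide = x;
        return (uint8_t)(wide < 0 ? -wide : wide);
    }

    /* Zero-mask model over 64 lanes; uint64_t stands in for __mmask64. */
    void model_maskz_abs_epi8(uint8_t dst[64], uint64_t k, const int8_t a[64])
    {
        for (int j = 0; j < 64; j++)
            dst[j] = ((k >> j) & 1) ? model_abs_epi8_lane(a[j]) : 0;
    }

Callers that reinterpret the result as signed should expect -128 to map to itself rather than to a positive value.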
        	

_mm512_abs_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m512i _mm512_abs_epi16(__m512i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := ABS(a[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_abs_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m512i _mm512_mask_abs_epi16(__m512i src, __mmask32 k,
                                  __m512i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ABS(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_abs_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m512i _mm512_maskz_abs_epi16(__mmask32 k, __m512i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ABS(a[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_add_epi8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_add_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed 8-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := a[i+7:i] + b[i+7:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_add_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_add_epi8(__m512i src, __mmask64 k,
                                 __m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i] + b[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_add_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_add_epi8(__mmask64 k, __m512i a,
                                  __m512i b);

.. admonition:: Intel Description

    Add packed 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i] + b[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
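
The plain add keeps only the low 8 bits of each lane sum, so it wraps around instead of clamping, and the resulting bit pattern is the same whether the lanes are read as signed or unsigned. The sketch below models one lane and the write-mask form in scalar C, assuming uint64_t as a stand-in for __mmask64; the model_* names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Non-saturating 8-bit add: the sum is reduced modulo 2^8. */
    uint8_t model_add_epi8_lane(uint8_t a, uint8_t b)
    {
        return (uint8_t)(a + b);
    }

    /* Write-mask model over 64 lanes; uint64_t stands in for __mmask64. */
    void model_mask_add_epi8(uint8_t dst[64], const uint8_t src[64], uint64_t k,
                             const uint8_t a[64], const uint8_t b[64])
    {
        for (int j = 0; j < 64; j++)
            dst[j] = ((k >> j) & 1) ? model_add_epi8_lane(a[j], b[j]) : src[j];
    }

For example, 250 + 10 yields 4 in this model, where a saturating add would yield 255.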
        	

_mm512_adds_epi8
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_adds_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_adds_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_mask_adds_epi8(__m512i src, __mmask64 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_adds_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_maskz_adds_epi8(__mmask64 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
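
Saturate8 in the pseudo-code clamps the lane sum to the signed 8-bit range instead of letting it wrap. A minimal scalar sketch of one lane is shown below; it assumes the sum is formed in a wider type before clamping, and the model_* names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a widened sum to the signed 8-bit range [-128, 127]. */
    int8_t model_saturate8(int v)
    {
        if (v > INT8_MAX) return INT8_MAX;
        if (v < INT8_MIN) return INT8_MIN;
        return (int8_t)v;
    }

    /* One lane of a signed saturating add: add in int, then clamp. */
    int8_t model_adds_epi8_lane(int8_t a, int8_t b)
    {
        return model_saturate8((int)a + (int)b);
    }

In this model 100 + 100 gives 127 and -100 + -100 gives -128, while sums that already fit in 8 bits pass through unchanged.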
        	

_mm512_adds_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_adds_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_adds_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mask_adds_epi16(__m512i src, __mmask32 k,
                                   __m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_adds_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_maskz_adds_epi16(__mmask32 k, __m512i a,
                                    __m512i b);

.. admonition:: Intel Description

    Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
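
The 16-bit forms operate on 32 lanes, so the mask is 32 bits wide with one bit per lane. The sketch below models the write-mask form of the signed saturating add in scalar C. It assumes uint32_t as a stand-in for __mmask32, and the model_* names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a widened sum to the signed 16-bit range [-32768, 32767]. */
    int16_t model_saturate16(int32_t v)
    {
        if (v > INT16_MAX) return INT16_MAX;
        if (v < INT16_MIN) return INT16_MIN;
        return (int16_t)v;
    }

    /* Write-mask model over 32 lanes; uint32_t stands in for __mmask32. */
    void model_mask_adds_epi16(int16_t dst[32], const int16_t src[32],
                               uint32_t k,
                               const int16_t a[32], const int16_t b[32])
    {
        for (int j = 0; j < 32; j++)
            dst[j] = ((k >> j) & 1)
                         ? model_saturate16((int32_t)a[j] + (int32_t)b[j])
                         : src[j];
    }

Bit j of the mask selects lane j, so bit 31 controls the highest 16-bit lane of the 512-bit vector.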
        	

_mm512_adds_epu8
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_adds_epu8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_adds_epu8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_adds_epu8(__m512i src, __mmask64 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_adds_epu8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_adds_epu8(__mmask64 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
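
SaturateU8 clamps the lane sum to the unsigned 8-bit range, so an overflowing sum becomes 255 rather than wrapping to a small value. A minimal scalar sketch of one lane follows, assuming the sum is formed in a wider unsigned type; the model_* names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a widened sum to the unsigned 8-bit range [0, 255]. */
    uint8_t model_saturate_u8(unsigned v)
    {
        return (uint8_t)(v > 255u ? 255u : v);
    }

    /* One lane of an unsigned saturating add: add in unsigned, then clamp. */
    uint8_t model_adds_epu8_lane(uint8_t a, uint8_t b)
    {
        return model_saturate_u8((unsigned)a + (unsigned)b);
    }

Because both inputs are non-negative, only the upper bound can be exceeded, which is why a single comparison is enough here.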
        	

_mm512_adds_epu16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_adds_epu16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_adds_epu16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_adds_epu16(__m512i src, __mmask32 k,
                                   __m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_adds_epu16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_maskz_adds_epu16(__mmask32 k, __m512i a,
                                    __m512i b);

.. admonition:: Intel Description

    Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
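
An unsigned saturating add can also be modelled by detecting wraparound instead of clamping a wider sum: if the 16-bit sum is smaller than one of the operands, the true sum exceeded 65535. The sketch below shows this alternative formulation for one lane. It is one possible scalar model, not a description of how the instruction is implemented, and the model_* name is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Unsigned saturating 16-bit add via wraparound detection:
       a wrapped sum is smaller than either operand, which flags overflow. */
    uint16_t model_adds_epu16_lane(uint16_t a, uint16_t b)
    {
        uint16_t sum = (uint16_t)(a + b);  /* keep only the low 16 bits */
        return (sum < a) ? UINT16_MAX : sum;
    }

This gives the same results as clamping a 32-bit sum to 65535, which is the form the SaturateU16 pseudo-code suggests.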
        	

_mm512_add_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_add_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed 16-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := a[i+15:i] + b[i+15:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_add_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_add_epi16(__m512i src, __mmask32 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i] + b[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_add_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_maskz_add_epi16(__mmask32 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Add packed 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i] + b[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
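A zeromask differs from a writemask only in what an inactive lane receives. The following sketch, using a hypothetical helper ``maskz_add_epi16_ref``, models the pseudo-code above in portable C:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar reference for _mm512_maskz_add_epi16.
     * Lanes whose mask bit is clear are set to zero. */
    static void maskz_add_epi16_ref(uint32_t k, const uint16_t a[32],
                                    const uint16_t b[32], uint16_t dst[32])
    {
        for (int j = 0; j < 32; j++) {
            dst[j] = ((k >> j) & 1u) ? (uint16_t)(a[j] + b[j]) : (uint16_t)0;
        }
    }

Under this model the zeromask form gives the same result as the writemask form called with an all-zero ``src``.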

_mm512_avg_epu8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_avg_epu8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
        ENDFOR
        dst[MAX:512] := 0
        	
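The expression ``(a + b + 1) >> 1`` in the pseudo-code above is a rounding average, and the intermediate sum does not fit in 8 bits. A minimal per-lane sketch in portable C, with the hypothetical helper name ``avg_epu8_lane``, widens the operands before adding:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical per-lane model of _mm512_avg_epu8.
     * a + b + 1 can reach 511, so the sum is formed in a wider type. */
    static uint8_t avg_epu8_lane(uint8_t a, uint8_t b)
    {
        return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
    }

Because of the ``+ 1``, ties round up: the model returns 2 for the inputs 1 and 2, and 255 for two inputs of 255.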

_mm512_mask_avg_epu8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_avg_epu8(__m512i src, __mmask64 k,
                                 __m512i a, __m512i b);

.. admonition:: Intel Description

    Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
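With 64 byte lanes the mask is 64 bits wide, which matters when testing individual bits in C. The sketch below uses a hypothetical helper ``mask_avg_epu8_ref`` and models the mask as a ``uint64_t``. It shifts the mask right instead of shifting a constant left, since ``1 << j`` on a plain ``int`` is undefined for large ``j``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar reference for _mm512_mask_avg_epu8.
     * 64 lanes of 8 bits, one mask bit per lane. */
    static void mask_avg_epu8_ref(const uint8_t src[64], uint64_t k,
                                  const uint8_t a[64], const uint8_t b[64],
                                  uint8_t dst[64])
    {
        for (int j = 0; j < 64; j++) {
            if ((k >> j) & 1u) {
                /* Rounding average, computed in a wider type. */
                dst[j] = (uint8_t)(((unsigned)a[j] + (unsigned)b[j] + 1u) >> 1);
            } else {
                dst[j] = src[j];
            }
        }
    }

A mask such as ``(1ULL << 63) | 1ULL`` selects only the first and last lanes.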

_mm512_maskz_avg_epu8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_avg_epu8(__mmask64 k, __m512i a,
                                  __m512i b);

.. admonition:: Intel Description

    Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_avg_epu16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_avg_epu16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_avg_epu16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_avg_epu16(__m512i src, __mmask32 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_avg_epu16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_maskz_avg_epu16(__mmask32 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maddubs_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_maddubs_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
        ENDFOR
        dst[MAX:512] := 0
        	
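The mixed signedness and the saturation step are easy to get wrong, so a scalar model may help. This is a sketch in portable C under the semantics described above; the helper names ``saturate16`` and ``maddubs_epi16_ref`` are hypothetical. Bytes of "a" are treated as unsigned, bytes of "b" as signed, and each pair of products is summed before saturating:

.. code-block:: C

    #include <stdint.h>

    /* Clamp a 32-bit value to the signed 16-bit range. */
    static int16_t saturate16(int32_t x)
    {
        if (x > INT16_MAX) return INT16_MAX;
        if (x < INT16_MIN) return INT16_MIN;
        return (int16_t)x;
    }

    /* Hypothetical scalar reference for _mm512_maddubs_epi16:
     * 64 byte lanes in, 32 saturated 16-bit lanes out. */
    static void maddubs_epi16_ref(const uint8_t a[64], const int8_t b[64],
                                  int16_t dst[32])
    {
        for (int j = 0; j < 32; j++) {
            int32_t lo = (int32_t)a[2 * j]     * (int32_t)b[2 * j];
            int32_t hi = (int32_t)a[2 * j + 1] * (int32_t)b[2 * j + 1];
            dst[j] = saturate16(lo + hi);  /* the pair sum may exceed 16 bits */
        }
    }

In this model a pair of 255 * 127 products sums to 64770 and saturates to 32767, while a pair of 255 * -128 products saturates to -32768.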

_mm512_mask_maddubs_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 src, 
    MASK k, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_mask_maddubs_epi16(__m512i src, __mmask32 k,
                                      __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_maddubs_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_maskz_maddubs_epi16(__mmask32 k, __m512i a,
                                       __m512i b);

.. admonition:: Intel Description

    Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_madd_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_madd_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        	
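Unlike the byte variant, the pseudo-code above applies no saturation to the pair sum. A scalar sketch in portable C, with the hypothetical helper name ``madd_epi16_ref``, forms the sum in 64 bits and then keeps the low 32 bits. The final conversion to ``int32_t`` assumes two's-complement wraparound, which mainstream compilers provide:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar reference for _mm512_madd_epi16:
     * 32 signed 16-bit lanes in, 16 signed 32-bit lanes out. */
    static void madd_epi16_ref(const int16_t a[32], const int16_t b[32],
                               int32_t dst[16])
    {
        for (int j = 0; j < 16; j++) {
            int64_t lo = (int64_t)a[2 * j]     * b[2 * j];
            int64_t hi = (int64_t)a[2 * j + 1] * b[2 * j + 1];
            /* Keep the low 32 bits of the sum (assumed wraparound). */
            dst[j] = (int32_t)(uint32_t)(uint64_t)(lo + hi);
        }
    }

The only inputs that overflow are four values of -32768 in one pair of lanes: the exact sum is 2^31, which this model wraps to ``INT32_MIN``.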

_mm512_mask_madd_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mask_madd_epi16(__m512i src, __mmask16 k,
                                   __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
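Note that the mask here has 16 bits even though the inputs have 32 lanes: each mask bit governs one 32-bit result, which is built from a pair of 16-bit inputs. A sketch in portable C with the hypothetical helper ``mask_madd_epi16_ref`` (the wrap to 32 bits assumes two's-complement conversion):

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar reference for _mm512_mask_madd_epi16.
     * Mask bit j selects between the computed 32-bit lane j and src[j]. */
    static void mask_madd_epi16_ref(const int32_t src[16], uint16_t k,
                                    const int16_t a[32], const int16_t b[32],
                                    int32_t dst[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                int64_t sum = (int64_t)a[2 * j] * b[2 * j]
                            + (int64_t)a[2 * j + 1] * b[2 * j + 1];
                dst[j] = (int32_t)(uint32_t)(uint64_t)sum;
            } else {
                dst[j] = src[j];
            }
        }
    }

With ``k = 0x0002`` only result lane 1, formed from input lanes 2 and 3, is computed.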

_mm512_maskz_madd_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_maskz_madd_epi16(__mmask16 k, __m512i a,
                                    __m512i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_max_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_mask_max_epi8(__m512i src, __mmask64 k,
                                 __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_maskz_max_epi8(__mmask64 k, __m512i a,
                                  __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_max_epi8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_max_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_max_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mask_max_epi16(__m512i src, __mmask32 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_maskz_max_epi16(__mmask32 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_max_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_max_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_max_epu8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_max_epu8(__m512i src, __mmask64 k,
                                 __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_epu8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_max_epu8(__mmask64 k, __m512i a,
                                  __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_max_epu8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_max_epu8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        ENDFOR
        dst[MAX:512] := 0
        	
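The pseudo-code for the unsigned and signed maximum looks identical; the difference lies only in how each 8-bit pattern is interpreted. The following sketch in portable C contrasts the two per lane. All three helper names are hypothetical, and the signed variant takes and returns raw bit patterns to keep the comparison explicit:

.. code-block:: C

    #include <stdint.h>

    /* Interpret an 8-bit pattern as a two's-complement signed value. */
    static int as_signed8(uint8_t x)
    {
        return x >= 128 ? (int)x - 256 : (int)x;
    }

    /* Hypothetical per-lane model of _mm512_max_epu8 (unsigned compare). */
    static uint8_t max_epu8_lane(uint8_t a, uint8_t b)
    {
        return a > b ? a : b;
    }

    /* Hypothetical per-lane model of _mm512_max_epi8 (signed compare). */
    static uint8_t max_epi8_lane(uint8_t a, uint8_t b)
    {
        return as_signed8(a) > as_signed8(b) ? a : b;
    }

For the bit patterns ``0x80`` and ``0x01`` the unsigned model returns ``0x80`` (128 > 1), whereas the signed model returns ``0x01`` (-128 < 1).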

_mm512_mask_max_epu16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_max_epu16(__m512i src, __mmask32 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_max_epu16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_maskz_max_epu16(__mmask32 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_max_epu16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_max_epu16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_min_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_mask_min_epi8(__m512i src, __mmask64 k,
                                 __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_maskz_min_epi8(__mmask64 k, __m512i a,
                                  __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_min_epi8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_min_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_min_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mask_min_epi16(__m512i src, __mmask32 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_maskz_min_epi16(__mmask32 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_min_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_min_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        	
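One way the signed minimum is often paired with the signed maximum is to clamp lanes to a range. The sketch below models that per lane in portable C; the helper names and the scalar bounds ``lo`` and ``hi`` are hypothetical. With intrinsics, the analogous composition would be ``_mm512_min_epi16(_mm512_max_epi16(x, lo_vec), hi_vec)``, where ``lo_vec`` and ``hi_vec`` are assumed to hold the bounds in every lane:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical per-lane models of the signed 16-bit max and min. */
    static int16_t max_epi16_lane(int16_t a, int16_t b) { return a > b ? a : b; }
    static int16_t min_epi16_lane(int16_t a, int16_t b) { return a < b ? a : b; }

    /* Clamp each of 32 lanes to [lo, hi]: max against the lower bound,
     * then min against the upper bound. Assumes lo <= hi. */
    static void clamp_epi16_ref(const int16_t x[32], int16_t lo, int16_t hi,
                                int16_t dst[32])
    {
        for (int j = 0; j < 32; j++) {
            dst[j] = min_epi16_lane(max_epi16_lane(x[j], lo), hi);
        }
    }

With bounds of -100 and 100, the inputs -300, 50 and 300 map to -100, 50 and 100 respectively.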

_mm512_mask_min_epu8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_min_epu8(__m512i src, __mmask64 k,
                                 __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_epu8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_min_epu8(__mmask64 k, __m512i a,
                                  __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_min_epu8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_min_epu8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_min_epu16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_min_epu16(__m512i src, __mmask32 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_min_epu16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_maskz_min_epu16(__mmask32 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
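
Zero-masking differs from merge-masking only in what an unselected lane receives. A minimal scalar sketch, assuming the mask is modelled as a ``uint32_t`` and using the hypothetical name ``maskz_min_epu16_model``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of zero-masking over 32 unsigned 16-bit lanes.
       Lane j takes MIN(a, b) when bit j of k is set, otherwise it is cleared. */
    void maskz_min_epu16_model(uint16_t dst[32], uint32_t k,
                               const uint16_t a[32], const uint16_t b[32])
    {
        for (int j = 0; j < 32; j++) {
            uint16_t m = a[j] < b[j] ? a[j] : b[j];
            dst[j] = ((k >> j) & 1u) ? m : 0;
        }
    }

The result is the same as merge-masking with an all-zero ``src`` vector.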
        	

_mm512_min_epu16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_min_epu16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_mulhrs_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mask_mulhrs_epi16(__m512i src, __mmask32 k,
                                     __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
        		dst[i+15:i] := tmp[16:1]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mulhrs_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_maskz_mulhrs_epi16(__mmask32 k, __m512i a,
                                      __m512i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
        		dst[i+15:i] := tmp[16:1]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mulhrs_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mulhrs_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
        	dst[i+15:i] := tmp[16:1]
        ENDFOR
        dst[MAX:512] := 0
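
The shift, add-one, and bit-select steps amount to a rounded fixed-point multiply. The sketch below is a scalar model of a single lane, not the hardware implementation; the name ``mulhrs_lane`` is hypothetical. It returns the raw 16-bit pattern and does the shift on an unsigned value, which gives the same bits [16:1] as an arithmetic shift while avoiding implementation-defined behaviour in C.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: one lane of the rounded high multiply.
       Returns the raw 16-bit result pattern; reinterpret as signed if needed. */
    uint16_t mulhrs_lane(int16_t a, int16_t b)
    {
        /* The 32-bit signed product cannot overflow: |a*b| <= 2^30. */
        uint32_t prod = (uint32_t)((int32_t)a * (int32_t)b);
        /* tmp := (product >> 14) + 1; only the low 17 bits are used below. */
        uint32_t tmp = (prod >> 14) + 1u;
        /* dst := tmp[16:1] */
        return (uint16_t)(tmp >> 1);
    }

Reading the operands as Q15 fractions, ``mulhrs_lane(0x4000, 0x4000)`` (0.5 times 0.5) gives ``0x2000`` (0.25).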
        	

_mm512_mask_mulhi_epu16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_mulhi_epu16(__m512i src, __mmask32 k,
                                    __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		tmp[31:0] := a[i+15:i] * b[i+15:i]
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mulhi_epu16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_maskz_mulhi_epu16(__mmask32 k, __m512i a,
                                     __m512i b);

.. admonition:: Intel Description

    Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		tmp[31:0] := a[i+15:i] * b[i+15:i]
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mulhi_epu16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mulhi_epu16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	tmp[31:0] := a[i+15:i] * b[i+15:i]
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        dst[MAX:512] := 0
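
A single lane of the unsigned high multiply can be modelled in scalar C as follows. This is an illustrative sketch; ``mulhi_u16_lane`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: high 16 bits of an unsigned 16x16 -> 32-bit product. */
    uint16_t mulhi_u16_lane(uint16_t a, uint16_t b)
    {
        /* Widen before multiplying so the full 32-bit product is kept. */
        uint32_t tmp = (uint32_t)a * (uint32_t)b;
        return (uint16_t)(tmp >> 16);
    }

For instance, ``0xFFFF * 0xFFFF`` is ``0xFFFE0001``, so the lane result is ``0xFFFE``.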
        	

_mm512_mask_mulhi_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mask_mulhi_epi16(__m512i src, __mmask32 k,
                                    __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mulhi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_maskz_mulhi_epi16(__mmask32 k, __m512i a,
                                     __m512i b);

.. admonition:: Intel Description

    Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mulhi_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mulhi_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        dst[MAX:512] := 0
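
The signed variant sign-extends both operands before multiplying, so the high half carries the sign of the product. A scalar sketch of one lane, with the hypothetical name ``mulhi_i16_lane`` and the result returned as a raw 16-bit pattern:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: high 16 bits of a signed 16x16 -> 32-bit product.
       Returns the raw bit pattern; reinterpret as int16_t if needed. */
    uint16_t mulhi_i16_lane(int16_t a, int16_t b)
    {
        /* The signed product fits in 32 bits; convert to unsigned to select bits. */
        uint32_t tmp = (uint32_t)((int32_t)a * (int32_t)b);
        return (uint16_t)(tmp >> 16);
    }

Here ``mulhi_i16_lane(-1, 1)`` gives ``0xFFFF``, the upper half of the 32-bit value -1.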
        	

_mm512_mask_mullo_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_mullo_epi16(__m512i src, __mmask32 k,
                                    __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        		dst[i+15:i] := tmp[15:0]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mullo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_maskz_mullo_epi16(__mmask32 k, __m512i a,
                                     __m512i b);

.. admonition:: Intel Description

    Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        		dst[i+15:i] := tmp[15:0]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mullo_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mullo_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        	dst[i+15:i] := tmp[15:0]
        ENDFOR
        dst[MAX:512] := 0
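
Although the pseudo-code sign-extends the operands, the low 16 bits of the product are the same whether the inputs are treated as signed or unsigned. The scalar sketch below illustrates this with two hypothetical helpers, ``mullo_16_lane`` and ``mullo_16_lane_signed``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: low 16 bits of the product, operands read as unsigned. */
    uint16_t mullo_16_lane(uint16_t a, uint16_t b)
    {
        uint32_t tmp = (uint32_t)a * (uint32_t)b;
        return (uint16_t)tmp;
    }

    /* Hypothetical helper: the same step with sign-extended operands. */
    uint16_t mullo_16_lane_signed(int16_t a, int16_t b)
    {
        uint32_t tmp = (uint32_t)((int32_t)a * (int32_t)b);
        return (uint16_t)tmp;
    }

Both ``mullo_16_lane(0xFFFE, 3)`` and ``mullo_16_lane_signed(-2, 3)`` produce ``0xFFFA``.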
        	

_mm512_mask_sub_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_sub_epi8(__m512i src, __mmask64 k,
                                 __m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i] - b[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
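
With 64 byte lanes the mask needs 64 bits. The following scalar sketch assumes ``__mmask64`` can be modelled as a ``uint64_t``; the function name ``mask_sub_epi8_model`` is hypothetical. The subtraction wraps modulo 256, matching the non-saturating pseudo-code.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of merge-masked byte subtraction over 64 lanes. */
    void mask_sub_epi8_model(uint8_t dst[64], const uint8_t src[64], uint64_t k,
                             const uint8_t a[64], const uint8_t b[64])
    {
        for (int j = 0; j < 64; j++) {
            if ((k >> j) & 1u)
                dst[j] = (uint8_t)(a[j] - b[j]);  /* wraps modulo 256 */
            else
                dst[j] = src[j];
        }
    }

With ``k = 0x8000000000000001`` only lanes 0 and 63 are computed; subtracting 1 from 0 in those lanes wraps to ``0xFF``.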
        	

_mm512_maskz_sub_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_sub_epi8(__mmask64 k, __m512i a,
                                  __m512i b);

.. admonition:: Intel Description

    Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i] - b[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sub_epi8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_sub_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := a[i+7:i] - b[i+7:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_subs_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_mask_subs_epi8(__m512i src, __mmask64 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_subs_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_maskz_subs_epi8(__mmask64 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_subs_epi8
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_subs_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])	
        ENDFOR
        dst[MAX:512] := 0
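
``Saturate8`` clamps the difference to the signed 8-bit range instead of letting it wrap. A minimal scalar sketch of one lane, using the hypothetical name ``subs_i8_lane``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: signed saturating subtraction for one 8-bit lane. */
    int8_t subs_i8_lane(int8_t a, int8_t b)
    {
        /* Compute in a wider type so the true difference is available. */
        int diff = (int)a - (int)b;
        if (diff > INT8_MAX)
            return INT8_MAX;
        if (diff < INT8_MIN)
            return INT8_MIN;
        return (int8_t)diff;
    }

For example, ``subs_i8_lane(-128, 1)`` stays at -128 where a wrapping subtraction would give 127.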
        	

_mm512_mask_subs_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mask_subs_epi16(__m512i src, __mmask32 k,
                                   __m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_subs_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_maskz_subs_epi16(__mmask32 k, __m512i a,
                                    __m512i b);

.. admonition:: Intel Description

    Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_subs_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_subs_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
        ENDFOR
        dst[MAX:512] := 0
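
The 16-bit form clamps to the range [-32768, 32767]. As an illustrative scalar model of one lane (the name ``subs_i16_lane`` is hypothetical):

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: signed saturating subtraction for one 16-bit lane. */
    int16_t subs_i16_lane(int16_t a, int16_t b)
    {
        /* A 32-bit difference cannot overflow for 16-bit inputs. */
        int32_t diff = (int32_t)a - (int32_t)b;
        if (diff > INT16_MAX)
            return INT16_MAX;
        if (diff < INT16_MIN)
            return INT16_MIN;
        return (int16_t)diff;
    }

In-range differences pass through unchanged, so ``subs_i16_lane(100, 250)`` is -150, while ``subs_i16_lane(32767, -1)`` clamps to 32767.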
        	

_mm512_mask_subs_epu8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_subs_epu8(__m512i src, __mmask64 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_subs_epu8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_subs_epu8(__mmask64 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_subs_epu8
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_subs_epu8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])	
        ENDFOR
        dst[MAX:512] := 0
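
For unsigned operands a difference can only leave the representable range by going below zero, so ``SaturateU8`` reduces to a clamp at 0. A scalar sketch of one lane, with the hypothetical name ``subs_u8_lane``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: unsigned saturating subtraction for one 8-bit lane. */
    uint8_t subs_u8_lane(uint8_t a, uint8_t b)
    {
        /* a - b is only computed when it cannot be negative. */
        return a > b ? (uint8_t)(a - b) : 0;
    }

So ``subs_u8_lane(5, 10)`` is 0 rather than the wrapped value 251.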
        	

_mm512_mask_subs_epu16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_subs_epu16(__m512i src, __mmask32 k,
                                   __m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_subs_epu16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_maskz_subs_epu16(__mmask32 k, __m512i a,
                                    __m512i b);

.. admonition:: Intel Description

    Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_subs_epu16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_subs_epu16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])	
        ENDFOR
        dst[MAX:512] := 0
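
Because an unsigned saturating subtraction yields 0 whenever ``b`` exceeds ``a``, OR-ing the two orderings gives an absolute difference. The scalar sketch below illustrates this on one lane; both helper names are hypothetical, and the absolute-difference use is an illustration rather than something stated in the Intel description.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: unsigned saturating subtraction for one 16-bit lane. */
    uint16_t subs_u16_lane(uint16_t a, uint16_t b)
    {
        return a > b ? (uint16_t)(a - b) : 0;
    }

    /* Hypothetical helper: |a - b| built from two saturating subtractions.
       At most one of the two terms is non-zero, so OR combines them. */
    uint16_t absdiff_u16_lane(uint16_t a, uint16_t b)
    {
        return (uint16_t)(subs_u16_lane(a, b) | subs_u16_lane(b, a));
    }

Both ``absdiff_u16_lane(3, 10)`` and ``absdiff_u16_lane(10, 3)`` evaluate to 7.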
        	

_mm512_mask_sub_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_sub_epi16(__m512i src, __mmask32 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i] - b[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sub_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_maskz_sub_epi16(__mmask32 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i] - b[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sub_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_sub_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := a[i+15:i] - b[i+15:i]
        ENDFOR
        dst[MAX:512] := 0
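
Without saturation the 16-bit difference wraps modulo 65536, and the result bits are the same for signed and unsigned interpretations. A one-lane scalar sketch, using the hypothetical name ``sub_16_lane``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: wrapping subtraction for one 16-bit lane. */
    uint16_t sub_16_lane(uint16_t a, uint16_t b)
    {
        /* The operands promote to int; the cast reduces the result modulo 2^16. */
        return (uint16_t)(a - b);
    }

For example, ``sub_16_lane(0, 1)`` is ``0xFFFF``, which reads as -1 when the lane is interpreted as signed.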
        	

_mm512_mask_mullo_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_mullo_epi64(__m512i src, __mmask8 k,
                                    __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[127:0] := a[i+63:i] * b[i+63:i]
        		dst[i+63:i] := tmp[63:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mullo_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_mullo_epi64(__mmask8 k, __m512i a,
                                     __m512i b);

.. admonition:: Intel Description

    Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[127:0] := a[i+63:i] * b[i+63:i]
        		dst[i+63:i] := tmp[63:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mullo_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mullo_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	tmp[127:0] := a[i+63:i] * b[i+63:i]
        	dst[i+63:i] := tmp[63:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mullo_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_mullo_epi32(__mmask16 k, __m512i a,
                                     __m512i b);

.. admonition:: Intel Description

    Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[63:0] := a[i+31:i] * b[i+31:i]
        		dst[i+31:i] := tmp[31:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_add_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_add_pd(__mmask8 k, __m512d a,
                                __m512d b);

.. admonition:: Intel Description

    Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
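The zeromask form differs from the writemask form only in what an inactive lane receives. The following scalar sketch illustrates that, assuming hypothetical names (``model_maskz_add_pd``, ``demo_maskz_add_pd``) and plain arrays in place of ``__m512d``; it models the pseudo-code above rather than calling the intrinsic.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_maskz_add_pd: eight doubles
       stand in for a __m512d and a uint8_t stands in for __mmask8. */
    static void model_maskz_add_pd(double dst[8], uint8_t k,
                                   const double a[8], const double b[8])
    {
        for (int j = 0; j < 8; j++) {
            /* Active lanes hold the sum; inactive lanes are set to 0.0. */
            dst[j] = ((k >> j) & 1u) ? a[j] + b[j] : 0.0;
        }
    }

    /* Small self-check: returns 1 when the model behaves as described. */
    static int demo_maskz_add_pd(void)
    {
        const double a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        const double b[8] = {0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5};
        double dst[8];
        model_maskz_add_pd(dst, 0xF0, a, b); /* only the upper four lanes */
        assert(dst[0] == 0.0);
        assert(dst[4] == 5.5);
        return dst[3] == 0.0 && dst[7] == 8.5;
    }

No ``src`` operand is needed here, which is why the ``maskz`` prototypes take one argument fewer than their ``mask`` counterparts.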

_mm512_maskz_add_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_add_round_pd(__mmask8 k, __m512d a,
                                      __m512d b, int rounding);

.. admonition:: Intel Description

    Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_add_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_add_ps(__mmask16 k, __m512 a, __m512 b);

.. admonition:: Intel Description

    Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_add_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_add_round_ps(__mmask16 k, __m512 a,
                                     __m512 b, int rounding);

.. admonition:: Intel Description

    Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_div_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_div_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	dst[i+63:i] := a[i+63:i] / b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_div_round_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_div_round_pd(__m512d a, __m512d b,
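                                int rounding);

.. admonition:: Intel Description

    Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code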

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	dst[i+63:i] := a[i+63:i] / b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_div_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_div_pd(__m512d src, __mmask8 k,
                               __m512d a, __m512d b);

.. admonition:: Intel Description

    Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] / b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
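As a rough illustration of the writemask division above, the sketch below models the eight lanes with plain arrays. The names ``model_mask_div_pd`` and ``demo_mask_div_pd`` are hypothetical, and the code is a scalar model of the pseudo-code rather than a call to the intrinsic.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_div_pd: eight doubles
       stand in for a __m512d and a uint8_t stands in for __mmask8. */
    static void model_mask_div_pd(double dst[8], const double src[8],
                                  uint8_t k, const double a[8],
                                  const double b[8])
    {
        for (int j = 0; j < 8; j++) {
            /* Active lanes hold a/b; inactive lanes are copied from src
               and their quotient is never computed in this model. */
            dst[j] = ((k >> j) & 1u) ? a[j] / b[j] : src[j];
        }
    }

    /* Small self-check: returns 1 when the model behaves as described. */
    static int demo_mask_div_pd(void)
    {
        const double src[8] = {-1, -1, -1, -1, -1, -1, -1, -1};
        const double a[8] = {8, 8, 8, 8, 8, 8, 8, 8};
        const double b[8] = {2, 4, 8, 16, 2, 4, 8, 16};
        double dst[8];
        model_mask_div_pd(dst, src, 0x0F, a, b); /* only the lower four lanes */
        assert(dst[0] == 4.0);
        assert(dst[3] == 0.5);
        return dst[4] == -1.0 && dst[7] == -1.0;
    }

Passing the same array for ``src`` and ``a`` would leave inactive lanes holding the original dividend, which is one way to update a vector in place.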

_mm512_mask_div_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_div_round_pd(__m512d src, __mmask8 k,
                                     __m512d a, __m512d b,
                                     int rounding);

.. admonition:: Intel Description

    Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] / b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_div_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_div_pd(__mmask8 k, __m512d a,
                                __m512d b);

.. admonition:: Intel Description

    Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] / b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_div_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_div_round_pd(__mmask8 k, __m512d a,
                                      __m512d b, int rounding);

.. admonition:: Intel Description

    Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] / b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_div_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_div_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := a[i+31:i] / b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_div_round_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_div_round_ps(__m512 a, __m512 b,
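                               int rounding);

.. admonition:: Intel Description

    Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code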

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := a[i+31:i] / b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_div_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_div_ps(__m512 src, __mmask16 k, __m512 a,
                              __m512 b);

.. admonition:: Intel Description

    Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] / b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_div_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_div_round_ps(__m512 src, __mmask16 k,
                                    __m512 a, __m512 b,
                                    int rounding);

.. admonition:: Intel Description

    Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] / b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_div_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_div_ps(__mmask16 k, __m512 a, __m512 b);

.. admonition:: Intel Description

    Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] / b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
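One possible use of the zeromask division above is to skip lanes whose divisor is zero. The sketch below is a scalar model only: ``model_nonzero_mask_ps``, ``model_maskz_div_ps`` and ``demo_maskz_div_ps`` are hypothetical helpers, a ``uint16_t`` stands in for ``__mmask16``, and in real AVX-512 code the mask would more likely come from a compare intrinsic such as ``_mm512_cmp_ps_mask``.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper: set bit j of the mask when b[j] is non-zero. */
    static uint16_t model_nonzero_mask_ps(const float b[16])
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; j++) {
            if (b[j] != 0.0f) {
                k = (uint16_t)(k | (1u << j));
            }
        }
        return k;
    }

    /* Hypothetical scalar model of _mm512_maskz_div_ps: sixteen floats
       stand in for a __m512. */
    static void model_maskz_div_ps(float dst[16], uint16_t k,
                                   const float a[16], const float b[16])
    {
        for (int j = 0; j < 16; j++) {
            /* Inactive lanes are zeroed; their quotient is never computed. */
            dst[j] = ((k >> j) & 1u) ? a[j] / b[j] : 0.0f;
        }
    }

    /* Small self-check: returns 1 when the model behaves as described. */
    static int demo_maskz_div_ps(void)
    {
        float a[16], b[16], dst[16];
        for (int j = 0; j < 16; j++) {
            a[j] = 6.0f;
            b[j] = (j % 2 == 0) ? 0.0f : 3.0f; /* even lanes divide by zero */
        }
        uint16_t k = model_nonzero_mask_ps(b);
        assert(k == 0xAAAA); /* only odd lanes are active */
        model_maskz_div_ps(dst, k, a, b);
        return dst[0] == 0.0f && dst[1] == 2.0f &&
               dst[14] == 0.0f && dst[15] == 2.0f;
    }

Sixteen lanes need a 16-bit mask, which is why the single-precision entries take ``__mmask16`` where the double-precision ones take ``__mmask8``.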

_mm512_maskz_div_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_div_round_ps(__mmask16 k, __m512 a,
                                     __m512 b, int rounding);

.. admonition:: Intel Description

    Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] / b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmadd_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_maskz_fmadd_pd(__mmask8 k, __m512d a,
                                  __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmadd_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512d c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_fmadd_round_pd(__mmask8 k, __m512d a,
                                        __m512d b, __m512d c,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmadd_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_maskz_fmadd_ps(__mmask16 k, __m512 a,
                                 __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmadd_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512 c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_fmadd_round_ps(__mmask16 k, __m512 a,
                                       __m512 b, __m512 c,
                                       const int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmaddsub_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_fmaddsub_pd(__m512d a, __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF ((j & 1) == 0)
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
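The alternation in the pseudo-code above (even-indexed lanes subtract, odd-indexed lanes add) can be sketched as a scalar loop. The names ``model_fmaddsub_pd`` and ``demo_fmaddsub_pd`` are hypothetical. The model also rounds the product before the add or subtract, whereas the hardware instruction is fused, so results may differ in the last bit for inputs whose product is not exactly representable; the demo uses small integers to avoid that.

.. code-block:: C

    #include <assert.h>

    /* Hypothetical scalar model of _mm512_fmaddsub_pd: eight doubles
       stand in for a __m512d. */
    static void model_fmaddsub_pd(double dst[8], const double a[8],
                                  const double b[8], const double c[8])
    {
        for (int j = 0; j < 8; j++) {
            double prod = a[j] * b[j];
            /* Even lanes subtract c, odd lanes add c. */
            dst[j] = ((j & 1) == 0) ? prod - c[j] : prod + c[j];
        }
    }

    /* Small self-check: returns 1 when the model behaves as described. */
    static int demo_fmaddsub_pd(void)
    {
        const double a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        const double b[8] = {2, 2, 2, 2, 2, 2, 2, 2};
        const double c[8] = {1, 1, 1, 1, 1, 1, 1, 1};
        double dst[8];
        model_fmaddsub_pd(dst, a, b, c);
        assert(dst[0] == 1.0); /* 1*2 - 1 */
        assert(dst[1] == 5.0); /* 2*2 + 1 */
        return dst[6] == 13.0 && dst[7] == 17.0; /* 7*2 - 1, 8*2 + 1 */
    }

The single-precision entries follow the same pattern with sixteen 32-bit lanes.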

_mm512_fmaddsub_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    const int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_fmaddsub_round_pd(__m512d a, __m512d b,
                                     __m512d c,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF ((j & 1) == 0)
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmaddsub_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m512d _mm512_mask3_fmaddsub_pd(__m512d a, __m512d b,
                                     __m512d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmaddsub_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    __mmask8 k, 
    const int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask3_fmaddsub_round_pd(__m512d a, __m512d b,
                                           __m512d c,
                                           __mmask8 k,
                                           const int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		FI
        	ELSE 
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmaddsub_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_mask_fmaddsub_pd(__m512d a, __mmask8 k,
                                    __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
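The ``mask``, ``mask3`` and ``maskz`` forms of this operation differ only in what an inactive lane receives: "a", "c", or zero. The following scalar sketch puts the three side by side. The names ``passthrough``, ``model_masked_fmaddsub_pd`` and ``demo_masked_fmaddsub_pd`` are hypothetical, plain arrays stand in for ``__m512d``, and the product is rounded separately rather than fused.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical selector for the value an inactive lane receives. */
    enum passthrough {
        PASS_A,   /* models _mm512_mask_fmaddsub_pd  */
        PASS_C,   /* models _mm512_mask3_fmaddsub_pd */
        PASS_ZERO /* models _mm512_maskz_fmaddsub_pd */
    };

    /* Hypothetical scalar model covering all three masked forms. */
    static void model_masked_fmaddsub_pd(double dst[8], enum passthrough mode,
                                         uint8_t k, const double a[8],
                                         const double b[8], const double c[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u) {
                double prod = a[j] * b[j];
                /* Even lanes subtract c, odd lanes add c. */
                dst[j] = ((j & 1) == 0) ? prod - c[j] : prod + c[j];
            } else if (mode == PASS_A) {
                dst[j] = a[j];
            } else if (mode == PASS_C) {
                dst[j] = c[j];
            } else {
                dst[j] = 0.0;
            }
        }
    }

    /* Small self-check: returns 1 when the model behaves as described. */
    static int demo_masked_fmaddsub_pd(void)
    {
        const double a[8] = {10, 10, 10, 10, 10, 10, 10, 10};
        const double b[8] = {1, 1, 1, 1, 1, 1, 1, 1};
        const double c[8] = {3, 3, 3, 3, 3, 3, 3, 3};
        double m[8], m3[8], mz[8];
        model_masked_fmaddsub_pd(m, PASS_A, 0x03, a, b, c);
        model_masked_fmaddsub_pd(m3, PASS_C, 0x03, a, b, c);
        model_masked_fmaddsub_pd(mz, PASS_ZERO, 0x03, a, b, c);
        assert(m[0] == 7.0 && m[1] == 13.0); /* active lanes agree */
        assert(m3[0] == 7.0 && mz[1] == 13.0);
        return m[2] == 10.0 && m3[2] == 3.0 && mz[2] == 0.0;
    }

The position of ``k`` in each real prototype follows the passthrough operand: after "a" for ``mask``, after "c" for ``mask3``, and first for ``maskz``.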

_mm512_mask_fmaddsub_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512d c, 
    const int rounding
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_fmaddsub_round_pd(__m512d a, __mmask8 k,
                                          __m512d b, __m512d c,
                                          const int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmaddsub_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_maskz_fmaddsub_pd(__mmask8 k, __m512d a,
                                     __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmaddsub_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512d c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_fmaddsub_round_pd(__mmask8 k,
                                           __m512d a, __m512d b,
                                           __m512d c,
                                           const int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, for example ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` or ``_MM_FROUND_CUR_DIRECTION``.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmaddsub_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_fmaddsub_ps(__m512 a, __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF ((j & 1) == 0)
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmaddsub_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    const int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_fmaddsub_round_ps(__m512 a, __m512 b,
                                    __m512 c,
                                    const int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF ((j & 1) == 0)
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmaddsub_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    __mmask16 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m512 _mm512_mask3_fmaddsub_ps(__m512 a, __m512 b,
                                    __m512 c, __mmask16 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmaddsub_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    __mmask16 k, 
    const int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask3_fmaddsub_round_ps(__m512 a, __m512 b,
                                          __m512 c, __mmask16 k,
                                          const int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).  [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmaddsub_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_mask_fmaddsub_ps(__m512 a, __mmask16 k,
                                   __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmaddsub_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512 c, 
    const int rounding
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_fmaddsub_round_ps(__m512 a, __mmask16 k,
                                         __m512 b, __m512 c,
                                         const int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmaddsub_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_maskz_fmaddsub_ps(__mmask16 k, __m512 a,
                                    __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
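The ``mask``, ``mask3`` and ``maskz`` forms of the single-precision fmaddsub documented above differ only in what a lane receives when its mask bit is clear. The sketch below makes that explicit with one scalar loop. The enum, its constants, and ``ref_masked_fmaddsub_ps`` are hypothetical names introduced for this illustration, vectors are modelled as arrays of 16 floats, and the argument order is simplified compared to the real intrinsics.

.. code-block:: C

    #include <stdint.h>

    /* Value a lane takes when its mask bit is clear (hypothetical enum). */
    enum passthrough { PASS_FROM_A, PASS_FROM_C, PASS_ZERO };

    /* Scalar model covering the three masked fmaddsub_ps variants. */
    void ref_masked_fmaddsub_ps(enum passthrough mode, uint16_t k,
                                const float a[16], const float b[16],
                                const float c[16], float dst[16])
    {
        for (int j = 0; j < 16; ++j) {
            if ((k >> j) & 1) {
                /* Even lanes subtract c, odd lanes add c. */
                dst[j] = (j & 1) == 0 ? a[j] * b[j] - c[j]
                                      : a[j] * b[j] + c[j];
            } else if (mode == PASS_FROM_A) {
                dst[j] = a[j];   /* behaves like _mm512_mask_fmaddsub_ps  */
            } else if (mode == PASS_FROM_C) {
                dst[j] = c[j];   /* behaves like _mm512_mask3_fmaddsub_ps */
            } else {
                dst[j] = 0.0f;   /* behaves like _mm512_maskz_fmaddsub_ps */
            }
        }
    }

Keeping the pass-through choice in one place makes it easy to compare the three variants on the same inputs.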

_mm512_maskz_fmaddsub_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512 c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_fmaddsub_round_ps(__mmask16 k, __m512 a,
                                          __m512 b, __m512 c,
                                          const int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmsub_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_maskz_fmsub_pd(__mmask8 k, __m512d a,
                                  __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
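The pseudo-code addresses lanes by bit range: with ``i := j*64``, lane ``j`` occupies bits ``[i+63:i]``. The following sketch shows how that notation maps onto memory by modelling a 512-bit register as 64 raw bytes. ``zmm_model``, ``lane_pd``, ``set_lane_pd`` and ``ref_maskz_fmsub_pd`` are hypothetical names for this illustration only, and the plain C multiply and subtract is not guaranteed to be fused.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* A 512-bit register modelled as 64 bytes (hypothetical type). */
    typedef struct { unsigned char bytes[64]; } zmm_model;

    /* Read the double stored in bits [i+63:i], where i = j*64. */
    double lane_pd(const zmm_model *r, int j)
    {
        double v;
        memcpy(&v, r->bytes + (j * 64) / 8, sizeof v);
        return v;
    }

    /* Write a double into bits [i+63:i], where i = j*64. */
    void set_lane_pd(zmm_model *r, int j, double v)
    {
        memcpy(r->bytes + (j * 64) / 8, &v, sizeof v);
    }

    /* Scalar model of the zero-masked fmsub pseudo-code. */
    zmm_model ref_maskz_fmsub_pd(uint8_t k, zmm_model a, zmm_model b,
                                 zmm_model c)
    {
        zmm_model dst;
        for (int j = 0; j < 8; ++j) {
            double r = 0.0; /* stays zero when k[j] is clear */
            if ((k >> j) & 1)
                r = lane_pd(&a, j) * lane_pd(&b, j) - lane_pd(&c, j);
            set_lane_pd(&dst, j, r);
        }
        return dst;
    }

Every lane of ``dst`` is written exactly once, so the result never depends on the previous contents of the destination.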

_mm512_maskz_fmsub_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512d c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_fmsub_round_pd(__mmask8 k, __m512d a,
                                        __m512d b, __m512d c,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmsub_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_maskz_fmsub_ps(__mmask16 k, __m512 a,
                                 __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmsub_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512 c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_fmsub_round_ps(__mmask16 k, __m512 a,
                                       __m512 b, __m512 c,
                                       const int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmsubadd_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_fmsubadd_pd(__m512d a, __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF ((j & 1) == 0)
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
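Compared with fmaddsub, the loop above swaps the roles of even and odd lanes: even lanes add ``c`` and odd lanes subtract it. One way to see the relationship is that fmsubadd on ``c`` gives the same lanes as fmaddsub on the negation of ``c``. The sketch below checks this with scalar models. All function names are hypothetical helpers for this illustration, vectors are arrays of 8 doubles, and the comparison assumes inputs without NaN.

.. code-block:: C

    /* Scalar model of fmaddsub: even lanes subtract, odd lanes add. */
    void ref_fmaddsub_pd(const double a[8], const double b[8],
                         const double c[8], double dst[8])
    {
        for (int j = 0; j < 8; ++j)
            dst[j] = (j & 1) == 0 ? a[j] * b[j] - c[j]
                                  : a[j] * b[j] + c[j];
    }

    /* Scalar model of fmsubadd: even lanes add, odd lanes subtract. */
    void ref_fmsubadd_pd(const double a[8], const double b[8],
                         const double c[8], double dst[8])
    {
        for (int j = 0; j < 8; ++j)
            dst[j] = (j & 1) == 0 ? a[j] * b[j] + c[j]
                                  : a[j] * b[j] - c[j];
    }

    /* Returns 1 when fmsubadd(a, b, c) equals fmaddsub(a, b, -c)
     * in every lane, 0 otherwise. */
    int fmsubadd_matches_negated_fmaddsub(const double a[8],
                                          const double b[8],
                                          const double c[8])
    {
        double neg_c[8], x[8], y[8];
        for (int j = 0; j < 8; ++j)
            neg_c[j] = -c[j];
        ref_fmsubadd_pd(a, b, c, x);
        ref_fmaddsub_pd(a, b, neg_c, y);
        for (int j = 0; j < 8; ++j)
            if (x[j] != y[j])
                return 0;
        return 1;
    }

The check is a convenience for reasoning about the two loops; it does not replace testing against the actual instruction.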

_mm512_fmsubadd_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    const int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_fmsubadd_round_pd(__m512d a, __m512d b,
                                     __m512d c,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF ((j & 1) == 0)
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmsubadd_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m512d _mm512_mask3_fmsubadd_pd(__m512d a, __m512d b,
                                     __m512d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmsubadd_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    __mmask8 k, 
    const int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask3_fmsubadd_round_pd(__m512d a, __m512d b,
                                           __m512d c,
                                           __mmask8 k,
                                           const int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).  [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmsubadd_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_mask_fmsubadd_pd(__m512d a, __mmask8 k,
                                    __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmsubadd_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512d c, 
    const int rounding
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_fmsubadd_round_pd(__m512d a, __mmask8 k,
                                          __m512d b, __m512d c,
                                          const int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmsubadd_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_maskz_fmsubadd_pd(__mmask8 k, __m512d a,
                                     __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmsubadd_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512d c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_fmsubadd_round_pd(__mmask8 k,
                                           __m512d a, __m512d b,
                                           __m512d c,
                                           const int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmsubadd_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_fmsubadd_ps(__m512 a, __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF ((j & 1) == 0)
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmsubadd_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    const int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_fmsubadd_round_ps(__m512 a, __m512 b,
                                    __m512 c,
                                    const int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF ((j & 1) == 0)
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmsubadd_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    __mmask16 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m512 _mm512_mask3_fmsubadd_ps(__m512 a, __m512 b,
                                    __m512 c, __mmask16 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmsubadd_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    __mmask16 k, 
    const int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask3_fmsubadd_round_ps(__m512 a, __m512 b,
                                          __m512 c, __mmask16 k,
                                          const int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).  [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmsubadd_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_mask_fmsubadd_ps(__m512 a, __mmask16 k,
                                   __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmsubadd_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512 c, 
    const int rounding
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_fmsubadd_round_ps(__m512 a, __mmask16 k,
                                         __m512 b, __m512 c,
                                         const int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
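The masked alternating loop above can be modelled in plain C. This is only a scalar sketch: the function name ``fmsubadd_mask_model`` and the array-of-lanes layout are assumptions made for illustration, and ``a * b + c`` on ordinary floats may round twice, whereas a fused operation rounds once.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the masked fmsubadd loop; not an Intel API. */
    static void fmsubadd_mask_model(float dst[16], const float a[16], uint16_t k,
                                    const float b[16], const float c[16])
    {
        for (int j = 0; j < 16; ++j) {
            if ((k >> j) & 1u) {
                /* Even lanes add c, odd lanes subtract c, as in the pseudo-code. */
                dst[j] = ((j & 1) == 0) ? a[j] * b[j] + c[j]
                                        : a[j] * b[j] - c[j];
            } else {
                dst[j] = a[j]; /* mask bit clear: copy the lane from a */
            }
        }
    }

With every lane holding a = 2, b = 3, c = 1 and a mask of 0x0003, lane 0 yields 7, lane 1 yields 5, and the remaining lanes keep the value 2 from "a".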

_mm512_maskz_fmsubadd_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_maskz_fmsubadd_ps(__mmask16 k, __m512 a,
                                    __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmsubadd_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512 c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_fmsubadd_round_ps(__mmask16 k, __m512 a,
                                          __m512 b, __m512 c,
                                          const int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). 
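    Rounding is done according to the "rounding" parameter, which can be one of (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).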

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fnmadd_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_maskz_fnmadd_pd(__mmask8 k, __m512d a,
                                   __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	
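As a rough illustration of the zeromask form, the loop above can be written as scalar C. The helper name ``fnmadd_maskz_model`` is hypothetical, and the unfused ``-(a * b) + c`` is an assumption that matches the fused result only when the product is exactly representable.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zero-masked fnmadd loop; not an Intel API. */
    static void fnmadd_maskz_model(double dst[8], uint8_t k, const double a[8],
                                   const double b[8], const double c[8])
    {
        for (int j = 0; j < 8; ++j) {
            /* Active lanes compute -(a*b) + c; inactive lanes are zeroed. */
            dst[j] = ((k >> j) & 1u) ? -(a[j] * b[j]) + c[j] : 0.0;
        }
    }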

_mm512_maskz_fnmadd_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512d c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_fnmadd_round_pd(__mmask8 k, __m512d a,
                                         __m512d b, __m512d c,
                                         const int rounding);

.. admonition:: Intel Description

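    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).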

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_maskz_fnmadd_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_maskz_fnmadd_ps(__mmask16 k, __m512 a,
                                  __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_maskz_fnmadd_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512 c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_fnmadd_round_ps(__mmask16 k, __m512 a,
                                        __m512 b, __m512 c,
                                        const int rounding);

.. admonition:: Intel Description

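    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).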

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_maskz_fnmsub_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_maskz_fnmsub_pd(__mmask8 k, __m512d a,
                                   __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	
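The only difference from the fnmadd form is the sign applied to "c". A scalar sketch makes that visible; ``fnmsub_maskz_model`` is a hypothetical name, and the two-step arithmetic is an assumption that ignores the single rounding of a fused operation.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zero-masked fnmsub loop; not an Intel API. */
    static void fnmsub_maskz_model(double dst[8], uint8_t k, const double a[8],
                                   const double b[8], const double c[8])
    {
        for (int j = 0; j < 8; ++j) {
            /* Active lanes compute -(a*b) - c; inactive lanes are zeroed. */
            dst[j] = ((k >> j) & 1u) ? -(a[j] * b[j]) - c[j] : 0.0;
        }
    }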

_mm512_maskz_fnmsub_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512d c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_fnmsub_round_pd(__mmask8 k, __m512d a,
                                         __m512d b, __m512d c,
                                         const int rounding);

.. admonition:: Intel Description

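    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).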

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_maskz_fnmsub_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_maskz_fnmsub_ps(__mmask16 k, __m512 a,
                                  __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_maskz_fnmsub_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512 c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_fnmsub_round_ps(__mmask16 k, __m512 a,
                                        __m512 b, __m512 c,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). 
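    Rounding is done according to the "rounding" parameter, which can be one of (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).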

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_maskz_mul_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_mul_pd(__mmask8 k, __m512d a,
                                __m512d b);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] * b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
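To see how individual mask bits select lanes, the loop above can be sketched in scalar C. ``mul_pd_maskz_model`` is a hypothetical helper; bit j of "k" controls lane j, so a mask of 0xAA enables only the odd-numbered lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zero-masked multiply; not an Intel API. */
    static void mul_pd_maskz_model(double dst[8], uint8_t k,
                                   const double a[8], const double b[8])
    {
        for (int j = 0; j < 8; ++j) {
            /* Bit j of k decides whether lane j is computed or zeroed. */
            dst[j] = ((k >> j) & 1u) ? a[j] * b[j] : 0.0;
        }
    }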

_mm512_maskz_mul_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_mul_round_pd(__mmask8 k, __m512d a,
                                      __m512d b, int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). 
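    Rounding is done according to the "rounding" parameter, which can be one of (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).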

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] * b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mul_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_mul_ps(__mmask16 k, __m512 a, __m512 b);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mul_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_mul_round_ps(__mmask16 k, __m512 a,
                                     __m512 b, int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). 
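    Rounding is done according to the "rounding" parameter, which can be one of (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).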

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_add_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_add_epi32(__mmask16 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Add packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
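Because each lane is only 32 bits wide, the addition above wraps modulo 2^32 rather than saturating. The following scalar sketch shows this; ``add_epi32_maskz_model`` is a hypothetical name, and treating the lanes as ``uint32_t`` is an assumption that follows the UI32 element types listed for this entry.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zero-masked 32-bit add; not an Intel API. */
    static void add_epi32_maskz_model(uint32_t dst[16], uint16_t k,
                                      const uint32_t a[16], const uint32_t b[16])
    {
        for (int j = 0; j < 16; ++j) {
            /* Unsigned 32-bit addition wraps modulo 2^32, like the lane itself. */
            dst[j] = ((k >> j) & 1u) ? (uint32_t)(a[j] + b[j]) : 0u;
        }
    }

For example, 0xFFFFFFFF + 2 in an active lane produces 1.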

_mm512_add_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_add_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed 64-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] + b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_add_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_add_epi64(__m512i src, __mmask8 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
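The writemask form differs from the zeromask form only in what inactive lanes receive. A scalar sketch, using the hypothetical name ``add_epi64_mask_model`` and plain ``uint64_t`` arrays as an assumed stand-in for the vector registers:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the write-masked 64-bit add; not an Intel API. */
    static void add_epi64_mask_model(uint64_t dst[8], const uint64_t src[8],
                                     uint8_t k, const uint64_t a[8],
                                     const uint64_t b[8])
    {
        for (int j = 0; j < 8; ++j) {
            /* Active lanes get a + b; inactive lanes pass src through unchanged. */
            dst[j] = ((k >> j) & 1u) ? a[j] + b[j] : src[j];
        }
    }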

_mm512_maskz_add_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_add_epi64(__mmask8 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Add packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_mul_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI64 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_mask_mul_epi32(__m512i src, __mmask8 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mul_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_maskz_mul_epi32(__mmask8 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mul_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_mul_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	
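The pseudo-code reads only bits 31:0 of each 64-bit element, so whatever sits in the upper half is ignored. A scalar sketch of that behaviour follows; ``sign_extend_low32`` and ``mul_epi32_model`` are hypothetical helper names, and the arithmetic sign extension is one portable way to express ``SignExtend64``.

.. code-block:: C

    #include <stdint.h>

    /* Sign-extend bits 31:0 of a 64-bit lane without a narrowing signed cast. */
    static int64_t sign_extend_low32(uint64_t lane)
    {
        uint32_t lo = (uint32_t)lane;
        return (lo & 0x80000000u) ? (int64_t)lo - INT64_C(4294967296)
                                  : (int64_t)lo;
    }

    /* Hypothetical scalar model of the signed widening multiply; not an Intel API. */
    static void mul_epi32_model(int64_t dst[8], const uint64_t a[8],
                                const uint64_t b[8])
    {
        for (int j = 0; j < 8; ++j) {
            /* A product of two 32-bit signed values always fits in 64 bits. */
            dst[j] = sign_extend_low32(a[j]) * sign_extend_low32(b[j]);
        }
    }

An element holding 0x00000005FFFFFFFF is therefore treated as -1, and multiplying it by 3 gives -3.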

_mm512_mask_mul_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_mul_epu32(__m512i src, __mmask8 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mul_epu32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_mul_epu32(__mmask8 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mul_epu32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mul_epu32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+31:i] * b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	
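In the unsigned form the low halves are zero-extended before multiplying, so the full 64-bit product is kept. A scalar sketch, with ``mul_epu32_model`` as a hypothetical name and ``uint64_t`` arrays as an assumed stand-in for the eight lanes:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the unsigned widening multiply; not an Intel API. */
    static void mul_epu32_model(uint64_t dst[8], const uint64_t a[8],
                                const uint64_t b[8])
    {
        for (int j = 0; j < 8; ++j) {
            /* Keep bits 31:0 of each lane, then multiply in 64-bit arithmetic. */
            dst[j] = (a[j] & 0xFFFFFFFFu) * (b[j] & 0xFFFFFFFFu);
        }
    }

The largest case, 0xFFFFFFFF times 0xFFFFFFFF, gives 0xFFFFFFFE00000001, which still fits in the 64-bit result lane.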

_mm512_maskz_sub_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_sub_epi32(__mmask16 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	
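Subtraction wraps in the same way as addition when the result would go below zero. The sketch below models the loop in scalar C; ``sub_epi32_maskz_model`` is a hypothetical name, and the ``uint32_t`` lanes are an assumption based on the UI32 element types listed for this entry.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zero-masked 32-bit subtract; not an Intel API. */
    static void sub_epi32_maskz_model(uint32_t dst[16], uint16_t k,
                                      const uint32_t a[16], const uint32_t b[16])
    {
        for (int j = 0; j < 16; ++j) {
            /* a - b wraps modulo 2^32, so 0 - 1 becomes 0xFFFFFFFF. */
            dst[j] = ((k >> j) & 1u) ? (uint32_t)(a[j] - b[j]) : 0u;
        }
    }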

_mm512_mask_sub_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_mask_sub_epi64(__m512i src, __mmask8 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sub_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_maskz_sub_epi64(__mmask8 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sub_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m512i _mm512_sub_epi64(__m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] - b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sub_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_sub_pd(__mmask8 k, __m512d a,
                                __m512d b);

.. admonition:: Intel Description

    Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sub_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_sub_round_pd(__mmask8 k, __m512d a,
                                      __m512d b, int rounding);

.. admonition:: Intel Description

    Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
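    Rounding is done according to the "rounding" parameter, which can be one of (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).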

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sub_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_sub_ps(__mmask16 k, __m512 a, __m512 b);

.. admonition:: Intel Description

    Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sub_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_sub_round_ps(__mmask16 k, __m512 a,
                                     __m512 b, int rounding);

.. admonition:: Intel Description

    Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_add_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_add_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] + b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
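
As a usage illustration, the sketch below adds two arrays of doubles eight lanes at a time. It assumes the compiler defines ``__AVX512F__`` when AVX-512F code generation is enabled (as GCC and Clang do) and falls back to a scalar loop otherwise; the helper name ``add_arrays_pd`` is hypothetical.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n doubles. */
static void add_arrays_pd(double *dst, const double *a,
                          const double *b, size_t n)
{
    size_t i = 0;
#if defined(__AVX512F__)
    /* Eight doubles per iteration; unaligned loads and stores, so the
       caller does not need 64-byte aligned buffers. */
    for (; i + 8 <= n; i += 8) {
        __m512d va = _mm512_loadu_pd(a + i);
        __m512d vb = _mm512_loadu_pd(b + i);
        _mm512_storeu_pd(dst + i, _mm512_add_pd(va, vb));
    }
#endif
    /* Scalar loop: the remainder, or everything without AVX-512F. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

The scalar loop doubles as the tail handler, so ``n`` does not have to be a multiple of eight.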
        	

_mm512_add_round_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_add_round_pd(__m512d a, __m512d b,
                                int rounding);

.. admonition:: Intel Description

    Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] + b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
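
The pseudo-code does not show how ``rounding`` is supplied. The sketch below is one plausible call, assuming the ``_MM_FROUND_*`` constants that compilers expose through ``immintrin.h``; the helper name ``add8_toward_zero`` is hypothetical. The non-AVX-512 fallback is a plain add in the current rounding mode, so it only agrees with the intrinsic when the sums are exactly representable.

```c
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: add eight doubles, rounding toward zero where
   AVX-512F is available. */
static void add8_toward_zero(double dst[8], const double a[8],
                             const double b[8])
{
#if defined(__AVX512F__)
    __m512d va = _mm512_loadu_pd(a);
    __m512d vb = _mm512_loadu_pd(b);
    /* The rounding argument must be a compile-time constant. */
    __m512d r = _mm512_add_round_pd(
        va, vb, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
    _mm512_storeu_pd(dst, r);
#else
    /* Assumption: no directed rounding here, only a plain add. */
    for (int j = 0; j < 8; ++j)
        dst[j] = a[j] + b[j];
#endif
}
```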
        	

_mm512_mask_add_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_add_pd(__m512d src, __mmask8 k,
                               __m512d a, __m512d b);

.. admonition:: Intel Description

    Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
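
The writemask form differs from the zeromask form only in what an inactive lane receives. A minimal scalar sketch, with the hypothetical name ``ref_mask_add_pd`` and plain arrays standing in for ``__m512d``:

```c
#include <stdint.h>

/* Hypothetical scalar model of the writemask rule: lane j receives
   a[j] + b[j] when bit j of k is set, and src[j] otherwise. */
static void ref_mask_add_pd(double dst[8], const double src[8], uint8_t k,
                            const double a[8], const double b[8])
{
    for (int j = 0; j < 8; ++j)
        dst[j] = ((k >> j) & 1u) ? a[j] + b[j] : src[j];
}
```

With ``k = 0xF0`` the upper four lanes hold sums and the lower four are copied from ``src`` unchanged.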
        	

_mm512_mask_add_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_add_round_pd(__m512d src, __mmask8 k,
                                     __m512d a, __m512d b,
                                     int rounding);

.. admonition:: Intel Description

    Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). 
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_add_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_add_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] + b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_add_round_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_add_round_ps(__m512 a, __m512 b,
                               int rounding);

.. admonition:: Intel Description

    Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] + b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_add_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_add_ps(__m512 src, __mmask16 k, __m512 a,
                              __m512 b);

.. admonition:: Intel Description

    Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
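
One common use of a writemask is handling the last, partial vector of an array. The sketch below is an illustration under assumptions, not a prescribed pattern: ``accumulate_ps`` is a hypothetical helper, the masked load and store intrinsics are taken from AVX-512F, and a scalar loop is used when ``__AVX512F__`` is not defined.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: acc[i] += x[i] for n floats, any n. */
static void accumulate_ps(float *acc, const float *x, size_t n)
{
    size_t i = 0;
#if defined(__AVX512F__)
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(acc + i);
        __m512 vx = _mm512_loadu_ps(x + i);
        _mm512_storeu_ps(acc + i, _mm512_add_ps(va, vx));
    }
    if (i < n) {
        /* Low (n - i) bits set: only the valid tail lanes take part. */
        __mmask16 k = (__mmask16)((1u << (n - i)) - 1u);
        __m512 va = _mm512_maskz_loadu_ps(k, acc + i);
        __m512 vx = _mm512_maskz_loadu_ps(k, x + i);
        /* Lanes with a clear mask bit keep va, the src operand. */
        __m512 r = _mm512_mask_add_ps(va, k, va, vx);
        _mm512_mask_storeu_ps(acc + i, k, r);
        i = n;
    }
#endif
    /* Scalar path when AVX-512F is not enabled. */
    for (; i < n; ++i)
        acc[i] += x[i];
}
```

Because the store is masked as well, elements past ``n`` are never written.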
        	

_mm512_mask_add_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_add_round_ps(__m512 src, __mmask16 k,
                                    __m512 a, __m512 b,
                                    int rounding);

.. admonition:: Intel Description

    Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). 
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmadd_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_fmadd_pd(__m512d a, __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
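
A scalar model of the lane-wise multiply-add is sketched below; ``ref_fmadd_pd`` is a hypothetical name. One caveat, stated as an assumption about the hardware rather than something the pseudo-code shows: the fused instruction rounds once, while this C expression may round after the multiply and again after the add, so the two can differ in the last bit for inexact inputs.

```c
/* Hypothetical scalar model: dst[j] = a[j] * b[j] + c[j] per lane.
   May round twice, unlike a fused multiply-add. */
static void ref_fmadd_pd(double dst[8], const double a[8],
                         const double b[8], const double c[8])
{
    for (int j = 0; j < 8; ++j)
        dst[j] = a[j] * b[j] + c[j];
}
```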
        	

_mm512_fmadd_round_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_fmadd_round_pd(__m512d a, __m512d b,
                                  __m512d c, int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". 
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmadd_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m512d _mm512_mask3_fmadd_pd(__m512d a, __m512d b,
                                  __m512d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmadd_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    __mmask8 k, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask3_fmadd_round_pd(__m512d a, __m512d b,
                                        __m512d c, __mmask8 k,
                                        int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). 
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE 
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmadd_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_mask_fmadd_pd(__m512d a, __mmask8 k,
                                 __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
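
The ``mask`` and ``mask3`` forms of ``fmadd`` compute the same active lanes and differ only in which operand an inactive lane copies: ``a`` for ``mask``, ``c`` for ``mask3``. The two scalar models below make that visible side by side; both function names are hypothetical and the arithmetic is unfused.

```c
#include <stdint.h>

/* Hypothetical model of the mask form: inactive lanes copy a. */
static void ref_mask_fmadd_pd(double dst[8], const double a[8], uint8_t k,
                              const double b[8], const double c[8])
{
    for (int j = 0; j < 8; ++j)
        dst[j] = ((k >> j) & 1u) ? a[j] * b[j] + c[j] : a[j];
}

/* Hypothetical model of the mask3 form: inactive lanes copy c. */
static void ref_mask3_fmadd_pd(double dst[8], const double a[8],
                               const double b[8], const double c[8],
                               uint8_t k)
{
    for (int j = 0; j < 8; ++j)
        dst[j] = ((k >> j) & 1u) ? a[j] * b[j] + c[j] : c[j];
}
```

Note that the position of ``k`` in the argument list also differs between the two forms, mirroring the prototypes.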
        	

_mm512_mask_fmadd_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_fmadd_round_pd(__m512d a, __mmask8 k,
                                       __m512d b, __m512d c,
                                       int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmadd_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_fmadd_ps(__m512 a, __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmadd_round_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_fmadd_round_ps(__m512 a, __m512 b, __m512 c,
                                 int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". 
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmadd_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    __mmask16 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m512 _mm512_mask3_fmadd_ps(__m512 a, __m512 b, __m512 c,
                                 __mmask16 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmadd_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    __mmask16 k, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask3_fmadd_round_ps(__m512 a, __m512 b,
                                       __m512 c, __mmask16 k,
                                       int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). 
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmadd_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_mask_fmadd_ps(__m512 a, __mmask16 k, __m512 b,
                                __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmadd_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_fmadd_round_ps(__m512 a, __mmask16 k,
                                      __m512 b, __m512 c,
                                      int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). 
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmsub_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_fmsub_pd(__m512d a, __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
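
The operand order matters for ``fmsub``: the product comes first and ``c`` is subtracted from it. As a usage sketch, the hypothetical helper ``residual_pd`` below computes ``a[i] * x[i] - y[i]`` over an array, using the intrinsic when ``__AVX512F__`` is defined and a scalar loop otherwise. For inexact inputs the two paths may differ in the last bit, since the scalar expression is not guaranteed to be fused.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: r[i] = a[i] * x[i] - y[i] for n doubles. */
static void residual_pd(double *r, const double *a, const double *x,
                        const double *y, size_t n)
{
    size_t i = 0;
#if defined(__AVX512F__)
    for (; i + 8 <= n; i += 8) {
        __m512d va = _mm512_loadu_pd(a + i);
        __m512d vx = _mm512_loadu_pd(x + i);
        __m512d vy = _mm512_loadu_pd(y + i);
        /* (va * vx) - vy in each of the eight lanes. */
        _mm512_storeu_pd(r + i, _mm512_fmsub_pd(va, vx, vy));
    }
#endif
    /* Scalar loop: the remainder, or everything without AVX-512F. */
    for (; i < n; ++i)
        r[i] = a[i] * x[i] - y[i];
}
```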
        	

_mm512_fmsub_round_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_fmsub_round_pd(__m512d a, __m512d b,
                                  __m512d c, int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". 
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmsub_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m512d _mm512_mask3_fmsub_pd(__m512d a, __m512d b,
                                  __m512d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmsub_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    __mmask8 k, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask3_fmsub_round_pd(__m512d a, __m512d b,
                                        __m512d c, __mmask8 k,
                                        int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).  [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmsub_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_mask_fmsub_pd(__m512d a, __mmask8 k,
                                 __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmsub_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_fmsub_round_pd(__m512d a, __mmask8 k,
                                       __m512d b, __m512d c,
                                       int rounding);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmsub_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_fmsub_ps(__m512 a, __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmsub_round_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_fmsub_round_ps(__m512 a, __m512 b, __m512 c,
                                 int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". 
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmsub_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    __mmask16 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m512 _mm512_mask3_fmsub_ps(__m512 a, __m512 b, __m512 c,
                                 __mmask16 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
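
Because inactive lanes copy ``c``, the ``mask3`` form suits an in-place update in which ``c`` is both an input and the place the result is kept. A minimal scalar sketch of that pattern follows; ``ref_mask3_fmsub_ps`` is a hypothetical name and the arithmetic is unfused.

```c
#include <stdint.h>

/* Hypothetical scalar model: lane j receives a[j] * b[j] - c[j] when
   bit j of k is set, and c[j] otherwise. dst may alias c. */
static void ref_mask3_fmsub_ps(float dst[16], const float a[16],
                               const float b[16], const float c[16],
                               uint16_t k)
{
    for (int j = 0; j < 16; ++j) {
        float cj = c[j];   /* read before writing so dst == c is safe */
        dst[j] = ((k >> j) & 1u) ? a[j] * b[j] - cj : cj;
    }
}
```

Calling it as ``ref_mask3_fmsub_ps(c, a, b, c, k)`` updates only the lanes selected by ``k`` and leaves the rest of ``c`` untouched.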
        	

_mm512_mask3_fmsub_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    __mmask16 k, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask3_fmsub_round_ps(__m512 a, __m512 b,
                                       __m512 c, __mmask16 k,
                                       int rounding);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).  [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmsub_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_mask_fmsub_ps(__m512 a, __mmask16 k, __m512 b,
                                __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
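
The merge-masking rule above can be tried with a short C sketch. The helper name ``mask_fmsub_ps_16`` and its array-based interface are assumptions made for illustration. The intrinsic path is compiled only when ``__AVX512F__`` is defined (for example with ``-mavx512f``); otherwise a scalar loop that mirrors the pseudo-code is used.

.. code-block:: C

    #include <stdint.h>
    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: out[j] = k[j] ? a[j]*b[j] - c[j] : a[j]. */
    void mask_fmsub_ps_16(float out[16], const float a[16], uint16_t k,
                          const float b[16], const float c[16])
    {
    #if defined(__AVX512F__)
        __m512 va = _mm512_loadu_ps(a);
        __m512 vb = _mm512_loadu_ps(b);
        __m512 vc = _mm512_loadu_ps(c);
        /* Lanes whose mask bit is clear keep the value from "a". */
        _mm512_storeu_ps(out, _mm512_mask_fmsub_ps(va, (__mmask16)k, vb, vc));
    #else
        /* Scalar model: not fused, so it is only guaranteed to match the
           instruction when a[j]*b[j] is exactly representable. */
        for (int j = 0; j < 16; j++)
            out[j] = ((k >> j) & 1) ? a[j] * b[j] - c[j] : a[j];
    #endif
    }

With every ``a[j] = 2``, ``b[j] = 3``, ``c[j] = 1`` and ``k = 0x00FF``, lanes 0 to 7 hold 5 and lanes 8 to 15 hold 2, the value passed through from "a".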
        	

_mm512_mask_fmsub_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_fmsub_round_ps(__m512 a, __mmask16 k,
                                      __m512 b, __m512 c,
                                      int rounding);

.. admonition:: Intel Description

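    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code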

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fnmadd_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_fnmadd_pd(__m512d a, __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        ENDFOR	
        dst[MAX:512] := 0
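
As a rough illustration of the sign convention, the sketch below evaluates ``c - a*b`` across eight double-precision lanes. The helper name ``fnmadd_pd_8`` and the array interface are hypothetical. The intrinsic is used only when the compiler defines ``__AVX512F__``; the fallback is a plain loop following the pseudo-code.

.. code-block:: C

    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: r[j] = -(a[j]*b[j]) + c[j] for j = 0..7. */
    void fnmadd_pd_8(double r[8], const double a[8], const double b[8],
                     const double c[8])
    {
    #if defined(__AVX512F__)
        __m512d va = _mm512_loadu_pd(a);
        __m512d vb = _mm512_loadu_pd(b);
        __m512d vc = _mm512_loadu_pd(c);
        _mm512_storeu_pd(r, _mm512_fnmadd_pd(va, vb, vc));
    #else
        /* Scalar model with two roundings; the instruction rounds once. */
        for (int j = 0; j < 8; j++)
            r[j] = -(a[j] * b[j]) + c[j];
    #endif
    }

For ``a[j] = j``, ``b[j] = 2`` and ``c[j] = 10`` the result in lane ``j`` is ``10 - 2*j``.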
        	

_mm512_fnmadd_round_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_fnmadd_round_pd(__m512d a, __m512d b,
                                   __m512d c, int rounding);

.. admonition:: Intel Description

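    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code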

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask3_fnmadd_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m512d _mm512_mask3_fnmadd_pd(__m512d a, __m512d b,
                                   __m512d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
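
Because the ``mask3`` form passes "c" through in unselected lanes, it fits an in-place accumulator update. The sketch below is one possible use, with the hypothetical helper ``masked_downdate_8`` subtracting ``a*b`` from an accumulator only where the mask bit is set. It falls back to a scalar loop when ``__AVX512F__`` is not defined.

.. code-block:: C

    #include <stdint.h>
    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: acc[j] -= a[j]*b[j] where bit j of k is set;
       other lanes of acc are left unchanged. */
    void masked_downdate_8(double acc[8], const double a[8],
                           const double b[8], uint8_t k)
    {
    #if defined(__AVX512F__)
        __m512d vacc = _mm512_loadu_pd(acc);
        /* "c" (the accumulator) is both the addend and the passthrough. */
        vacc = _mm512_mask3_fnmadd_pd(_mm512_loadu_pd(a), _mm512_loadu_pd(b),
                                      vacc, (__mmask8)k);
        _mm512_storeu_pd(acc, vacc);
    #else
        for (int j = 0; j < 8; j++)
            if ((k >> j) & 1)
                acc[j] = -(a[j] * b[j]) + acc[j];
    #endif
    }

With every accumulator at 100, ``a[j] = j + 1``, ``b[j] = 10`` and ``k = 0x0F``, lanes 0 to 3 become 90, 80, 70, 60 and lanes 4 to 7 stay at 100.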
        	

_mm512_mask3_fnmadd_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    __mmask8 k, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask3_fnmadd_round_pd(__m512d a, __m512d b,
                                         __m512d c, __mmask8 k,
                                         int rounding);

.. admonition:: Intel Description

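    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code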

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask_fnmadd_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_mask_fnmadd_pd(__m512d a, __mmask8 k,
                                  __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask_fnmadd_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_fnmadd_round_pd(__m512d a, __mmask8 k,
                                        __m512d b, __m512d c,
                                        int rounding);

.. admonition:: Intel Description

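    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code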

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_fnmadd_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_fnmadd_ps(__m512 a, __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        ENDFOR	
        dst[MAX:512] := 0
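
A common use of this operation is updating an array in place, ``y[i] -= a[i]*x[i]``. The sketch below shows one way to do that for an arbitrary length, handling the final partial vector with a tail mask. The function name ``sub_scaled`` is hypothetical, and ``_mm512_maskz_loadu_ps`` and ``_mm512_mask_storeu_ps`` are AVX-512F intrinsics that are not described in this entry. A scalar loop is used when ``__AVX512F__`` is not defined.

.. code-block:: C

    #include <stddef.h>
    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: y[i] = -(a[i]*x[i]) + y[i] for i in [0, n). */
    void sub_scaled(float *y, const float *a, const float *x, size_t n)
    {
    #if defined(__AVX512F__)
        size_t i = 0;
        /* Full 16-lane vectors. */
        for (; i + 16 <= n; i += 16) {
            __m512 vy = _mm512_loadu_ps(y + i);
            vy = _mm512_fnmadd_ps(_mm512_loadu_ps(a + i),
                                  _mm512_loadu_ps(x + i), vy);
            _mm512_storeu_ps(y + i, vy);
        }
        /* Tail: 1..15 remaining elements, selected by the low mask bits. */
        if (i < n) {
            __mmask16 k = (__mmask16)((1u << (n - i)) - 1u);
            __m512 vy = _mm512_maskz_loadu_ps(k, y + i);
            vy = _mm512_fnmadd_ps(_mm512_maskz_loadu_ps(k, a + i),
                                  _mm512_maskz_loadu_ps(k, x + i), vy);
            _mm512_mask_storeu_ps(y + i, k, vy);
        }
    #else
        /* Scalar model; not fused, so exact only for exact products. */
        for (size_t i = 0; i < n; i++)
            y[i] = -(a[i] * x[i]) + y[i];
    #endif
    }

The masked loads and stores in the tail never touch memory past ``n`` elements, so no padding of the arrays is assumed.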
        	

_mm512_fnmadd_round_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_fnmadd_round_ps(__m512 a, __m512 b, __m512 c,
                                  int rounding);

.. admonition:: Intel Description

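    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code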

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask3_fnmadd_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    __mmask16 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m512 _mm512_mask3_fnmadd_ps(__m512 a, __m512 b, __m512 c,
                                  __mmask16 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask3_fnmadd_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    __mmask16 k, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask3_fnmadd_round_ps(__m512 a, __m512 b,
                                        __m512 c, __mmask16 k,
                                        int rounding);

.. admonition:: Intel Description

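    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code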

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask_fnmadd_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_mask_fnmadd_ps(__m512 a, __mmask16 k,
                                 __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask_fnmadd_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_fnmadd_round_ps(__m512 a, __mmask16 k,
                                       __m512 b, __m512 c,
                                       int rounding);

.. admonition:: Intel Description

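    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code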

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_fnmsub_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_fnmsub_pd(__m512d a, __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        ENDFOR	
        dst[MAX:512] := 0
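
The ``fmsub``, ``fnmadd`` and ``fnmsub`` families differ only in the signs applied to the product and to "c". The sketch below puts the three side by side for a single scalar triple; the struct ``fma_signs`` and the function ``fma_sign_variants`` are names invented for this illustration. The intrinsic path broadcasts the scalars and reads back lane 0, and is compiled only when ``__AVX512F__`` is defined.

.. code-block:: C

    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Hypothetical result type: one value per sign variant. */
    typedef struct {
        double fmsub;   /*  (a*b) - c */
        double fnmadd;  /* -(a*b) + c */
        double fnmsub;  /* -(a*b) - c */
    } fma_signs;

    fma_signs fma_sign_variants(double a, double b, double c)
    {
        fma_signs r;
    #if defined(__AVX512F__)
        __m512d va = _mm512_set1_pd(a);
        __m512d vb = _mm512_set1_pd(b);
        __m512d vc = _mm512_set1_pd(c);
        double t[8];
        _mm512_storeu_pd(t, _mm512_fmsub_pd(va, vb, vc));
        r.fmsub = t[0];
        _mm512_storeu_pd(t, _mm512_fnmadd_pd(va, vb, vc));
        r.fnmadd = t[0];
        _mm512_storeu_pd(t, _mm512_fnmsub_pd(va, vb, vc));
        r.fnmsub = t[0];
    #else
        /* Scalar model of the pseudo-code (not fused). */
        r.fmsub = (a * b) - c;
        r.fnmadd = -(a * b) + c;
        r.fnmsub = -(a * b) - c;
    #endif
        return r;
    }

For ``a = 3``, ``b = 4``, ``c = 5`` this yields 7, -7 and -17 respectively.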
        	

_mm512_fnmsub_round_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_fnmsub_round_pd(__m512d a, __m512d b,
                                   __m512d c, int rounding);

.. admonition:: Intel Description

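    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code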

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask3_fnmsub_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m512d _mm512_mask3_fnmsub_pd(__m512d a, __m512d b,
                                   __m512d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask3_fnmsub_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512d c, 
    __mmask8 k, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask3_fnmsub_round_pd(__m512d a, __m512d b,
                                         __m512d c, __mmask8 k,
                                         int rounding);

.. admonition:: Intel Description

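    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code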

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask_fnmsub_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m512d _mm512_mask_fnmsub_pd(__m512d a, __mmask8 k,
                                  __m512d b, __m512d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask_fnmsub_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_fnmsub_round_pd(__m512d a, __mmask8 k,
                                        __m512d b, __m512d c,
                                        int rounding);

.. admonition:: Intel Description

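    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code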

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_fnmsub_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_fnmsub_ps(__m512 a, __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_fnmsub_round_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_fnmsub_round_ps(__m512 a, __m512 b, __m512 c,
                                  int rounding);

.. admonition:: Intel Description

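    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code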

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask3_fnmsub_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    __mmask16 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m512 _mm512_mask3_fnmsub_ps(__m512 a, __m512 b, __m512 c,
                                  __mmask16 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask3_fnmsub_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512 c, 
    __mmask16 k, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask3_fnmsub_round_ps(__m512 a, __m512 b,
                                        __m512 c, __mmask16 k,
                                        int rounding);

.. admonition:: Intel Description

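    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code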

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask_fnmsub_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m512 _mm512_mask_fnmsub_ps(__m512 a, __mmask16 k,
                                 __m512 b, __m512 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask_fnmsub_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_fnmsub_round_ps(__m512 a, __mmask16 k,
                                       __m512 b, __m512 c,
                                       int rounding);

.. admonition:: Intel Description

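    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code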

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:512] := 0
        	

_mm512_mask_mul_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_mul_pd(__m512d src, __mmask8 k,
                               __m512d a, __m512d b);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] * b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
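
The "src" operand lets a masked multiply leave selected lanes untouched. As a sketch of one possible use, the hypothetical helper ``scale_positive_8`` below scales only the strictly positive elements of a vector and passes the rest through. The mask comes from ``_mm512_cmp_pd_mask`` with ``_CMP_GT_OQ``, an AVX-512F comparison that is not described in this entry. A scalar loop is used when ``__AVX512F__`` is not defined.

.. code-block:: C

    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: out[j] = (x[j] > 0) ? x[j]*s : x[j]. */
    void scale_positive_8(double out[8], const double x[8], double s)
    {
    #if defined(__AVX512F__)
        __m512d vx = _mm512_loadu_pd(x);
        /* Bit j of k is set where x[j] > 0 (ordered, non-signaling). */
        __mmask8 k = _mm512_cmp_pd_mask(vx, _mm512_setzero_pd(), _CMP_GT_OQ);
        /* "src" is vx itself, so unselected lanes are copied unchanged. */
        _mm512_storeu_pd(out, _mm512_mask_mul_pd(vx, k, vx, _mm512_set1_pd(s)));
    #else
        for (int j = 0; j < 8; j++)
            out[j] = (x[j] > 0.0) ? x[j] * s : x[j];
    #endif
    }

For ``x = {-2, -1, 0, 1, 2, 3, -4, 5}`` and ``s = 10`` the output is ``{-2, -1, 0, 10, 20, 30, -4, 50}``.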
        	

_mm512_mask_mul_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_mul_round_pd(__m512d src, __mmask8 k,
                                     __m512d a, __m512d b,
                                     int rounding);

.. admonition:: Intel Description

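    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code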

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] * b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mul_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mul_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] * b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mul_round_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mul_round_pd(__m512d a, __m512d b,
                                int rounding);

.. admonition:: Intel Description

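    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC), (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC), or _MM_FROUND_CUR_DIRECTION.

.. admonition:: Intel Implementation Pseudo-Code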

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] * b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_mul_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_mul_ps(__m512 src, __mmask16 k, __m512 a,
                              __m512 b);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_mul_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_mul_round_ps(__m512 src, __mmask16 k,
                                    __m512 a, __m512 b,
                                    int rounding);

.. admonition:: Intel Description

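    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each with exceptions suppressed, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.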

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mul_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mul_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] * b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mul_round_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mul_round_ps(__m512 a, __m512 b,
                               int rounding);

.. admonition:: Intel Description

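    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each with exceptions suppressed, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.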

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] * b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
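
The pseudo-code above does not show how the "rounding" argument changes a lane's result. The sketch below models a single lane in portable C under stated assumptions: the enum values are hypothetical stand-ins for the ``_MM_FROUND_TO_*`` selectors, the helper names are invented for illustration, and results that round to zero or are NaN are not handled.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical stand-ins for the _MM_FROUND_TO_* rounding selectors. */
    typedef enum { ROUND_NEAREST, ROUND_DOWN, ROUND_UP, ROUND_ZERO } round_mode;

    /* Move a non-zero float one representable step up or down. */
    static float step_float(float f, int upward)
    {
        uint32_t bits;
        assert(f != 0.0f);
        memcpy(&bits, &f, sizeof bits);
        /* For positive floats a larger bit pattern is a larger value;
         * for negative floats a larger bit pattern is a smaller value. */
        if ((f > 0.0f) == (upward != 0)) {
            bits += 1u;
        } else {
            bits -= 1u;
        }
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    /* Scalar model of one lane of a "_round" multiply. The product of two
     * floats is exact in double, so it can be rounded in any direction. */
    static float mul_round_ref(float a, float b, round_mode mode)
    {
        double exact = (double)a * (double)b;
        float nearest = (float)exact;  /* assumes default round-to-nearest */

        if (mode == ROUND_ZERO) {
            mode = (exact > 0.0) ? ROUND_DOWN : ROUND_UP;
        }
        if (mode == ROUND_DOWN && (double)nearest > exact) {
            return step_float(nearest, 0);
        }
        if (mode == ROUND_UP && (double)nearest < exact) {
            return step_float(nearest, 1);
        }
        return nearest;
    }

For example, squaring ``0x1.000002p+0f`` gives an exact product just above ``0x1.000004p+0f``, so rounding up selects the next float while the other three directions keep ``0x1.000004p+0f``.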
        	

_mm512_add_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_add_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed 32-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] + b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
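
Each lane in the pseudo-code above keeps only 32 bits of the sum, so overflow wraps around. A minimal scalar sketch of that behaviour follows; the helper name ``add_epi32_ref`` is an assumption, and unsigned lanes are used so the wrap is well defined in C.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    enum { LANES_EPI32 = 16 };  /* 512 bits / 32-bit lanes */

    /* Hypothetical scalar model of _mm512_add_epi32. Unsigned arithmetic is
     * used so that overflow wraps modulo 2^32 instead of being undefined. */
    static void add_epi32_ref(uint32_t dst[LANES_EPI32],
                              const uint32_t a[LANES_EPI32],
                              const uint32_t b[LANES_EPI32])
    {
        assert(dst != NULL && a != NULL && b != NULL);
        for (int j = 0; j < LANES_EPI32; ++j) {
            dst[j] = (uint32_t)(a[j] + b[j]);
        }
    }

The same bit pattern results whether the lanes are interpreted as signed or unsigned, which is why a single intrinsic covers both cases.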
        	

_mm512_mask_add_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_add_epi32(__m512i src, __mmask16 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Add packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_mullo_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_mullo_epi32(__m512i src, __mmask16 k,
                                    __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[63:0] := a[i+31:i] * b[i+31:i]
        		dst[i+31:i] := tmp[31:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mullo_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mullo_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	tmp[63:0] := a[i+31:i] * b[i+31:i]
        	dst[i+31:i] := tmp[31:0]
        ENDFOR
        dst[MAX:512] := 0
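
The "intermediate 64-bit integer, keep the low 32 bits" step above can be written out directly in scalar C. This is a sketch for illustration only; ``mullo_epi32_ref`` is a hypothetical helper name.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    enum { LANES_MULLO = 16 };  /* 512 bits / 32-bit lanes */

    /* Hypothetical scalar model of _mm512_mullo_epi32: form the full 64-bit
     * product of each lane pair and keep only its low 32 bits. */
    static void mullo_epi32_ref(uint32_t dst[LANES_MULLO],
                                const uint32_t a[LANES_MULLO],
                                const uint32_t b[LANES_MULLO])
    {
        assert(dst != NULL && a != NULL && b != NULL);
        for (int j = 0; j < LANES_MULLO; ++j) {
            uint64_t tmp = (uint64_t)a[j] * (uint64_t)b[j];
            dst[j] = (uint32_t)tmp;  /* tmp[31:0] */
        }
    }

The low 32 bits of the product do not depend on whether the inputs are read as signed or unsigned, so ``0xFFFFFFFF * 0xFFFFFFFF`` yields 1 here, matching ``(-1) * (-1)``.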
        	

_mm512_mask_sub_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_mask_sub_epi32(__m512i src, __mmask16 k,
                                  __m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sub_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_sub_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] - b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sub_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_sub_pd(__m512d src, __mmask8 k,
                               __m512d a, __m512d b);

.. admonition:: Intel Description

    Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
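
With 64-bit lanes there are only eight elements, so the mask is 8 bits wide, and the operand order matters: "b" is subtracted from "a". A scalar sketch of the pseudo-code above, with the hypothetical helper name ``mask_sub_pd_ref``:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    enum { LANES_PD = 8 };  /* 512 bits / 64-bit lanes */

    /* Hypothetical scalar model of _mm512_mask_sub_pd: lane j receives
     * a[j] - b[j] when bit j of k is set, otherwise src[j] is copied. */
    static void mask_sub_pd_ref(double dst[LANES_PD], const double src[LANES_PD],
                                uint8_t k, const double a[LANES_PD],
                                const double b[LANES_PD])
    {
        assert(dst != NULL && src != NULL && a != NULL && b != NULL);
        for (int j = 0; j < LANES_PD; ++j) {
            dst[j] = ((k >> j) & 1u) ? a[j] - b[j] : src[j];
        }
    }

Passing a mask of ``0x81`` updates only the lowest and highest lanes and leaves the other six equal to "src".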
        	

_mm512_mask_sub_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_sub_round_pd(__m512d src, __mmask8 k,
                                     __m512d a, __m512d b,
                                     int rounding);

.. admonition:: Intel Description

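    Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each with exceptions suppressed, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.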

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sub_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_sub_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] - b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sub_round_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_sub_round_pd(__m512d a, __m512d b,
                                int rounding);

.. admonition:: Intel Description

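    Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each with exceptions suppressed, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.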

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[i+63:i] - b[i+63:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sub_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_sub_ps(__m512 src, __mmask16 k, __m512 a,
                              __m512 b);

.. admonition:: Intel Description

    Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sub_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_sub_round_ps(__m512 src, __mmask16 k,
                                    __m512 a, __m512 b,
                                    int rounding);

.. admonition:: Intel Description

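    Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each with exceptions suppressed, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.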

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sub_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_sub_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] - b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sub_round_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_sub_round_ps(__m512 a, __m512 b,
                               int rounding);

.. admonition:: Intel Description

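    Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each with exceptions suppressed, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.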

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] - b[i+31:i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_reduce_add_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    int _mm512_mask_reduce_add_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed 32-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[31:0] + src[63:32]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := src[i+31:i] + src[i+32*len+31:i+32*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[32*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[i+31:i] := a[i+31:i]
        	ELSE
        		tmp[i+31:i] := 0
        	FI
        ENDFOR
        dst[31:0] := REDUCE_ADD(tmp, 16)
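
The two stages above (zero the inactive lanes, then fold the vector in halves) can be sketched in scalar C. The helper name ``mask_reduce_add_epi32_ref`` is an assumption, the recursion is unrolled into a loop, and unsigned lanes are used so that overflow wraps predictably.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    enum { LANES_RADD = 16 };  /* 512 bits / 32-bit lanes */

    /* Hypothetical scalar model of _mm512_mask_reduce_add_epi32. */
    static uint32_t mask_reduce_add_epi32_ref(uint16_t k,
                                              const uint32_t a[LANES_RADD])
    {
        uint32_t tmp[LANES_RADD];
        assert(a != NULL);

        /* Inactive lanes contribute the additive identity, 0. */
        for (int j = 0; j < LANES_RADD; ++j) {
            tmp[j] = ((k >> j) & 1u) ? a[j] : 0u;
        }
        /* REDUCE_ADD: add the upper half onto the lower half, then repeat
         * on the lower half until a single lane remains. */
        for (int len = LANES_RADD / 2; len >= 1; len /= 2) {
            for (int j = 0; j < len; ++j) {
                tmp[j] += tmp[j + len];
            }
        }
        return tmp[0];
    }

With lanes holding 1 through 16, a mask of ``0x000F`` sums only the first four lanes, and an all-zero mask returns 0.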
        	

_mm512_mask_reduce_add_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __int64
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __int64 _mm512_mask_reduce_add_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed 64-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[63:0] + src[127:64]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := src[i+63:i] + src[i+64*len+63:i+64*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[64*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[i+63:i] := a[i+63:i]
        	ELSE
        		tmp[i+63:i] := 0
        	FI
        ENDFOR
        dst[63:0] := REDUCE_ADD(tmp, 8)
        	

_mm512_mask_reduce_add_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: double
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    double _mm512_mask_reduce_add_pd(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Reduce the packed double-precision (64-bit) floating-point elements in "a" by addition using mask "k". Returns the sum of all active elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[63:0] + src[127:64]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := src[i+63:i] + src[i+64*len+63:i+64*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[64*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[i+63:i] := a[i+63:i]
        	ELSE
        		tmp[i+63:i] := 0
        	FI
        ENDFOR
        dst[63:0] := REDUCE_ADD(tmp, 8)
        	

_mm512_mask_reduce_add_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: float
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    float _mm512_mask_reduce_add_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Reduce the packed single-precision (32-bit) floating-point elements in "a" by addition using mask "k". Returns the sum of all active elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[31:0] + src[63:32]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := src[i+31:i] + src[i+32*len+31:i+32*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[32*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[i+31:i] := a[i+31:i]
        	ELSE
        		tmp[i+31:i] := 0
        	FI
        ENDFOR
        dst[31:0] := REDUCE_ADD(tmp, 16)
        	

_mm512_mask_reduce_mul_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    int _mm512_mask_reduce_mul_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed 32-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[31:0] * src[63:32]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := src[i+31:i] * src[i+32*len+31:i+32*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[32*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[i+31:i] := a[i+31:i]
        	ELSE
        		tmp[i+31:i] := 1
        	FI
        ENDFOR
        dst[31:0] := REDUCE_MUL(tmp, 16)
        	

_mm512_mask_reduce_mul_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __int64
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __int64 _mm512_mask_reduce_mul_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Reduce the packed 64-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[63:0] * src[127:64]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := src[i+63:i] * src[i+64*len+63:i+64*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[64*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[i+63:i] := a[i+63:i]
        	ELSE
        		tmp[i+63:i] := 1
        	FI
        ENDFOR
        dst[63:0] := REDUCE_MUL(tmp, 8)
        	

_mm512_mask_reduce_mul_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: double
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    double _mm512_mask_reduce_mul_pd(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Reduce the packed double-precision (64-bit) floating-point elements in "a" by multiplication using mask "k". Returns the product of all active elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[63:0] * src[127:64]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := src[i+63:i] * src[i+64*len+63:i+64*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[64*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[i+63:i] := a[i+63:i]
        	ELSE
        		tmp[i+63:i] := 1.0
        	FI
        ENDFOR
        dst[63:0] := REDUCE_MUL(tmp, 8)
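
For a masked product the inactive lanes are replaced by 1.0 rather than 0, so they do not affect the result. A scalar sketch of the pseudo-code above, assuming the hypothetical helper name ``mask_reduce_mul_pd_ref`` and an iterative form of the recursion:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    enum { LANES_RMUL_PD = 8 };  /* 512 bits / 64-bit lanes */

    /* Hypothetical scalar model of _mm512_mask_reduce_mul_pd. */
    static double mask_reduce_mul_pd_ref(uint8_t k,
                                         const double a[LANES_RMUL_PD])
    {
        double tmp[LANES_RMUL_PD];
        assert(a != NULL);

        /* Inactive lanes contribute the multiplicative identity, 1.0. */
        for (int j = 0; j < LANES_RMUL_PD; ++j) {
            tmp[j] = ((k >> j) & 1u) ? a[j] : 1.0;
        }
        /* REDUCE_MUL: multiply the upper half into the lower half, then
         * repeat on the lower half until a single lane remains. */
        for (int len = LANES_RMUL_PD / 2; len >= 1; len /= 2) {
            for (int j = 0; j < len; ++j) {
                tmp[j] *= tmp[j + len];
            }
        }
        return tmp[0];
    }

Masking out a lane that holds 0.0 is a typical use: the zero is replaced by 1.0 and the product of the remaining lanes survives.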
        	

_mm512_mask_reduce_mul_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: float
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    float _mm512_mask_reduce_mul_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Reduce the packed single-precision (32-bit) floating-point elements in "a" by multiplication using mask "k". Returns the product of all active elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[31:0] * src[63:32]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := src[i+31:i] * src[i+32*len+31:i+32*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[32*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		tmp[i+31:i] := a[i+31:i]
        	ELSE
        		tmp[i+31:i] := FP32(1.0)
        	FI
        ENDFOR
        dst[31:0] := REDUCE_MUL(tmp, 16)
        	

_mm512_reduce_add_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _mm512_reduce_add_epi32(__m512i a);

.. admonition:: Intel Description

    Reduce the packed 32-bit integers in "a" by addition. Returns the sum of all elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[31:0] + src[63:32]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := src[i+31:i] + src[i+32*len+31:i+32*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[32*len-1:0], len)
        }
        dst[31:0] := REDUCE_ADD(a, 16)
        	

_mm512_reduce_add_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __int64
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __int64 _mm512_reduce_add_epi64(__m512i a);

.. admonition:: Intel Description

    Reduce the packed 64-bit integers in "a" by addition. Returns the sum of all elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[63:0] + src[127:64]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := src[i+63:i] + src[i+64*len+63:i+64*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[64*len-1:0], len)
        }
        dst[63:0] := REDUCE_ADD(a, 8)
        	

_mm512_reduce_add_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: double
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    double _mm512_reduce_add_pd(__m512d a);

.. admonition:: Intel Description

    Reduce the packed double-precision (64-bit) floating-point elements in "a" by addition. Returns the sum of all elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[63:0] + src[127:64]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := src[i+63:i] + src[i+64*len+63:i+64*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[64*len-1:0], len)
        }
        dst[63:0] := REDUCE_ADD(a, 8)
        	

_mm512_reduce_add_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: float
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    float _mm512_reduce_add_ps(__m512 a);

.. admonition:: Intel Description

    Reduce the packed single-precision (32-bit) floating-point elements in "a" by addition. Returns the sum of all elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[31:0] + src[63:32]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := src[i+31:i] + src[i+32*len+31:i+32*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[32*len-1:0], len)
        }
        dst[31:0] := REDUCE_ADD(a, 16)
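
Floating-point addition is not associative, so the order of the additions can affect the result. The sketch below follows the halving order of the pseudo-code above in scalar C; the helper name ``reduce_add_ps_ref`` is an assumption, and a compiler's actual instruction sequence for the intrinsic may differ.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    enum { LANES_RADD_PS = 16 };  /* 512 bits / 32-bit lanes */

    /* Hypothetical scalar model of _mm512_reduce_add_ps that adds lanes in
     * the same pairwise order as REDUCE_ADD: lane j is combined with lane
     * j + len, and len is halved after every pass. */
    static float reduce_add_ps_ref(const float a[LANES_RADD_PS])
    {
        float tmp[LANES_RADD_PS];
        assert(a != NULL);
        memcpy(tmp, a, sizeof tmp);
        for (int len = LANES_RADD_PS / 2; len >= 1; len /= 2) {
            for (int j = 0; j < len; ++j) {
                tmp[j] += tmp[j + len];
            }
        }
        return tmp[0];
    }

With ``1e8f`` in lane 0, ``-1e8f`` in lane 8 and 1.0f elsewhere, the first pass cancels the two large values, so the small terms are preserved. A plain left-to-right loop over the same data can lose them.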
        	

_mm512_reduce_mul_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _mm512_reduce_mul_epi32(__m512i a);

.. admonition:: Intel Description

    Reduce the packed 32-bit integers in "a" by multiplication. Returns the product of all elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[31:0] * src[63:32]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := src[i+31:i] * src[i+32*len+31:i+32*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[32*len-1:0], len)
        }
        dst[31:0] := REDUCE_MUL(a, 16)
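
Because only 32 bits are returned, a product of sixteen lanes overflows easily. The scalar sketch below mirrors the pseudo-code above with unsigned lanes so the wrap-around is well defined; ``reduce_mul_epi32_ref`` is a hypothetical helper name.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    enum { LANES_RMUL = 16 };  /* 512 bits / 32-bit lanes */

    /* Hypothetical scalar model of _mm512_reduce_mul_epi32. Every partial
     * product is truncated to 32 bits, i.e. computed modulo 2^32. */
    static uint32_t reduce_mul_epi32_ref(const uint32_t a[LANES_RMUL])
    {
        uint32_t tmp[LANES_RMUL];
        assert(a != NULL);
        memcpy(tmp, a, sizeof tmp);
        for (int len = LANES_RMUL / 2; len >= 1; len /= 2) {
            for (int j = 0; j < len; ++j) {
                tmp[j] = (uint32_t)((uint64_t)tmp[j] * tmp[j + len]);
            }
        }
        return tmp[0];
    }

Sixteen lanes of 3 give 43046721, which still fits, while sixteen lanes of 4 give 2^32 and therefore wrap to 0.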
        	

_mm512_reduce_mul_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __int64
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __int64 _mm512_reduce_mul_epi64(__m512i a);

.. admonition:: Intel Description

    Reduce the packed 64-bit integers in "a" by multiplication. Returns the product of all elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[63:0] * src[127:64]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := src[i+63:i] * src[i+64*len+63:i+64*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[64*len-1:0], len)
        }
        dst[63:0] := REDUCE_MUL(a, 8)
        	

_mm512_reduce_mul_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: double
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    double _mm512_reduce_mul_pd(__m512d a);

.. admonition:: Intel Description

    Reduce the packed double-precision (64-bit) floating-point elements in "a" by multiplication. Returns the product of all elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[63:0] * src[127:64]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*64
        		src[i+63:i] := src[i+63:i] * src[i+64*len+63:i+64*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[64*len-1:0], len)
        }
        dst[63:0] := REDUCE_MUL(a, 8)
        	

_mm512_reduce_mul_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: float
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    float _mm512_reduce_mul_ps(__m512 a);

.. admonition:: Intel Description

    Reduce the packed single-precision (32-bit) floating-point elements in "a" by multiplication. Returns the product of all elements in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[31:0] * src[63:32]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*32
        		src[i+31:i] := src[i+31:i] * src[i+32*len+31:i+32*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[32*len-1:0], len)
        }
        dst[31:0] := REDUCE_MUL(a, 16)
        	

_mm512_abs_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 v2
:Param ETypes:
    FP32 v2

.. code-block:: C

    __m512 _mm512_abs_ps(__m512 v2);

.. admonition:: Intel Description

    Finds the absolute value of each packed single-precision (32-bit) floating-point element in "v2", storing the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ABS(v2[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_abs_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 v2
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 v2

.. code-block:: C

    __m512 _mm512_mask_abs_ps(__m512 src, __mmask16 k,
                              __m512 v2);

.. admonition:: Intel Description

    Finds the absolute value of each packed single-precision (32-bit) floating-point element in "v2", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ABS(v2[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
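
As an illustration of the writemask behaviour described above, here is a small scalar sketch. It is not the intrinsic itself: ``mask_abs_ps_model`` is a hypothetical name, a ``float[16]`` stands in for ``__m512``, a ``uint16_t`` stands in for ``__mmask16``, and ABS is modelled as clearing the sign bit, which is an assumption of this sketch.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of the masked absolute value: lane j takes
       |v2[j]| when bit j of k is set and src[j] otherwise. */
    static void mask_abs_ps_model(float dst[16], const float src[16],
                                  uint16_t k, const float v2[16])
    {
        for (int j = 0; j < 16; ++j) {
            if ((k >> j) & 1u) {
                uint32_t bits;
                memcpy(&bits, &v2[j], sizeof bits);  /* reinterpret the float */
                bits &= 0x7FFFFFFFu;                 /* clear the sign bit */
                memcpy(&dst[j], &bits, sizeof bits);
            } else {
                dst[j] = src[j];                     /* inactive lane: copy src */
            }
        }
    }

With ``k = 0x0005`` only lanes 0 and 2 are replaced by absolute values; every other lane of the result is copied from ``src``.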
        	

_mm512_abs_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d v2
:Param ETypes:
    FP64 v2

.. code-block:: C

    __m512d _mm512_abs_pd(__m512d v2);

.. admonition:: Intel Description

    Finds the absolute value of each packed double-precision (64-bit) floating-point element in "v2", storing the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ABS(v2[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_abs_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d v2
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 v2

.. code-block:: C

    __m512d _mm512_mask_abs_pd(__m512d src, __mmask8 k,
                               __m512d v2);

.. admonition:: Intel Description

    Finds the absolute value of each packed double-precision (64-bit) floating-point element in "v2", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ABS(v2[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_madd52lo_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m512i _mm512_madd52lo_epu64(__m512i a, __m512i b,
                                  __m512i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        	dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_madd52lo_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask8 k, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m512i _mm512_mask_madd52lo_epu64(__m512i a, __mmask8 k,
                                       __m512i b, __m512i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        		dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_madd52lo_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m512i _mm512_maskz_madd52lo_epu64(__mmask8 k, __m512i a,
                                        __m512i b, __m512i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        		dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_madd52hi_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m512i _mm512_madd52hi_epu64(__m512i a, __m512i b,
                                  __m512i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        	dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
        ENDFOR
        dst[MAX:512] := 0
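
The low and high variants split one 104-bit product into two 52-bit halves (``tmp[51:0]`` and ``tmp[103:52]``). The following scalar sketch models a single 64-bit lane of each using only 64-bit arithmetic. The names ``mul52``, ``madd52lo_lane`` and ``madd52hi_lane`` are hypothetical, and the 26-bit limb split is just one portable way to form the product, not a description of the hardware.

.. code-block:: C

    #include <stdint.h>

    #define MASK26 ((UINT64_C(1) << 26) - 1)
    #define MASK52 ((UINT64_C(1) << 52) - 1)

    /* Hypothetical helper: split the 104-bit product of the low 52 bits of
       b and c into its low and high 52-bit halves. */
    static void mul52(uint64_t b, uint64_t c, uint64_t *lo, uint64_t *hi)
    {
        uint64_t bl = b & MASK26, bh = (b & MASK52) >> 26;
        uint64_t cl = c & MASK26, ch = (c & MASK52) >> 26;
        uint64_t mid = bh * cl + bl * ch;                 /* weight 2^26 */
        uint64_t low = bl * cl + ((mid & MASK26) << 26);  /* weight 2^0  */
        *lo = low & MASK52;
        *hi = bh * ch + (mid >> 26) + (low >> 52);        /* weight 2^52 */
    }

    /* One lane of the madd52lo pseudo-code; the add wraps modulo 2^64. */
    static uint64_t madd52lo_lane(uint64_t a, uint64_t b, uint64_t c)
    {
        uint64_t lo, hi;
        mul52(b, c, &lo, &hi);
        return a + lo;
    }

    /* One lane of the madd52hi pseudo-code; the add wraps modulo 2^64. */
    static uint64_t madd52hi_lane(uint64_t a, uint64_t b, uint64_t c)
    {
        uint64_t lo, hi;
        mul52(b, c, &lo, &hi);
        return a + hi;
    }

Bits 63:52 of "b" and "c" are ignored in both variants, while "a" takes part as a full 64-bit accumulator.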
        	

_mm512_mask_madd52hi_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask8 k, 
    __m512i b, 
    __m512i c
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m512i _mm512_mask_madd52hi_epu64(__m512i a, __mmask8 k,
                                       __m512i b, __m512i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        		dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_madd52hi_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    __m512i c
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m512i _mm512_maskz_madd52hi_epu64(__mmask8 k, __m512i a,
                                        __m512i b, __m512i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        		dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_dpbf16_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __m512bh a, 
    __m512bh b
:Param ETypes:
    FP32 src, 
    BF16 a, 
    BF16 b

.. code-block:: C

    __m512 _mm512_dpbf16_ps(__m512 src, __m512bh a, __m512bh b);

.. admonition:: Intel Description

    Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE make_fp32(x[15:0]) {
        	y.fp32  := 0.0
        	y[31:16] := x[15:0]
        	RETURN y
        }
        dst := src
        FOR j := 0 to 15
        	dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
        	dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
        ENDFOR
        dst[MAX:512] := 0
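
The ``make_fp32`` step above relies on a BF16 value being the upper 16 bits of an FP32 value. A scalar sketch of one output lane follows. It is illustrative only: ``dpbf16_lane`` is a hypothetical name, ``uint16_t`` holds the raw BF16 bits, and ordinary C ``float`` arithmetic is used, so the hardware's exact rounding and denormal handling are not modelled.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* make_fp32 from the pseudo-code: place the BF16 bits in bits 31:16 of
       an FP32 value and leave the low 16 bits zero. */
    static float make_fp32(uint16_t x)
    {
        uint32_t bits = (uint32_t)x << 16;
        float y;
        memcpy(&y, &bits, sizeof y);
        return y;
    }

    /* Hypothetical scalar model of FP32 lane j: a[] and b[] hold the BF16
       pair (2*j+0, 2*j+1); the odd element is accumulated first, as above. */
    static float dpbf16_lane(float src, const uint16_t a[2],
                             const uint16_t b[2])
    {
        float dst = src;
        dst += make_fp32(a[1]) * make_fp32(b[1]);
        dst += make_fp32(a[0]) * make_fp32(b[0]);
        return dst;
    }

For example, with the BF16 encodings ``0x3F80`` (1.0), ``0x4000`` (2.0), ``0x4040`` (3.0) and ``0x4080`` (4.0), the pair products 2.0 * 4.0 and 1.0 * 3.0 are added to the accumulator.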
        	

_mm512_mask_dpbf16_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512bh a, 
    __m512bh b
:Param ETypes:
    FP32 src, 
    MASK k, 
    BF16 a, 
    BF16 b

.. code-block:: C

    __m512 _mm512_mask_dpbf16_ps(__m512 src, __mmask16 k,
                                 __m512bh a, __m512bh b);

.. admonition:: Intel Description

    Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE make_fp32(x[15:0]) {
        	y.fp32  := 0.0
        	y[31:16] := x[15:0]
        	RETURN y
        }
        dst := src
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
        		dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_dpbf16_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 src, 
    __m512bh a, 
    __m512bh b
:Param ETypes:
    MASK k, 
    FP32 src, 
    BF16 a, 
    BF16 b

.. code-block:: C

    __m512 _mm512_maskz_dpbf16_ps(__mmask16 k, __m512 src,
                                  __m512bh a, __m512bh b);

.. admonition:: Intel Description

    Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE make_fp32(x[15:0]) {
        	y.fp32  := 0.0
        	y[31:16] := x[15:0]
        	RETURN y
        }
        dst := src
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
        		dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_add_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_add_ph(__m512h a, __m512h b);

.. admonition:: Intel Description

    Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.fp16[j] := a.fp16[j] + b.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_add_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mask_add_ph(__m512h src, __mmask32 k,
                               __m512h a, __m512h b);

.. admonition:: Intel Description

    Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] + b.fp16[j]
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_add_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_maskz_add_ph(__mmask32 k, __m512h a,
                                __m512h b);

.. admonition:: Intel Description

    Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] + b.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
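
The writemask and zeromask forms differ only in what an inactive lane receives. The sketch below contrasts the two on 32 lanes. It makes several assumptions: the function names are hypothetical, a ``uint32_t`` stands in for ``__mmask32``, and ``float`` stands in for the half-precision element type because a 16-bit floating-point type is not available in every C compiler.

.. code-block:: C

    #include <stdint.h>

    #define LANES 32

    /* Writemask form: inactive lanes keep the value from src. */
    static void mask_add_model(float dst[LANES], const float src[LANES],
                               uint32_t k, const float a[LANES],
                               const float b[LANES])
    {
        for (int j = 0; j < LANES; ++j)
            dst[j] = ((k >> j) & 1u) ? a[j] + b[j] : src[j];
    }

    /* Zeromask form: inactive lanes are set to zero. */
    static void maskz_add_model(float dst[LANES], uint32_t k,
                                const float a[LANES], const float b[LANES])
    {
        for (int j = 0; j < LANES; ++j)
            dst[j] = ((k >> j) & 1u) ? a[j] + b[j] : 0.0f;
    }

Calling both with the same mask produces identical active lanes; only the lanes whose mask bit is clear differ.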
        	

_mm512_add_round_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_add_round_ph(__m512h a, __m512h b,
                                int rounding);

.. admonition:: Intel Description

    Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
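    Rounding is done according to the "rounding" parameter, which can be ``_MM_FROUND_CUR_DIRECTION`` (use the mode in MXCSR.RC) or one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions.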

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.fp16[j] := a.fp16[j] + b.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_add_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_add_round_ph(__m512h src, __mmask32 k,
                                     __m512h a, __m512h b,
                                     int rounding);

.. admonition:: Intel Description

    Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
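    Rounding is done according to the "rounding" parameter, which can be ``_MM_FROUND_CUR_DIRECTION`` (use the mode in MXCSR.RC) or one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions.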

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] + b.fp16[j]
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_add_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_add_round_ph(__mmask32 k, __m512h a,
                                      __m512h b, int rounding);

.. admonition:: Intel Description

    Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
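    Rounding is done according to the "rounding" parameter, which can be ``_MM_FROUND_CUR_DIRECTION`` (use the mode in MXCSR.RC) or one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions.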

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] + b.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_div_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_div_ph(__m512h a, __m512h b);

.. admonition:: Intel Description

    Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := a.fp16[j] / b.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_div_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mask_div_ph(__m512h src, __mmask32 k,
                               __m512h a, __m512h b);

.. admonition:: Intel Description

    Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] / b.fp16[j]
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_div_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_maskz_div_ph(__mmask32 k, __m512h a,
                                __m512h b);

.. admonition:: Intel Description

    Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] / b.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_div_round_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_div_round_ph(__m512h a, __m512h b,
                                int rounding);

.. admonition:: Intel Description

    Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
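    Rounding is done according to the "rounding" parameter, which can be ``_MM_FROUND_CUR_DIRECTION`` (use the mode in MXCSR.RC) or one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions.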

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := a.fp16[j] / b.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_div_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_div_round_ph(__m512h src, __mmask32 k,
                                     __m512h a, __m512h b,
                                     int rounding);

.. admonition:: Intel Description

    Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
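    Rounding is done according to the "rounding" parameter, which can be ``_MM_FROUND_CUR_DIRECTION`` (use the mode in MXCSR.RC) or one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions.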

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] / b.fp16[j]
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_div_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_div_round_ph(__mmask32 k, __m512h a,
                                      __m512h b, int rounding);

.. admonition:: Intel Description

    Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
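    Rounding is done according to the "rounding" parameter, which can be ``_MM_FROUND_CUR_DIRECTION`` (use the mode in MXCSR.RC) or one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions.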

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] / b.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmadd_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_fmadd_ph(__m512h a, __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmadd_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask32 k, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_mask_fmadd_ph(__m512h a, __mmask32 k,
                                 __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmadd_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask32 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m512h _mm512_mask3_fmadd_ph(__m512h a, __m512h b,
                                  __m512h c, __mmask32 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmadd_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_maskz_fmadd_ph(__mmask32 k, __m512h a,
                                  __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
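
The ``mask``, ``mask3`` and ``maskz`` forms of the fused multiply-add share the same active-lane computation and differ only in the value given to inactive lanes: "a", "c", or zero. The sketch below puts the three side by side. It is a rough model under stated assumptions: ``fmadd_model`` and ``enum passthrough`` are hypothetical names, ``float`` stands in for the half-precision element type, and the plain C expression ``a * b + c`` is not guaranteed to be fused into a single rounding step.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical selector for what an inactive lane receives:
       PASS_A models mask, PASS_C models mask3, PASS_ZERO models maskz. */
    enum passthrough { PASS_A, PASS_C, PASS_ZERO };

    static void fmadd_model(float dst[32], uint32_t k, enum passthrough p,
                            const float a[32], const float b[32],
                            const float c[32])
    {
        for (int j = 0; j < 32; ++j) {
            if ((k >> j) & 1u)
                dst[j] = a[j] * b[j] + c[j];  /* fusing is up to the compiler */
            else if (p == PASS_A)
                dst[j] = a[j];
            else if (p == PASS_C)
                dst[j] = c[j];
            else
                dst[j] = 0.0f;
        }
    }

Choosing between ``mask`` and ``mask3`` in real code usually comes down to which operand already holds the values that inactive lanes should keep.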
        	

_mm512_fmadd_round_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_fmadd_round_ph(__m512h a, __m512h b,
                                  __m512h c,
                                  const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
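    Rounding is done according to the "rounding" parameter, which can be ``_MM_FROUND_CUR_DIRECTION`` (use the mode in MXCSR.RC) or one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions.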

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmadd_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask32 k, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_fmadd_round_ph(__m512h a, __mmask32 k,
                                       __m512h b, __m512h c,
                                       const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
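    Rounding is done according to the "rounding" parameter, which can be ``_MM_FROUND_CUR_DIRECTION`` (use the mode in MXCSR.RC) or one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions.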

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmadd_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask32 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask3_fmadd_round_ph(__m512h a, __m512h b,
                                        __m512h c, __mmask32 k,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
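    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code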

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
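
Because unselected lanes are taken from "c", the ``mask3`` form suits accumulator updates where only some lanes should change. The sketch below is a scalar model, not the intrinsic itself: ``model_mask3_fmadd`` is a hypothetical name, ``float`` stands in for FP16, and rounding control is ignored.

.. code-block:: C

    #include <stdint.h>

    #define LANES 32  /* a 512-bit register holds 32 FP16 lanes */

    /* Hypothetical scalar model of the mask3 form.
       Assumption: float replaces FP16, so rounding is not bit-exact. */
    void model_mask3_fmadd(float dst[LANES], const float a[LANES],
                           const float b[LANES], const float c[LANES],
                           uint32_t k)
    {
        for (int j = 0; j < LANES; j++) {
            if ((k >> j) & 1u)
                dst[j] = a[j] * b[j] + c[j];  /* mask bit set: a*b + c */
            else
                dst[j] = c[j];                /* mask bit clear: copy from c */
        }
    }

Passing the same array as both ``dst`` and ``c`` updates an accumulator in place and leaves the unselected lanes untouched.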
        	

_mm512_maskz_fmadd_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_fmadd_round_ph(__mmask32 k, __m512h a,
                                        __m512h b, __m512h c,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
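    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code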

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
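
Zero-masking is often paired with a "tail" mask that selects only the elements left over at the end of an array. The following is a minimal scalar sketch of that idea; ``tail_mask`` and ``model_maskz_fmadd`` are hypothetical helpers, ``float`` stands in for FP16, and the model reads only the selected lanes (the real intrinsic takes full registers).

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    #define LANES 32  /* a 512-bit register holds 32 FP16 lanes */

    /* Hypothetical helper: mask with the low `remaining` bits set,
       saturating at all 32 lanes. */
    uint32_t tail_mask(size_t remaining)
    {
        return remaining >= LANES ? 0xFFFFFFFFu
                                  : (((uint32_t)1 << remaining) - 1u);
    }

    /* Hypothetical scalar model of zero-masking.
       Assumption: float replaces FP16; unselected inputs are never read. */
    void model_maskz_fmadd(float dst[LANES], uint32_t k, const float *a,
                           const float *b, const float *c)
    {
        for (int j = 0; j < LANES; j++) {
            if ((k >> j) & 1u)
                dst[j] = a[j] * b[j] + c[j];  /* mask bit set: a*b + c */
            else
                dst[j] = 0.0f;                /* mask bit clear: zero */
        }
    }

With five elements remaining, ``tail_mask(5)`` yields ``0x1F``, so lanes 0 to 4 are computed and lanes 5 to 31 are zeroed.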
        	

_mm512_fnmadd_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_fnmadd_ph(__m512h a, __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
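
A negated multiply-add is a natural fit for residuals of the form ``c - a*b``. As one illustration, the sketch below uses a scalar model of the pseudo-code above to form ``1 - d*x`` in a Newton-Raphson reciprocal step. The helper names are hypothetical, ``float`` stands in for FP16, and the use case is an assumption rather than something the source prescribes.

.. code-block:: C

    #define LANES 32  /* a 512-bit register holds 32 FP16 lanes */

    /* Scalar model of the pseudo-code: dst = -(a*b) + c per lane.
       Assumption: float replaces FP16, so rounding is not bit-exact. */
    void model_fnmadd(float dst[LANES], const float a[LANES],
                      const float b[LANES], const float c[LANES])
    {
        for (int j = 0; j < LANES; j++)
            dst[j] = -(a[j] * b[j]) + c[j];
    }

    /* Hypothetical use: one Newton-Raphson step x <- x + x*(1 - d*x),
       which moves each x[j] closer to 1/d[j]. */
    void refine_reciprocal(float x[LANES], const float d[LANES])
    {
        float one[LANES], e[LANES];
        for (int j = 0; j < LANES; j++)
            one[j] = 1.0f;
        model_fnmadd(e, d, x, one);        /* e = 1 - d*x */
        for (int j = 0; j < LANES; j++)
            x[j] = x[j] * e[j] + x[j];     /* x = x*e + x */
    }

Starting from an estimate of 0.2 for 1/4, one step gives roughly 0.24, and an exact estimate of 0.25 is left unchanged.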
        	

_mm512_mask_fnmadd_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask32 k, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_mask_fnmadd_ph(__m512h a, __mmask32 k,
                                  __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fnmadd_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask32 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m512h _mm512_mask3_fnmadd_ph(__m512h a, __m512h b,
                                   __m512h c, __mmask32 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fnmadd_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_maskz_fnmadd_ph(__mmask32 k, __m512h a,
                                   __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fnmadd_round_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_fnmadd_round_ph(__m512h a, __m512h b,
                                   __m512h c,
                                   const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
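    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code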

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fnmadd_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask32 k, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_fnmadd_round_ph(__m512h a, __mmask32 k,
                                        __m512h b, __m512h c,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
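    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code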

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fnmadd_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask32 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask3_fnmadd_round_ph(__m512h a, __m512h b,
                                         __m512h c, __mmask32 k,
                                         const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
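    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code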

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fnmadd_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_fnmadd_round_ph(__mmask32 k, __m512h a,
                                         __m512h b, __m512h c,
                                         const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
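    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code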

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmsub_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_fmsub_ph(__m512h a, __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
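
As a rough usage sketch, the helper below calls the intrinsic only when the compiler targets AVX512-FP16 and otherwise falls back to plain ``float`` arithmetic. ``fmsub32`` is a hypothetical wrapper, and the code assumes the compiler defines ``__AVX512FP16__`` when the extension is enabled (as GCC and Clang do with ``-mavx512fp16``) and supports the ``_Float16`` type.

.. code-block:: C

    #if defined(__AVX512FP16__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: out[j] = a[j]*b[j] - c[j] for 32 lanes. */
    void fmsub32(float out[32], const float a[32],
                 const float b[32], const float c[32])
    {
    #if defined(__AVX512FP16__)
        /* FP16 path: narrow to _Float16, run the intrinsic, widen back. */
        _Float16 ha[32], hb[32], hc[32], ho[32];
        for (int j = 0; j < 32; j++) {
            ha[j] = (_Float16)a[j];
            hb[j] = (_Float16)b[j];
            hc[j] = (_Float16)c[j];
        }
        __m512h r = _mm512_fmsub_ph(_mm512_loadu_ph(ha),
                                    _mm512_loadu_ph(hb),
                                    _mm512_loadu_ph(hc));
        _mm512_storeu_ph(ho, r);
        for (int j = 0; j < 32; j++)
            out[j] = (float)ho[j];
    #else
        /* Portable fallback: same formula in float, so rounding differs. */
        for (int j = 0; j < 32; j++)
            out[j] = a[j] * b[j] - c[j];
    #endif
    }

Both paths agree for values that are exactly representable in FP16, such as small integers. For other inputs the fallback keeps more precision than the FP16 path.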
        	

_mm512_mask_fmsub_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask32 k, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_mask_fmsub_ph(__m512h a, __mmask32 k,
                                 __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmsub_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask32 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m512h _mm512_mask3_fmsub_ph(__m512h a, __m512h b,
                                  __m512h c, __mmask32 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmsub_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_maskz_fmsub_ph(__mmask32 k, __m512h a,
                                  __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmsub_round_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_fmsub_round_ph(__m512h a, __m512h b,
                                  __m512h c,
                                  const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
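    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code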

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmsub_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask32 k, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_fmsub_round_ph(__m512h a, __mmask32 k,
                                       __m512h b, __m512h c,
                                       const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
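    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code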

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmsub_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask32 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask3_fmsub_round_ph(__m512h a, __m512h b,
                                        __m512h c, __mmask32 k,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
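    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code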

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmsub_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_fmsub_round_ph(__mmask32 k, __m512h a,
                                        __m512h b, __m512h c,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
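    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code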

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fnmsub_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_fnmsub_ph(__m512h a, __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
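
The fmadd, fnmadd, fmsub, and fnmsub families differ only in the sign applied to the product and to "c". A side-by-side scalar sketch may make that easier to see. The function names and the ``lane_op`` type are hypothetical, and ``float`` stands in for FP16.

.. code-block:: C

    #define LANES 32  /* a 512-bit register holds 32 FP16 lanes */

    /* Per-lane models. Assumption: float replaces FP16. */
    float lane_fmadd (float a, float b, float c) { return  (a * b) + c; }
    float lane_fnmadd(float a, float b, float c) { return -(a * b) + c; }
    float lane_fmsub (float a, float b, float c) { return  (a * b) - c; }
    float lane_fnmsub(float a, float b, float c) { return -(a * b) - c; }

    /* Hypothetical driver: apply one per-lane model to all 32 lanes. */
    typedef float (*lane_op)(float a, float b, float c);

    void apply_lanes(lane_op op, float dst[LANES], const float a[LANES],
                     const float b[LANES], const float c[LANES])
    {
        for (int j = 0; j < LANES; j++)
            dst[j] = op(a[j], b[j], c[j]);
    }

For a = 2, b = 3, c = 1 the four models give 7, -5, 5, and -7 respectively.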
        	

_mm512_mask_fnmsub_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask32 k, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_mask_fnmsub_ph(__m512h a, __mmask32 k,
                                  __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fnmsub_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask32 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m512h _mm512_mask3_fnmsub_ph(__m512h a, __m512h b,
                                   __m512h c, __mmask32 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fnmsub_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_maskz_fnmsub_ph(__mmask32 k, __m512h a,
                                   __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fnmsub_round_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_fnmsub_round_ph(__m512h a, __m512h b,
                                   __m512h c,
                                   const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
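    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code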

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fnmsub_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask32 k, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_fnmsub_round_ph(__m512h a, __mmask32 k,
                                        __m512h b, __m512h c,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
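    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code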

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fnmsub_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask32 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask3_fnmsub_round_ph(__m512h a, __m512h b,
                                         __m512h c, __mmask32 k,
                                         const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
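    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code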

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fnmsub_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_fnmsub_round_ph(__mmask32 k, __m512h a,
                                         __m512h b, __m512h c,
                                         const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
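    Rounding follows the "rounding" parameter: a direction (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code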

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmaddsub_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_fmaddsub_ph(__m512h a, __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract for even-indexed elements, add for odd-indexed elements), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF ((j & 1) == 0)
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
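
The even-subtract, odd-add pattern matches the arithmetic of multiplying interleaved complex numbers, which is one plausible use. The sketch below models the pseudo-code above in scalar C and applies it to 16 complex products. ``model_fmaddsub`` and ``complex_mul16`` are hypothetical helpers, ``float`` stands in for FP16, and the data layout (real part in even lanes, imaginary part in odd lanes) is an assumption.

.. code-block:: C

    #define LANES 32  /* a 512-bit register holds 32 FP16 lanes */

    /* Scalar model of the pseudo-code: even lanes subtract c, odd lanes add c.
       Assumption: float replaces FP16, so rounding is not bit-exact. */
    void model_fmaddsub(float dst[LANES], const float a[LANES],
                        const float b[LANES], const float c[LANES])
    {
        for (int j = 0; j < LANES; j++) {
            if ((j & 1) == 0)
                dst[j] = a[j] * b[j] - c[j];
            else
                dst[j] = a[j] * b[j] + c[j];
        }
    }

    /* Hypothetical use: out = x * y for 16 interleaved complex numbers
       stored as (re, im, re, im, ...). */
    void complex_mul16(float out[LANES], const float x[LANES],
                       const float y[LANES])
    {
        float y_re[LANES], y_im[LANES], x_swap[LANES], t[LANES];
        for (int j = 0; j < LANES; j += 2) {
            y_re[j] = y_re[j + 1] = y[j];      /* broadcast real part of y */
            y_im[j] = y_im[j + 1] = y[j + 1];  /* broadcast imaginary part of y */
            x_swap[j] = x[j + 1];              /* swap re and im of x */
            x_swap[j + 1] = x[j];
        }
        for (int j = 0; j < LANES; j++)
            t[j] = x_swap[j] * y_im[j];        /* (xi*yi, xr*yi) */
        /* even: xr*yr - xi*yi, odd: xi*yr + xr*yi */
        model_fmaddsub(out, x, y_re, t);
    }

For example, (1 + 2i) * (3 + 4i) produces -5 in the even lane and 10 in the odd lane.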
        	

_mm512_mask_fmaddsub_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask32 k, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_mask_fmaddsub_ph(__m512h a, __mmask32 k,
                                    __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract for even-indexed elements, add for odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmaddsub_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask32 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m512h _mm512_mask3_fmaddsub_ph(__m512h a, __m512h b,
                                     __m512h c, __mmask32 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract for even-indexed elements, add for odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmaddsub_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_maskz_fmaddsub_ph(__mmask32 k, __m512h a,
                                     __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract for even-indexed elements, add for odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
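
For comparison with the writemask forms, the zeromask behaviour can be sketched as a scalar loop. The helper name "fmaddsub_maskz_model" is hypothetical, "float" is assumed as a stand-in for FP16, and half-precision rounding is not modelled.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zeromask form (float stands in for FP16). */
    static void fmaddsub_maskz_model(float dst[32], uint32_t k, const float a[32],
                                     const float b[32], const float c[32])
    {
        for (int j = 0; j < 32; j++) {
            float prod = a[j] * b[j];
            if (!((k >> j) & 1u))
                dst[j] = 0.0f;         /* mask bit clear: lane is zeroed */
            else if ((j & 1) == 0)
                dst[j] = prod - c[j];  /* even lane: subtract */
            else
                dst[j] = prod + c[j];  /* odd lane: add */
        }
    }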
        	

_mm512_fmaddsub_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

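.. code-block:: C

    __m512h _mm512_fmaddsub_round_ph(__m512h a, __m512h b,
                                     __m512h c,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst".

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code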

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF ((j & 1) == 0)
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmaddsub_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask32 k, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

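.. code-block:: C

    __m512h _mm512_mask_fmaddsub_round_ph(__m512h a,
                                          __mmask32 k,
                                          __m512h b, __m512h c,
                                          const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code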

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
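
In the writemask form the pass-through value comes from "a", which is also a multiplicand. A scalar sketch of that selection follows; "fmaddsub_mask_model" is a hypothetical name, "float" is assumed in place of FP16, and the "rounding" parameter is deliberately not modelled.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the writemask form; rounding control omitted. */
    static void fmaddsub_mask_model(float dst[32], const float a[32], uint32_t k,
                                    const float b[32], const float c[32])
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1u)
                dst[j] = (j & 1) == 0 ? a[j] * b[j] - c[j]   /* even: subtract */
                                      : a[j] * b[j] + c[j];  /* odd: add */
            else
                dst[j] = a[j];  /* mask bit clear: copy from a */
        }
    }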
        	

_mm512_mask3_fmaddsub_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask32 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

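.. code-block:: C

    __m512h _mm512_mask3_fmaddsub_round_ph(__m512h a, __m512h b,
                                           __m512h c,
                                           __mmask32 k,
                                           const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code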

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmaddsub_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

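.. code-block:: C

    __m512h _mm512_maskz_fmaddsub_round_ph(__mmask32 k,
                                           __m512h a, __m512h b,
                                           __m512h c,
                                           const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code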

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmsubadd_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_fmsubadd_ph(__m512h a, __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF ((j & 1) == 0)
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
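
The even/odd alternation of the unmasked form can be expressed as a sign flip on "c". The following is a minimal scalar sketch, assuming "float" in place of FP16 and an unfused multiply-add; "fmsubadd_model" is a hypothetical helper name.

.. code-block:: C

    /* Hypothetical scalar model: add c on even lanes, subtract c on odd lanes. */
    static void fmsubadd_model(float dst[32], const float a[32],
                               const float b[32], const float c[32])
    {
        for (int j = 0; j < 32; j++) {
            float sign = (j & 1) == 0 ? 1.0f : -1.0f;  /* +c even, -c odd */
            dst[j] = a[j] * b[j] + sign * c[j];
        }
    }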
        	

_mm512_mask_fmsubadd_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask32 k, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_mask_fmsubadd_ph(__m512h a, __mmask32 k,
                                    __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
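
Because the operation alternates by lane parity, a mask with every second bit set selects only one of the two behaviours. The sketch below illustrates this with a scalar model; "fmsubadd_mask_model" and "EVEN_LANES" are hypothetical names, and "float" is assumed as a stand-in for FP16.

.. code-block:: C

    #include <stdint.h>

    #define EVEN_LANES 0x55555555u  /* bits 0, 2, 4, ... set */

    /* Hypothetical scalar model of the writemask form (float stands in for FP16). */
    static void fmsubadd_mask_model(float dst[32], const float a[32], uint32_t k,
                                    const float b[32], const float c[32])
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1u)
                dst[j] = (j & 1) == 0 ? a[j] * b[j] + c[j]   /* even: add */
                                      : a[j] * b[j] - c[j];  /* odd: subtract */
            else
                dst[j] = a[j];  /* mask bit clear: copy from a */
        }
    }

With "k" set to "EVEN_LANES", only the adding lanes are computed and every odd lane keeps its value from "a".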
        	

_mm512_mask3_fmsubadd_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask32 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m512h _mm512_mask3_fmsubadd_ph(__m512h a, __m512h b,
                                     __m512h c, __mmask32 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmsubadd_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_maskz_fmsubadd_ph(__mmask32 k, __m512h a,
                                     __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmsubadd_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

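.. code-block:: C

    __m512h _mm512_fmsubadd_round_ph(__m512h a, __m512h b,
                                     __m512h c,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst".

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code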

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF ((j & 1) == 0)
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmsubadd_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask32 k, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

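.. code-block:: C

    __m512h _mm512_mask_fmsubadd_round_ph(__m512h a,
                                          __mmask32 k,
                                          __m512h b, __m512h c,
                                          const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code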

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmsubadd_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask32 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

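.. code-block:: C

    __m512h _mm512_mask3_fmsubadd_round_ph(__m512h a, __m512h b,
                                           __m512h c,
                                           __mmask32 k,
                                           const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code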

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmsubadd_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

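.. code-block:: C

    __m512h _mm512_maskz_fmsubadd_round_ph(__mmask32 k,
                                           __m512h a, __m512h b,
                                           __m512h c,
                                           const int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code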

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sub_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_sub_ph(__m512h a, __m512h b);

.. admonition:: Intel Description

    Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.fp16[j] := a.fp16[j] - b.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_sub_round_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

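.. code-block:: C

    __m512h _mm512_sub_round_ph(__m512h a, __m512h b,
                                int rounding);

.. admonition:: Intel Description

    Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code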

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.fp16[j] := a.fp16[j] - b.fp16[j]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_sub_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mask_sub_ph(__m512h src, __mmask32 k,
                               __m512h a, __m512h b);

.. admonition:: Intel Description

    Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] - b.fp16[j]
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
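
Unlike the fused multiply-add forms, the masked subtract takes a separate "src" operand for the pass-through lanes. A scalar sketch of that per-lane select is shown below; "sub_mask_model" is a hypothetical name and "float" is assumed in place of FP16.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: a - b where the mask bit is set, else src. */
    static void sub_mask_model(float dst[32], const float src[32], uint32_t k,
                               const float a[32], const float b[32])
    {
        for (int j = 0; j < 32; j++)
            dst[j] = ((k >> j) & 1u) ? a[j] - b[j] : src[j];
    }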
        	

_mm512_mask_sub_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

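.. code-block:: C

    __m512h _mm512_mask_sub_round_ph(__m512h src, __mmask32 k,
                                     __m512h a, __m512h b,
                                     int rounding);

.. admonition:: Intel Description

    Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code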

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] - b.fp16[j]
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sub_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_maskz_sub_ph(__mmask32 k, __m512h a,
                                __m512h b);

.. admonition:: Intel Description

    Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] - b.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_sub_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

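.. code-block:: C

    __m512h _mm512_maskz_sub_round_ph(__mmask32 k, __m512h a,
                                      __m512h b, int rounding);

.. admonition:: Intel Description

    Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code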

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] - b.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mul_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mul_ph(__m512h a, __m512h b);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 TO 31
        	dst.fp16[i] := a.fp16[i] * b.fp16[i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mul_round_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

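.. code-block:: C

    __m512h _mm512_mul_round_ph(__m512h a, __m512h b,
                                int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code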

    .. code-block:: text

        
        FOR i := 0 TO 31
        	dst.fp16[i] := a.fp16[i] * b.fp16[i]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_mul_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mask_mul_ph(__m512h src, __mmask32 k,
                               __m512h a, __m512h b);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 TO 31
        	IF k[i]
        		dst.fp16[i] := a.fp16[i] * b.fp16[i]
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_mul_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

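.. code-block:: C

    __m512h _mm512_mask_mul_round_ph(__m512h src, __mmask32 k,
                                     __m512h a, __m512h b,
                                     int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code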

    .. code-block:: text

        
        FOR i := 0 TO 31
        	IF k[i]
        		dst.fp16[i] := a.fp16[i] * b.fp16[i]
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mul_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_maskz_mul_ph(__mmask32 k, __m512h a,
                                __m512h b);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 TO 31
        	IF k[i]
        		dst.fp16[i] := a.fp16[i] * b.fp16[i]
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
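
A scalar sketch of the zeromasked multiply may be useful for checking which lanes are expected to be zero. It assumes "float" in place of FP16 and does not model half-precision rounding; "mul_maskz_model" is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: a * b where the mask bit is set, else 0. */
    static void mul_maskz_model(float dst[32], uint32_t k,
                                const float a[32], const float b[32])
    {
        for (int i = 0; i < 32; i++)
            dst[i] = ((k >> i) & 1u) ? a[i] * b[i] : 0.0f;
    }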
        	

_mm512_maskz_mul_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

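.. code-block:: C

    __m512h _mm512_maskz_mul_round_ph(__mmask32 k, __m512h a,
                                      __m512h b, int rounding);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code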

    .. code-block:: text

        
        FOR i := 0 TO 31
        	IF k[i]
        		dst.fp16[i] := a.fp16[i] * b.fp16[i]
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmul_pch
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_fmul_pch(__m512h a, __m512h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:512] := 0
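
The two pseudo-code assignments are the real and imaginary parts of an ordinary complex product, computed for 16 interleaved (real, imaginary) pairs. A scalar sketch follows, assuming "float" in place of FP16; "fmul_pch_model" is a hypothetical helper name.

.. code-block:: C

    /* Hypothetical scalar model: 16 complex products on interleaved pairs. */
    static void fmul_pch_model(float dst[32], const float a[32], const float b[32])
    {
        for (int i = 0; i < 16; i++) {
            float ar = a[2 * i], ai = a[2 * i + 1];  /* real, imaginary of a */
            float br = b[2 * i], bi = b[2 * i + 1];  /* real, imaginary of b */
            dst[2 * i]     = ar * br - ai * bi;      /* real part of product */
            dst[2 * i + 1] = ai * br + ar * bi;      /* imaginary part of product */
        }
    }

For example, (1 + 2i) * (3 + 4i) gives -5 + 10i in each pair.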
        	

_mm512_mul_pch
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mul_pch(__m512h a, __m512h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmul_pch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask16 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mask_fmul_pch(__m512h src, __mmask16 k,
                                 __m512h a, __m512h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
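
Note that each bit of the 16-bit mask governs a whole complex number, that is, two adjacent FP16 lanes. The scalar sketch below illustrates this pairing; "fmul_pch_mask_model" is a hypothetical name, "float" is assumed in place of FP16, and "dst" is assumed not to alias the inputs.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: one mask bit per (real, imaginary) pair. */
    static void fmul_pch_mask_model(float dst[32], const float src[32], uint16_t k,
                                    const float a[32], const float b[32])
    {
        for (int i = 0; i < 16; i++) {
            int re = 2 * i, im = 2 * i + 1;
            if ((k >> i) & 1) {
                dst[re] = a[re] * b[re] - a[im] * b[im];
                dst[im] = a[im] * b[re] + a[re] * b[im];
            } else {
                dst[re] = src[re];  /* mask bit clear: keep both halves of src */
                dst[im] = src[im];
            }
        }
    }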
        	

_mm512_mask_mul_pch
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask16 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mask_mul_pch(__m512h src, __mmask16 k,
                                __m512h a, __m512h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmul_pch
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_maskz_fmul_pch(__mmask16 k, __m512h a,
                                  __m512h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mul_pch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_maskz_mul_pch(__mmask16 k, __m512h a,
                                 __m512h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
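
A minimal scalar sketch of the zeromask variant follows. It differs from the writemask form only in the fallback: pairs whose mask bit is clear become zero instead of being copied from another vector. The name ``model_maskz_mul_pch`` is hypothetical and ``float`` is assumed in place of FP16.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the zeromask complex multiply.
       Assumption: float stands in for FP16, so rounding differs. */
    void model_maskz_mul_pch(float dst[32], uint16_t k,
                             const float a[32], const float b[32])
    {
        assert(dst != NULL && a != NULL && b != NULL);
        for (int i = 0; i < 16; i++) {
            float re = 0.0f, im = 0.0f;   /* default: zeroed pair */
            if ((k >> i) & 1u) {
                re = a[2*i] * b[2*i] - a[2*i+1] * b[2*i+1];
                im = a[2*i+1] * b[2*i] + a[2*i] * b[2*i+1];
            }
            dst[2*i] = re;
            dst[2*i+1] = im;
        }
    }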

_mm512_fmul_round_pch
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_fmul_round_pch(__m512h a, __m512h b,
                                  const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
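    Rounding is done according to the "rounding" parameter, which is either one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.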

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mul_round_pch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mul_round_pch(__m512h a, __m512h b,
                                 const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
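    Rounding is done according to the "rounding" parameter, which is either one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.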

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmul_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask16 k, 
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_fmul_round_pch(__m512h src, __mmask16 k,
                                       __m512h a, __m512h b,
                                       const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
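    Rounding is done according to the "rounding" parameter, which is either one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.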

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_mul_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask16 k, 
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_mul_round_pch(__m512h src, __mmask16 k,
                                      __m512h a, __m512h b,
                                      const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
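    Rounding is done according to the "rounding" parameter, which is either one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.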

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmul_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_fmul_round_pch(__mmask16 k, __m512h a,
                                        __m512h b,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
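    Rounding is done according to the "rounding" parameter, which is either one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.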

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_mul_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_mul_round_pch(__mmask16 k, __m512h a,
                                       __m512h b,
                                       const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
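    Rounding is done according to the "rounding" parameter, which is either one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.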

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fcmul_pch
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_fcmul_pch(__m512h a, __m512h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:512] := 0
        	
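
For reference, the conjugate product computed for each element pair expands as follows, writing :math:`a = a_r + i\,a_i` and :math:`b = b_r + i\,b_i`. This notation is introduced here for illustration and is not part of Intel's description; the real and imaginary parts correspond to ``fp16[2*i+0]`` and ``fp16[2*i+1]`` in the pseudo-code.

.. math::

    a \cdot \overline{b} = (a_r + i\,a_i)(b_r - i\,b_i) = (a_r b_r + a_i b_i) + i\,(a_i b_r - a_r b_i)
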

_mm512_cmul_pch
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_cmul_pch(__m512h a, __m512h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:512] := 0
        	
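
The conjugate multiply can be sketched in scalar C as below. This is an illustrative model under assumptions rather than the actual instruction semantics: ``model_cmul_pch`` is a hypothetical name, and ``float`` replaces FP16, so intermediate rounding will differ from hardware.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical scalar model: dst = a * conj(b) for 16 complex pairs.
       Assumption: float stands in for FP16. */
    void model_cmul_pch(float dst[32], const float a[32], const float b[32])
    {
        assert(dst != NULL && a != NULL && b != NULL);
        for (int i = 0; i < 16; i++) {
            /* Temporaries allow dst to alias a or b. */
            float re = a[2*i] * b[2*i] + a[2*i+1] * b[2*i+1];
            float im = a[2*i+1] * b[2*i] - a[2*i] * b[2*i+1];
            dst[2*i] = re;
            dst[2*i+1] = im;
        }
    }

For example, with ``a = 1 + 2i`` and ``b = 3 + 4i`` in the first pair, the model yields ``11 + 2i``, which is ``(1 + 2i)(3 - 4i)``.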

_mm512_mask_fcmul_pch
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask16 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mask_fcmul_pch(__m512h src, __mmask16 k,
                                  __m512h a, __m512h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cmul_pch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask16 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mask_cmul_pch(__m512h src, __mmask16 k,
                                 __m512h a, __m512h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fcmul_pch
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_maskz_fcmul_pch(__mmask16 k, __m512h a,
                                   __m512h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cmul_pch
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_maskz_cmul_pch(__mmask16 k, __m512h a,
                                  __m512h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fcmul_round_pch
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_fcmul_round_pch(__m512h a, __m512h b,
                                   const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
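    Rounding is done according to the "rounding" parameter, which is either one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.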

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cmul_round_pch
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_cmul_round_pch(__m512h a, __m512h b,
                                  const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
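    Rounding is done according to the "rounding" parameter, which is either one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.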

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fcmul_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask16 k, 
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_fcmul_round_pch(__m512h src,
                                        __mmask16 k, __m512h a,
                                        __m512h b,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
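    Rounding is done according to the "rounding" parameter, which is either one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.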

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cmul_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask16 k, 
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_cmul_round_pch(__m512h src, __mmask16 k,
                                       __m512h a, __m512h b,
                                       const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
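    Rounding is done according to the "rounding" parameter, which is either one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.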

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fcmul_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_fcmul_round_pch(__mmask16 k, __m512h a,
                                         __m512h b,
                                         const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
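    Rounding is done according to the "rounding" parameter, which is either one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.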

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cmul_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_cmul_round_pch(__mmask16 k, __m512h a,
                                        __m512h b,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
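    Rounding is done according to the "rounding" parameter, which is either one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.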

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fmadd_pch
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_fmadd_pch(__m512h a, __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        ENDFOR
        dst[MAX:512] := 0
        	
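
The complex multiply-accumulate above can be modelled in scalar C as a rough sketch. The name ``model_fmadd_pch`` is hypothetical, and ``float`` is assumed in place of FP16; the model also rounds after each scalar operation, which need not match how the hardware rounds.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical scalar model: dst = a * b + c for 16 complex pairs.
       Assumption: float stands in for FP16. */
    void model_fmadd_pch(float dst[32], const float a[32],
                         const float b[32], const float c[32])
    {
        assert(dst != NULL && a != NULL && b != NULL && c != NULL);
        for (int i = 0; i < 16; i++) {
            /* Real part: ar*br - ai*bi + cr. */
            float re = a[2*i] * b[2*i] - a[2*i+1] * b[2*i+1] + c[2*i];
            /* Imaginary part: ai*br + ar*bi + ci. */
            float im = a[2*i+1] * b[2*i] + a[2*i] * b[2*i+1] + c[2*i+1];
            dst[2*i] = re;
            dst[2*i+1] = im;
        }
    }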

_mm512_mask_fmadd_pch
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask16 k, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_mask_fmadd_pch(__m512h a, __mmask16 k,
                                  __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := a.fp16[2*i+0]
        		dst.fp16[2*i+1] := a.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmadd_pch
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask16 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m512h _mm512_mask3_fmadd_pch(__m512h a, __m512h b,
                                   __m512h c, __mmask16 k);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := c.fp16[2*i+0]
        		dst.fp16[2*i+1] := c.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
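
The "mask" and "mask3" forms of the complex multiply-accumulate differ only in which operand supplies the pairs whose mask bit is clear: "a" for the "mask" form and "c" for the "mask3" form. The sketch below makes that explicit with a hypothetical ``fallback`` parameter. The function name is likewise hypothetical, and ``float`` is assumed in place of FP16.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of masked complex multiply-accumulate.
       Pass fallback = a to model the "mask" form and fallback = c to
       model the "mask3" form. Assumption: float stands in for FP16. */
    void model_masked_fmadd_pch(float dst[32], const float a[32],
                                const float b[32], const float c[32],
                                uint16_t k, const float fallback[32])
    {
        assert(dst != NULL && a != NULL && b != NULL && c != NULL);
        assert(fallback != NULL);
        for (int i = 0; i < 16; i++) {
            float re, im;
            if ((k >> i) & 1u) {
                re = a[2*i] * b[2*i] - a[2*i+1] * b[2*i+1] + c[2*i];
                im = a[2*i+1] * b[2*i] + a[2*i] * b[2*i+1] + c[2*i+1];
            } else {
                /* Mask bit clear: copy the pair from the fallback. */
                re = fallback[2*i];
                im = fallback[2*i+1];
            }
            dst[2*i] = re;
            dst[2*i+1] = im;
        }
    }

Keeping "c" as the fallback suits accumulation loops, where lanes that are masked off should leave the running sum unchanged.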

_mm512_maskz_fmadd_pch
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_maskz_fmadd_pch(__mmask16 k, __m512h a,
                                   __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
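
The zero-masked pseudo-code above can be modelled in portable scalar C. This is a minimal sketch, not the intrinsic itself: ``fmadd_pch_maskz_model`` is a hypothetical helper, ``float`` stands in for FP16 (so FP16 rounding is not reproduced), and complex values are assumed to be stored interleaved as real, imaginary pairs.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Scalar model of zero-masked complex multiply-add: dst = a * b + c.
     * Assumption: n complex pairs, interleaved as re, im, re, im, ...
     * (n = 16 for the 512-bit form); float replaces FP16. */
    void fmadd_pch_maskz_model(uint16_t k, const float *a, const float *b,
                               const float *c, float *dst, size_t n)
    {
        assert(n <= 16); /* one mask bit per complex pair */
        for (size_t i = 0; i < n; i++) {
            if ((k >> i) & 1u) {
                float ar = a[2 * i], ai = a[2 * i + 1];
                float br = b[2 * i], bi = b[2 * i + 1];
                dst[2 * i]     = ar * br - ai * bi + c[2 * i];     /* real part */
                dst[2 * i + 1] = ai * br + ar * bi + c[2 * i + 1]; /* imaginary part */
            } else {
                /* Mask bit clear: the whole complex pair is zeroed. */
                dst[2 * i]     = 0.0f;
                dst[2 * i + 1] = 0.0f;
            }
        }
    }

Each mask bit governs one complex pair (two FP16 elements), which is why a 512-bit vector of 32 half-precision values takes a 16-bit mask.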
        	

_mm512_fmadd_round_pch
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_fmadd_round_pch(__m512h a, __m512h b,
                                   __m512h c,
                                   const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
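    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).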

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fmadd_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask16 k, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_fmadd_round_pch(__m512h a, __mmask16 k,
                                        __m512h b, __m512h c,
                                        const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
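    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).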

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := a.fp16[2*i+0]
        		dst.fp16[2*i+1] := a.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fmadd_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask16 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask3_fmadd_round_pch(__m512h a, __m512h b,
                                         __m512h c, __mmask16 k,
                                         const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
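    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).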

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := c.fp16[2*i+0]
        		dst.fp16[2*i+1] := c.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fmadd_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_fmadd_round_pch(__mmask16 k, __m512h a,
                                         __m512h b, __m512h c,
                                         const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
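    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).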

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fcmadd_pch
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_fcmadd_pch(__m512h a, __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        ENDFOR
        dst[MAX:512] := 0
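
The conjugate form differs from the plain complex multiply-add only in two signs. The following scalar sketch makes that visible; ``fcmadd_pch_model`` is a hypothetical helper name, ``float`` is used in place of FP16, and the interleaved real, imaginary layout is an assumption taken from the pseudo-code indexing.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>

    /* Scalar model of dst = a * conj(b) + c on n interleaved complex pairs.
     * Assumption: float stands in for FP16, so FP16 rounding is not modelled. */
    void fcmadd_pch_model(const float *a, const float *b, const float *c,
                          float *dst, size_t n)
    {
        assert(a != NULL && b != NULL && c != NULL && dst != NULL);
        for (size_t i = 0; i < n; i++) {
            float ar = a[2 * i], ai = a[2 * i + 1];
            float br = b[2 * i], bi = b[2 * i + 1];
            /* (ar + i*ai) * (br - i*bi) = (ar*br + ai*bi) + i*(ai*br - ar*bi) */
            dst[2 * i]     = ar * br + ai * bi + c[2 * i];
            dst[2 * i + 1] = ai * br - ar * bi + c[2 * i + 1];
        }
    }

For example, with ``a = 1 + 2i``, ``b = 5 + 6i`` and ``c = 0`` the model yields ``17 + 4i``, which matches ``(1 + 2i) * (5 - 6i)``.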
        	

_mm512_mask_fcmadd_pch
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask16 k, 
    __m512h b, 
    __m512h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_mask_fcmadd_pch(__m512h a, __mmask16 k,
                                   __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := a.fp16[2*i+0]
        		dst.fp16[2*i+1] := a.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fcmadd_pch
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask16 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m512h _mm512_mask3_fcmadd_pch(__m512h a, __m512h b,
                                    __m512h c, __mmask16 k);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := c.fp16[2*i+0]
        		dst.fp16[2*i+1] := c.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
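
The ``mask`` and ``mask3`` forms differ only in which operand supplies the result when a mask bit is clear: "a" for ``mask``, "c" for ``mask3``. A scalar sketch of that merge behaviour follows; ``fcmadd_pch_merge_model`` and its ``passthrough`` parameter are hypothetical names introduced for illustration, and ``float`` again stands in for FP16.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Merge-masked model of dst = a * conj(b) + c.
     * Pass `a` as passthrough to model the mask form,
     * or `c` to model the mask3 form. */
    void fcmadd_pch_merge_model(uint16_t k, const float *a, const float *b,
                                const float *c, const float *passthrough,
                                float *dst, size_t n)
    {
        assert(n <= 16); /* one mask bit per complex pair */
        for (size_t i = 0; i < n; i++) {
            if ((k >> i) & 1u) {
                float ar = a[2 * i], ai = a[2 * i + 1];
                float br = b[2 * i], bi = b[2 * i + 1];
                dst[2 * i]     = ar * br + ai * bi + c[2 * i];
                dst[2 * i + 1] = ai * br - ar * bi + c[2 * i + 1];
            } else {
                /* Mask bit clear: copy the pair from the passthrough operand. */
                dst[2 * i]     = passthrough[2 * i];
                dst[2 * i + 1] = passthrough[2 * i + 1];
            }
        }
    }

Choosing between the two forms is therefore a question of which input register the result is allowed to overwrite.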
        	

_mm512_maskz_fcmadd_pch
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a, 
    __m512h b, 
    __m512h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m512h _mm512_maskz_fcmadd_pch(__mmask16 k, __m512h a,
                                    __m512h b, __m512h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fcmadd_round_pch
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_fcmadd_round_pch(__m512h a, __m512h b,
                                    __m512h c,
                                    const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
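    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).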

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fcmadd_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __mmask16 k, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_fcmadd_round_pch(__m512h a, __mmask16 k,
                                         __m512h b, __m512h c,
                                         const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
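    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).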

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := a.fp16[2*i+0]
        		dst.fp16[2*i+1] := a.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask3_fcmadd_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    __m512h c, 
    __mmask16 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask3_fcmadd_round_pch(__m512h a, __m512h b,
                                          __m512h c,
                                          __mmask16 k,
                                          const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
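    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).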

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := c.fp16[2*i+0]
        		dst.fp16[2*i+1] := c.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fcmadd_round_pch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a, 
    __m512h b, 
    __m512h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_fcmadd_round_pch(__mmask16 k,
                                          __m512h a, __m512h b,
                                          __m512h c,
                                          const int rounding);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
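    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).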

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_reduce_add_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: _Float16
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm512_reduce_add_ph(__m512h a);

.. admonition:: Intel Description

    Reduce the packed half-precision (16-bit) floating-point elements in "a" by addition. Returns the sum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        FOR i := 0 to 15
        	tmp.fp16[i] := tmp.fp16[i] + a.fp16[i+16]
        ENDFOR
        FOR i := 0 to 7
        	tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+8]
        ENDFOR
        FOR i := 0 to 3
        	tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+4]
        ENDFOR
        FOR i := 0 to 1
        	tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+2]
        ENDFOR
        dst.fp16[0] := tmp.fp16[0] + tmp.fp16[1]
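
The reduction above is a pairwise tree: each pass folds the upper half of the remaining elements into the lower half. A scalar sketch of that order of operations is shown below. ``reduce_add_ph_model`` is a hypothetical helper and ``float`` replaces FP16, so the partial sums here are not rounded the way real half-precision sums would be.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Tree reduction over 32 elements, halving the active width each pass
     * (16, 8, 4, 2, 1), in the same order as the pseudo-code. */
    float reduce_add_ph_model(const float a[32])
    {
        float tmp[32];
        assert(a != NULL);
        memcpy(tmp, a, sizeof tmp);
        for (size_t width = 16; width >= 1; width /= 2) {
            for (size_t i = 0; i < width; i++) {
                tmp[i] += tmp[i + width];
            }
        }
        return tmp[0];
    }

Because floating-point addition is not associative, this tree order can give a different result from a simple left-to-right loop over the same elements.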
        	

_mm512_reduce_mul_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: _Float16
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm512_reduce_mul_ph(__m512h a);

.. admonition:: Intel Description

    Reduce the packed half-precision (16-bit) floating-point elements in "a" by multiplication. Returns the product of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        FOR i := 0 to 15
        	tmp.fp16[i] := tmp.fp16[i] * a.fp16[i+16]
        ENDFOR
        FOR i := 0 to 7
        	tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+8]
        ENDFOR
        FOR i := 0 to 3
        	tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+4]
        ENDFOR
        FOR i := 0 to 1
        	tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+2]
        ENDFOR
        dst.fp16[0] := tmp.fp16[0] * tmp.fp16[1]
        	

_mm512_reduce_max_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: _Float16
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm512_reduce_max_ph(__m512h a);

.. admonition:: Intel Description

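    Reduce the packed half-precision (16-bit) floating-point elements in "a" by maximum. Returns the maximum of all elements in "a". Note: Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.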

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        FOR i := 0 to 15
        	tmp.fp16[i] := (a.fp16[i] > a.fp16[i+16] ? a.fp16[i] : a.fp16[i+16])
        ENDFOR
        FOR i := 0 to 7
        	tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+8] ? tmp.fp16[i] : tmp.fp16[i+8])
        ENDFOR
        FOR i := 0 to 3
        	tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+4] ? tmp.fp16[i] : tmp.fp16[i+4])
        ENDFOR
        FOR i := 0 to 1
        	tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+2] ? tmp.fp16[i] : tmp.fp16[i+2])
        ENDFOR
        dst.fp16[0] := (tmp.fp16[0] > tmp.fp16[1] ? tmp.fp16[0] : tmp.fp16[1])
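
The maximum reduction uses the same tree shape, with a ``>`` comparison in place of the addition. The sketch below mirrors the pseudo-code literally; ``reduce_max_ph_model`` is a hypothetical helper and ``float`` stands in for FP16.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Tree maximum over 32 elements, following the pseudo-code's selection
     * rule: keep the lower element only if it compares strictly greater. */
    float reduce_max_ph_model(const float a[32])
    {
        float tmp[32];
        assert(a != NULL);
        memcpy(tmp, a, sizeof tmp);
        for (size_t width = 16; width >= 1; width /= 2) {
            for (size_t i = 0; i < width; i++) {
                /* Any comparison involving NaN is false, so in this model
                 * the second operand is selected in that case. */
                tmp[i] = (tmp[i] > tmp[i + width]) ? tmp[i] : tmp[i + width];
            }
        }
        return tmp[0];
    }

In this model the outcome for NaN inputs depends on where the NaN sits in the vector, so it should not be relied on as a NaN-propagating maximum.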
        	

_mm512_reduce_min_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: _Float16
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm512_reduce_min_ph(__m512h a);

.. admonition:: Intel Description

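    Reduce the packed half-precision (16-bit) floating-point elements in "a" by minimum. Returns the minimum of all elements in "a". Note: Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.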

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        FOR i := 0 to 15
        	tmp.fp16[i] := (a.fp16[i] < a.fp16[i+16] ? a.fp16[i] : a.fp16[i+16])
        ENDFOR
        FOR i := 0 to 7
        	tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+8] ? tmp.fp16[i] : tmp.fp16[i+8])
        ENDFOR
        FOR i := 0 to 3
        	tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+4] ? tmp.fp16[i] : tmp.fp16[i+4])
        ENDFOR
        FOR i := 0 to 1
        	tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+2] ? tmp.fp16[i] : tmp.fp16[i+2])
        ENDFOR
        dst.fp16[0] := (tmp.fp16[0] < tmp.fp16[1] ? tmp.fp16[0] : tmp.fp16[1])
        	

_mm512_abs_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h v2
:Param ETypes:
    FP16 v2

.. code-block:: C

    __m512h _mm512_abs_ph(__m512h v2);

.. admonition:: Intel Description

    Finds the absolute value of each packed half-precision (16-bit) floating-point element in "v2", storing the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	dst.fp16[j] := ABS(v2.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
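
For binary16 values, taking the absolute value amounts to clearing the sign bit of each 16-bit element. The sketch below works on raw FP16 bit patterns so that it stays portable. ``fp16_abs_bits`` and ``abs_ph_model`` are hypothetical helpers, and treating ``ABS`` as a pure sign-bit clear is an assumption of this model.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Clear the sign bit (bit 15) of one binary16 bit pattern. */
    uint16_t fp16_abs_bits(uint16_t h)
    {
        return (uint16_t)(h & 0x7FFFu);
    }

    /* Apply the sign-bit clear to n raw FP16 elements (n = 32 for 512 bits). */
    void abs_ph_model(const uint16_t *v2, uint16_t *dst, size_t n)
    {
        assert(v2 != NULL && dst != NULL);
        for (size_t j = 0; j < n; j++) {
            dst[j] = fp16_abs_bits(v2[j]);
        }
    }

Here ``0xBC00`` (-1.0) becomes ``0x3C00`` (1.0) and ``0x8000`` (-0.0) becomes ``0x0000`` (+0.0).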
        	

_mm512_conj_pch
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_conj_pch(__m512h a);

.. admonition:: Intel Description

    Compute the complex conjugates of complex numbers in "a", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
        ENDFOR
        dst[MAX:512] := 0
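
The pseudo-code works on 32-bit lanes because each lane holds one complex number: the real part in the low 16 bits and the imaginary part in the high 16 bits. ``FP32(-0.0)`` has the bit pattern ``0x80000000``, so the XOR flips only the sign bit of the imaginary half. A sketch on raw bit patterns follows; ``pack_complex`` and ``conj_pch_model`` are hypothetical helpers.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Pack two binary16 bit patterns into one 32-bit lane:
     * real part in bits 15:0, imaginary part in bits 31:16. */
    uint32_t pack_complex(uint16_t re_bits, uint16_t im_bits)
    {
        return (uint32_t)re_bits | ((uint32_t)im_bits << 16);
    }

    /* Conjugate n complex lanes by flipping bit 31 of each lane,
     * which is the sign bit of the imaginary FP16 element. */
    void conj_pch_model(const uint32_t *a, uint32_t *dst, size_t n)
    {
        assert(a != NULL && dst != NULL);
        for (size_t j = 0; j < n; j++) {
            dst[j] = a[j] ^ UINT32_C(0x80000000);
        }
    }

For ``1.0 + 2.0i`` the lane is ``0x40003C00``; after the XOR it is ``0xC0003C00``, that is ``1.0 - 2.0i``.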
        	

_mm512_mask_conj_pch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask16 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_conj_pch(__m512h src, __mmask16 k,
                                 __m512h a);

.. admonition:: Intel Description

    Compute the complex conjugates of complex numbers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_conj_pch
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask16 k, 
    __m512h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_maskz_conj_pch(__mmask16 k, __m512h a);

.. admonition:: Intel Description

    Compute the complex conjugates of complex numbers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_dpwssds_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i src, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_maskz_dpwssds_epi32(__mmask16 k, __m512i src,
                                       __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        		tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        		dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_dpwssds_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mask_dpwssds_epi32(__m512i src, __mmask16 k,
                                      __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        		tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        		dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_dpwssds_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_dpwssds_epi32(__m512i src, __m512i a,
                                 __m512i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:512] := 0
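
The saturating dot product can be sketched in scalar C by forming the sum in a wider type and clamping it to the signed 32-bit range. ``saturate32`` and ``dpwssds_model`` are hypothetical helpers written for this illustration; the 64-bit intermediate is an implementation choice of the sketch.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Clamp a 64-bit value to the signed 32-bit range. */
    int32_t saturate32(int64_t x)
    {
        if (x > INT32_MAX) return INT32_MAX;
        if (x < INT32_MIN) return INT32_MIN;
        return (int32_t)x;
    }

    /* For each 32-bit lane j: multiply two adjacent 16-bit pairs from a and b,
     * add both products to src[j], and saturate the result. */
    void dpwssds_model(const int32_t *src, const int16_t *a, const int16_t *b,
                       int32_t *dst, size_t n)
    {
        assert(src != NULL && a != NULL && b != NULL && dst != NULL);
        for (size_t j = 0; j < n; j++) {
            int64_t tmp1 = (int64_t)a[2 * j] * b[2 * j];
            int64_t tmp2 = (int64_t)a[2 * j + 1] * b[2 * j + 1];
            dst[j] = saturate32((int64_t)src[j] + tmp1 + tmp2);
        }
    }

A lane that starts at ``INT32_MAX`` and receives a positive dot product stays at ``INT32_MAX`` instead of wrapping to a negative value.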
        	

_mm512_maskz_dpwssd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i src, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_maskz_dpwssd_epi32(__mmask16 k, __m512i src,
                                      __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        		tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        		dst.dword[j] := src.dword[j] + tmp1 + tmp2
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_dpwssd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mask_dpwssd_epi32(__m512i src, __mmask16 k,
                                     __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        		tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        		dst.dword[j] := src.dword[j] + tmp1 + tmp2
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
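
The writemask and zeromask forms differ only in what an unselected lane receives. The portable C sketch below models that difference for the 16 lanes of the word dot product. ``dpwssd_masked_model`` and its ``zeroing`` flag are hypothetical names chosen for illustration, not part of ``immintrin.h``, and the code assumes the usual two's-complement conversion from ``uint32_t`` to ``int32_t``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model (not an Intel API): 16 lanes of the masked
       word dot product. zeroing == 0 models the "mask" form (unselected
       lanes keep src), zeroing != 0 models the "maskz" form (lanes zeroed). */
    static void dpwssd_masked_model(int32_t dst[16], const int32_t src[16],
                                    uint16_t k, const int16_t a[32],
                                    const int16_t b[32], int zeroing)
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                /* Each 16x16 product fits in 32 bits. */
                uint32_t tmp1 = (uint32_t)((int32_t)a[2 * j] * (int32_t)b[2 * j]);
                uint32_t tmp2 = (uint32_t)((int32_t)a[2 * j + 1] * (int32_t)b[2 * j + 1]);
                /* Unsigned arithmetic makes the accumulation wrap modulo 2^32. */
                dst[j] = (int32_t)((uint32_t)src[j] + tmp1 + tmp2);
            } else {
                dst[j] = zeroing ? 0 : src[j];
            }
        }
    }

Passing ``zeroing = 0`` mirrors the "mask" pseudo-code and ``zeroing = 1`` mirrors the "maskz" pseudo-code; the arithmetic for selected lanes is identical in both.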
        	

_mm512_dpwssd_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_dpwssd_epi32(__m512i src, __m512i a,
                                __m512i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:512] := 0
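
The only difference between this operation and its saturating counterpart is how the final sum is handled: here it wraps, there it is clamped by ``Saturate32``. The scalar sketch below contrasts the two for a single 32-bit lane. The helper names ``dpwssd_lane`` and ``dpwssds_lane`` are hypothetical, and the wrapping version assumes the usual two's-complement conversion from ``uint32_t`` to ``int32_t``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: one lane of the wrapping word dot product. */
    static int32_t dpwssd_lane(int32_t src, const int16_t a[2], const int16_t b[2])
    {
        int32_t tmp1 = (int32_t)a[0] * (int32_t)b[0];
        int32_t tmp2 = (int32_t)a[1] * (int32_t)b[1];
        /* Accumulate as unsigned so overflow wraps instead of being undefined. */
        uint32_t sum = (uint32_t)src + (uint32_t)tmp1 + (uint32_t)tmp2;
        return (int32_t)sum;
    }

    /* Hypothetical helper: the same lane with signed saturation. */
    static int32_t dpwssds_lane(int32_t src, const int16_t a[2], const int16_t b[2])
    {
        /* A 64-bit accumulator cannot overflow for these operand ranges. */
        int64_t sum = (int64_t)src
                    + (int64_t)a[0] * b[0]
                    + (int64_t)a[1] * b[1];
        if (sum > INT32_MAX) return INT32_MAX;
        if (sum < INT32_MIN) return INT32_MIN;
        return (int32_t)sum;
    }

With "src" equal to ``INT32_MAX`` and a single product of 1, the wrapping helper returns ``INT32_MIN`` while the saturating helper stays at ``INT32_MAX``.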
        	

_mm512_maskz_dpbusds_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i src, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_maskz_dpbusds_epi32(__mmask16 k, __m512i src,
                                       __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        		tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        		tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        		tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        		dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_dpbusds_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_mask_dpbusds_epi32(__m512i src, __mmask16 k,
                                      __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        		tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        		tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        		tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        		dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_dpbusds_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_dpbusds_epi32(__m512i src, __m512i a,
                                 __m512i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:512] := 0
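
As a rough illustration of the pseudo-code above, the following scalar C sketch computes one 32-bit lane: four unsigned-by-signed byte products are added to the accumulator and the total is clamped to the signed 32-bit range. ``saturate32`` and ``dpbusds_lane`` are hypothetical helper names, not functions from ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a wide intermediate value to the int32_t range. */
    static int32_t saturate32(int64_t x)
    {
        if (x > INT32_MAX) return INT32_MAX;
        if (x < INT32_MIN) return INT32_MIN;
        return (int32_t)x;
    }

    /* Hypothetical helper: one lane of the saturating byte dot product.
       a holds unsigned bytes, b holds signed bytes. */
    static int32_t dpbusds_lane(int32_t src, const uint8_t a[4], const int8_t b[4])
    {
        int64_t sum = src;
        for (int n = 0; n < 4; n++) {
            /* Each product lies in [-32640, 32385], so it fits a signed word. */
            sum += (int32_t)a[n] * (int32_t)b[n];
        }
        return saturate32(sum);
    }

Because every product fits in a signed 16-bit word, only the final accumulation can exceed the destination range, which is where the clamp is applied.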
        	

_mm512_maskz_dpbusd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i src, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_maskz_dpbusd_epi32(__mmask16 k, __m512i src,
                                      __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        		tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        		tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        		tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        		dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_dpbusd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_mask_dpbusd_epi32(__m512i src, __mmask16 k,
                                     __m512i a, __m512i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        		tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        		tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        		tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        		dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_dpbusd_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m512i _mm512_dpbusd_epi32(__m512i src, __m512i a,
                                __m512i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:512] := 0
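
The loop above can be modeled in portable C over plain arrays, which is handy for checking results on a machine without AVX-512. This is a minimal sketch: ``dpbusd_model`` is a hypothetical name, the 512-bit operands are represented as 64-byte and 16-dword arrays, and the wrapping sum assumes the usual two's-complement conversion from ``uint32_t`` to ``int32_t``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the non-saturating byte dot product:
       a holds 64 unsigned bytes, b holds 64 signed bytes. */
    static void dpbusd_model(int32_t dst[16], const int32_t src[16],
                             const uint8_t a[64], const int8_t b[64])
    {
        for (int j = 0; j < 16; j++) {
            /* Unsigned accumulator so the sum wraps modulo 2^32. */
            uint32_t acc = (uint32_t)src[j];
            for (int n = 0; n < 4; n++) {
                int32_t prod = (int32_t)a[4 * j + n] * (int32_t)b[4 * j + n];
                acc += (uint32_t)prod;
            }
            dst[j] = (int32_t)acc;
        }
    }

Each output lane consumes four consecutive bytes of "a" and "b", so lane ``j`` reads byte indices ``4*j`` through ``4*j+3``.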
        	

YMM
~~~
_mm256_mask_abs_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a

.. code-block:: C

    __m256i _mm256_mask_abs_epi8(__m256i src, __mmask32 k,
                                 __m256i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := ABS(a[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
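
One detail worth spelling out is why the results are described as unsigned: the absolute value of -128 is 128, which does not fit in a signed byte but does fit in an unsigned one. The scalar sketch below models the 32 lanes with plain arrays; ``mask_abs_epi8_model`` is a hypothetical name used only for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the writemasked 8-bit absolute value. */
    static void mask_abs_epi8_model(uint8_t dst[32], const uint8_t src[32],
                                    uint32_t k, const int8_t a[32])
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1u) {
                int v = a[j];                        /* widen before negating */
                dst[j] = (uint8_t)(v < 0 ? -v : v);  /* -128 becomes 128 (0x80) */
            } else {
                dst[j] = src[j];                     /* lane copied from src */
            }
        }
    }

Widening to ``int`` before negating avoids overflow for the input -128.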
        	

_mm256_maskz_abs_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    __m256i _mm256_maskz_abs_epi8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := ABS(a[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_abs_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m256i _mm256_mask_abs_epi16(__m256i src, __mmask16 k,
                                  __m256i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ABS(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_abs_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m256i _mm256_maskz_abs_epi16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ABS(a[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_add_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_add_epi8(__m256i src, __mmask32 k,
                                 __m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i] + b[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
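
The addition in the pseudo-code above is plain modular arithmetic: a sum that exceeds 8 bits simply loses its carry. A minimal scalar sketch of that behavior follows; ``mask_add_epi8_model`` is a hypothetical name, and the 256-bit operands are represented as 32-byte arrays.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the writemasked wrapping 8-bit add. */
    static void mask_add_epi8_model(uint8_t dst[32], const uint8_t src[32],
                                    uint32_t k, const uint8_t a[32],
                                    const uint8_t b[32])
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1u) {
                /* The cast keeps only the low 8 bits of the sum. */
                dst[j] = (uint8_t)(a[j] + b[j]);
            } else {
                dst[j] = src[j];
            }
        }
    }

For example, 250 + 10 yields 4 in a selected lane, since 260 modulo 256 is 4.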
        	

_mm256_maskz_add_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_add_epi8(__mmask32 k, __m256i a,
                                  __m256i b);

.. admonition:: Intel Description

    Add packed 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i] + b[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_adds_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_mask_adds_epi8(__m256i src, __mmask32 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_adds_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_maskz_adds_epi8(__mmask32 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
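
``Saturate8`` in the pseudo-code above clamps the sum to the signed byte range [-128, 127] rather than letting it wrap. The scalar sketch below shows that clamp together with the zeromask behavior; ``saturate8`` and ``maskz_adds_epi8_model`` are hypothetical names used for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Clamp an int to the int8_t range. */
    static int8_t saturate8(int x)
    {
        return (int8_t)(x > 127 ? 127 : (x < -128 ? -128 : x));
    }

    /* Hypothetical scalar model of the zeromasked signed saturating add. */
    static void maskz_adds_epi8_model(int8_t dst[32], uint32_t k,
                                      const int8_t a[32], const int8_t b[32])
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1u) {
                dst[j] = saturate8((int)a[j] + (int)b[j]);
            } else {
                dst[j] = 0;  /* unselected lanes are zeroed */
            }
        }
    }

The writemask form differs only in the ``else`` branch, which would copy the lane from "src" instead of writing zero.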
        	

_mm256_mask_adds_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mask_adds_epi16(__m256i src, __mmask16 k,
                                   __m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_adds_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_maskz_adds_epi16(__mmask16 k, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_adds_epu8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_adds_epu8(__m256i src, __mmask32 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
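
For the unsigned variant, ``SaturateU8`` clamps at 255, and because both inputs are non-negative there is no lower bound to check. A minimal scalar sketch under those assumptions follows; ``mask_adds_epu8_model`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the writemasked unsigned saturating add. */
    static void mask_adds_epu8_model(uint8_t dst[32], const uint8_t src[32],
                                     uint32_t k, const uint8_t a[32],
                                     const uint8_t b[32])
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1u) {
                unsigned sum = (unsigned)a[j] + (unsigned)b[j];
                dst[j] = (uint8_t)(sum > 255u ? 255u : sum);  /* clamp at 255 */
            } else {
                dst[j] = src[j];
            }
        }
    }

For example, 200 + 100 produces 255 in a selected lane instead of the wrapped value 44.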
        	

_mm256_maskz_adds_epu8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_adds_epu8(__mmask32 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_adds_epu16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_adds_epu16(__m256i src, __mmask16 k,
                                   __m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_adds_epu16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_maskz_adds_epu16(__mmask16 k, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_add_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_add_epi16(__m256i src, __mmask16 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i] + b[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_add_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_maskz_add_epi16(__mmask16 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Add packed 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i] + b[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_avg_epu8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_avg_epu8(__m256i src, __mmask32 k,
                                 __m256i a, __m256i b);

.. admonition:: Intel Description

    Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
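
The expression ``(a + b + 1) >> 1`` is an average that rounds halves upward, and it must be evaluated in more than 8 bits so the intermediate sum does not overflow. The scalar sketch below makes that explicit; ``mask_avg_epu8_model`` is a hypothetical name used for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the writemasked rounding average. */
    static void mask_avg_epu8_model(uint8_t dst[32], const uint8_t src[32],
                                    uint32_t k, const uint8_t a[32],
                                    const uint8_t b[32])
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1u) {
                /* Widen first: 255 + 255 + 1 needs 9 bits. */
                unsigned sum = (unsigned)a[j] + (unsigned)b[j] + 1u;
                dst[j] = (uint8_t)(sum >> 1);
            } else {
                dst[j] = src[j];
            }
        }
    }

For example, averaging 1 and 2 gives 2 rather than 1, because the added 1 rounds the exact value 1.5 upward.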
        	

_mm256_maskz_avg_epu8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_avg_epu8(__mmask32 k, __m256i a,
                                  __m256i b);

.. admonition:: Intel Description

    Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_avg_epu16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_avg_epu16(__m256i src, __mmask16 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_avg_epu16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_maskz_avg_epu16(__mmask16 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_maddubs_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 src, 
    MASK k, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_mask_maddubs_epi16(__m256i src, __mmask16 k,
                                      __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
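
The pseudo-code above does not spell out operand signedness, but the description does: bytes of "a" are unsigned and bytes of "b" are signed. The scalar sketch below makes those types explicit and applies the 16-bit saturation to the pair sum. ``saturate16`` and ``mask_maddubs_epi16_model`` are hypothetical names, not part of ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Clamp an int to the int16_t range. */
    static int16_t saturate16(int x)
    {
        return (int16_t)(x > 32767 ? 32767 : (x < -32768 ? -32768 : x));
    }

    /* Hypothetical scalar model of the writemasked multiply-add of
       unsigned bytes (a) by signed bytes (b) into saturated 16-bit lanes. */
    static void mask_maddubs_epi16_model(int16_t dst[16], const int16_t src[16],
                                         uint16_t k, const uint8_t a[32],
                                         const int8_t b[32])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                int lo = (int)a[2 * j] * (int)b[2 * j];
                int hi = (int)a[2 * j + 1] * (int)b[2 * j + 1];
                dst[j] = saturate16(lo + hi);
            } else {
                dst[j] = src[j];
            }
        }
    }

Each product fits in 16 bits on its own, so saturation matters only when the two products in a pair are summed, for example 255 * 127 + 255 * 127 = 64770, which clamps to 32767.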
        	

_mm256_maskz_maddubs_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_maskz_maddubs_epi16(__mmask16 k, __m256i a,
                                       __m256i b);

.. admonition:: Intel Description

    Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_madd_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mask_madd_epi16(__m256i src, __mmask8 k,
                                   __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_madd_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_maskz_madd_epi16(__mmask8 k, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
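The pairwise multiply-and-add in the pseudo-code above can be sketched in portable C as follows. ``ref_maskz_madd_epi16`` and its array-based signature are hypothetical and exist only for this illustration; ``__mmask8`` is modeled as ``uint8_t``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: a and b hold 16 signed 16-bit lanes,
       dst holds 8 signed 32-bit lanes. */
    static void ref_maskz_madd_epi16(int32_t dst[8], uint8_t k,
                                     const int16_t a[16], const int16_t b[16])
    {
        for (int j = 0; j < 8; j++) {
            /* 64-bit intermediates avoid signed overflow in C. */
            int64_t lo = (int64_t)a[2 * j] * b[2 * j];
            int64_t hi = (int64_t)a[2 * j + 1] * b[2 * j + 1];
            /* Keep the low 32 bits of the sum. Only the case where all four
               inputs are -32768 exceeds INT32_MAX; converting that value back
               to int32_t is implementation-defined in C and wraps on common
               two's-complement compilers. */
            int32_t sum = (int32_t)(uint32_t)(lo + hi);
            /* Zeromask: lanes whose mask bit is clear are set to 0. */
            dst[j] = ((k >> j) & 1u) ? sum : 0;
        }
    }

For example, the pair (3, 4) multiplied with (5, 6) contributes 3*5 + 4*6 = 39 to its 32-bit lane.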

_mm256_mask_max_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_mask_max_epi8(__m256i src, __mmask32 k,
                                 __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
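A scalar sketch of the writemask behavior described above, assuming plain ``int8_t`` arrays in place of ``__m256i`` and ``uint32_t`` in place of ``__mmask32``. The name ``ref_mask_max_epi8`` is made up for this example.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: 32 lanes of signed 8-bit integers. */
    static void ref_mask_max_epi8(int8_t dst[32], const int8_t src[32],
                                  uint32_t k,
                                  const int8_t a[32], const int8_t b[32])
    {
        for (int j = 0; j < 32; j++) {
            /* Signed comparison: -1 is greater than -128. */
            int8_t m = (a[j] > b[j]) ? a[j] : b[j];
            /* Writemask: lanes whose mask bit is clear keep the value from src. */
            dst[j] = ((k >> j) & 1u) ? m : src[j];
        }
    }

Bit ``j`` of ``k`` selects lane ``j``, so a mask of ``0x80000001`` updates only the first and the last byte.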

_mm256_maskz_max_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_maskz_max_epi8(__mmask32 k, __m256i a,
                                  __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_max_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mask_max_epi16(__m256i src, __mmask16 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_max_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_maskz_max_epi16(__mmask16 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_max_epu8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_max_epu8(__m256i src, __mmask32 k,
                                 __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_max_epu8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_max_epu8(__mmask32 k, __m256i a,
                                  __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
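The unsigned variant differs from the signed one only in how the lane bit patterns are ordered. The sketch below models the pseudo-code above on ``uint8_t`` arrays; ``ref_maskz_max_epu8`` is a hypothetical name, and ``__mmask32`` is modeled as ``uint32_t``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: 32 lanes of unsigned 8-bit integers. */
    static void ref_maskz_max_epu8(uint8_t dst[32], uint32_t k,
                                   const uint8_t a[32], const uint8_t b[32])
    {
        for (int j = 0; j < 32; j++) {
            /* Unsigned comparison: 0x80 (128) is greater than 0x7F (127). */
            uint8_t m = (a[j] > b[j]) ? a[j] : b[j];
            /* Zeromask: lanes whose mask bit is clear are set to 0. */
            dst[j] = ((k >> j) & 1u) ? m : 0;
        }
    }

Given the bytes ``0x80`` and ``0x7F``, this model returns ``0x80``; a signed comparison of the same bit patterns would pick ``0x7F``, because ``0x80`` reads as -128 there.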

_mm256_mask_max_epu16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_max_epu16(__m256i src, __mmask16 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_max_epu16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_maskz_max_epu16(__mmask16 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_min_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_mask_min_epi8(__m256i src, __mmask32 k,
                                 __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_min_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_maskz_min_epi8(__mmask32 k, __m256i a,
                                  __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_min_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mask_min_epi16(__m256i src, __mmask16 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
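One way to read the masked minimum is as a selective upper clamp. The following scalar sketch mirrors the pseudo-code above; the function name ``ref_mask_min_epi16`` and the array-based signature are assumptions for illustration, with ``__mmask16`` modeled as ``uint16_t``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: 16 lanes of signed 16-bit integers. */
    static void ref_mask_min_epi16(int16_t dst[16], const int16_t src[16],
                                   uint16_t k,
                                   const int16_t a[16], const int16_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            int16_t m = (a[j] < b[j]) ? a[j] : b[j];
            /* Writemask: lanes whose mask bit is clear keep the value from src. */
            dst[j] = ((k >> j) & 1u) ? m : src[j];
        }
    }

Passing the same array as ``src`` and ``a`` leaves the unselected lanes unchanged and caps the selected ones at the corresponding value in ``b``.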

_mm256_maskz_min_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_maskz_min_epi16(__mmask16 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_min_epu8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_min_epu8(__m256i src, __mmask32 k,
                                 __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_min_epu8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_min_epu8(__mmask32 k, __m256i a,
                                  __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_min_epu16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_min_epu16(__m256i src, __mmask16 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_min_epu16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_maskz_min_epu16(__mmask16 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_mulhrs_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mask_mulhrs_epi16(__m256i src, __mmask16 k,
                                     __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
        		dst[i+15:i] := tmp[16:1]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
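The shift-add-shift sequence in the pseudo-code above is easier to follow with concrete numbers. The sketch below is a scalar model under stated assumptions: ``ref_mask_mulhrs_epi16`` is a hypothetical name, lanes are plain ``int16_t`` arrays, and ``__mmask16`` is modeled as ``uint16_t``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: 16 lanes of signed 16-bit integers. */
    static void ref_mask_mulhrs_epi16(int16_t dst[16], const int16_t src[16],
                                      uint16_t k,
                                      const int16_t a[16], const int16_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            int32_t prod = (int32_t)a[j] * (int32_t)b[j];
            /* Work on the bit pattern so the shifts are well defined in C.
               Only bits [16:1] of tmp are kept, so this gives the same
               result as the sign-extended pseudo-code. */
            uint32_t tmp = ((uint32_t)prod >> 14) + 1u;
            uint16_t bits = (uint16_t)(tmp >> 1);
            /* Reading bit patterns above 0x7FFF as negative values is
               implementation-defined in C; common compilers wrap. */
            int16_t r = (int16_t)bits;
            /* Writemask: lanes whose mask bit is clear keep the value from src. */
            dst[j] = ((k >> j) & 1u) ? r : src[j];
        }
    }

If the lanes are read as Q15 fixed-point values, 16384 * 16384 (0.5 times 0.5) gives 8192 (0.25), and 1 * 16384 rounds up to 1 rather than truncating to 0.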

_mm256_maskz_mulhrs_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_maskz_mulhrs_epi16(__mmask16 k, __m256i a,
                                      __m256i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
        		dst[i+15:i] := tmp[16:1]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_mulhi_epu16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_mulhi_epu16(__m256i src, __mmask16 k,
                                    __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[31:0] := a[i+15:i] * b[i+15:i]
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mulhi_epu16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_maskz_mulhi_epu16(__mmask16 k, __m256i a,
                                     __m256i b);

.. admonition:: Intel Description

    Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[31:0] := a[i+15:i] * b[i+15:i]
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
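As a hedged illustration of the pseudo-code above, this scalar sketch computes the high half of each unsigned 16 x 16 product. ``ref_maskz_mulhi_epu16`` and its array-based signature are assumptions for this example; ``__mmask16`` is modeled as ``uint16_t``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: 16 lanes of unsigned 16-bit integers. */
    static void ref_maskz_mulhi_epu16(uint16_t dst[16], uint16_t k,
                                      const uint16_t a[16],
                                      const uint16_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            /* Cast before multiplying: uint16_t operands would otherwise be
               promoted to int, and 65535 * 65535 overflows a 32-bit int. */
            uint32_t prod = (uint32_t)a[j] * (uint32_t)b[j];
            /* Zeromask: lanes whose mask bit is clear are set to 0. */
            dst[j] = ((k >> j) & 1u) ? (uint16_t)(prod >> 16) : 0;
        }
    }

For instance, 65535 * 65535 is ``0xFFFE0001``, so the stored high half is ``0xFFFE``.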

_mm256_mask_mulhi_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mask_mulhi_epi16(__m256i src, __mmask16 k,
                                    __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
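The signed high-half multiply above can be modeled in scalar C as shown below. The name ``ref_mask_mulhi_epi16`` and the array-based signature are hypothetical, and ``__mmask16`` is modeled as ``uint16_t``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: 16 lanes of signed 16-bit integers. */
    static void ref_mask_mulhi_epi16(int16_t dst[16], const int16_t src[16],
                                     uint16_t k,
                                     const int16_t a[16], const int16_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            /* The product of two 16-bit values always fits in 32 bits. */
            int32_t prod = (int32_t)a[j] * (int32_t)b[j];
            /* Extract bits [31:16] from the bit pattern, which avoids
               right-shifting a negative value. */
            uint16_t hi = (uint16_t)((uint32_t)prod >> 16);
            /* Reading bit patterns above 0x7FFF as negative values is
               implementation-defined in C; common compilers wrap. */
            int16_t r = (int16_t)hi;
            /* Writemask: lanes whose mask bit is clear keep the value from src. */
            dst[j] = ((k >> j) & 1u) ? r : src[j];
        }
    }

The sign matters here: the bit pattern ``0xFFFF`` in both inputs is (-1) * (-1) = 1 with a high half of 0, whereas an unsigned multiply of the same bits would give a high half of ``0xFFFE``.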

_mm256_maskz_mulhi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_maskz_mulhi_epi16(__mmask16 k, __m256i a,
                                     __m256i b);

.. admonition:: Intel Description

    Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_mullo_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_mullo_epi16(__m256i src, __mmask16 k,
                                    __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        		dst[i+15:i] := tmp[15:0]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mullo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_maskz_mullo_epi16(__mmask16 k, __m256i a,
                                     __m256i b);

.. admonition:: Intel Description

    Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        		dst[i+15:i] := tmp[15:0]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
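The pseudo-code above sign-extends its inputs, yet the low 16 bits of a product are the same whether the lanes are read as signed or unsigned. The sketch below relies on that and works on raw ``uint16_t`` lane bits; ``ref_maskz_mullo_epi16`` is a hypothetical name, and ``__mmask16`` is modeled as ``uint16_t``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: 16 lanes of raw 16-bit patterns. */
    static void ref_maskz_mullo_epi16(uint16_t dst[16], uint16_t k,
                                      const uint16_t a[16],
                                      const uint16_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            /* Unsigned 32-bit arithmetic avoids overflow from int promotion. */
            uint32_t prod = (uint32_t)a[j] * (uint32_t)b[j];
            /* Zeromask: lanes whose mask bit is clear are set to 0.
               The cast keeps only bits [15:0] of the product. */
            dst[j] = ((k >> j) & 1u) ? (uint16_t)prod : 0;
        }
    }

For example, ``0xFFFF`` times 2 yields ``0xFFFE``, which is -2 when read as signed, matching (-1) * 2.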

_mm256_mask_sub_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_sub_epi8(__m256i src, __mmask32 k,
                                 __m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i] - b[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	
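Unlike the saturating variants, this subtraction wraps around modulo 256. A minimal scalar sketch of the pseudo-code above, assuming ``uint8_t`` arrays for the lanes and ``uint32_t`` for ``__mmask32`` (``ref_mask_sub_epi8`` is a made-up name):

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: 32 lanes of raw 8-bit patterns. */
    static void ref_mask_sub_epi8(uint8_t dst[32], const uint8_t src[32],
                                  uint32_t k,
                                  const uint8_t a[32], const uint8_t b[32])
    {
        for (int j = 0; j < 32; j++) {
            /* The cast reduces the difference modulo 256, so 0 - 1 is 255. */
            uint8_t diff = (uint8_t)(a[j] - b[j]);
            /* Writemask: lanes whose mask bit is clear keep the value from src. */
            dst[j] = ((k >> j) & 1u) ? diff : src[j];
        }
    }

The wrapped result is the same bit pattern for signed and unsigned readings: 255 is ``0xFF``, which is -1 as a signed byte.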

_mm256_maskz_sub_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_sub_epi8(__mmask32 k, __m256i a,
                                  __m256i b);

.. admonition:: Intel Description

    Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i] - b[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_subs_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_mask_subs_epi8(__m256i src, __mmask32 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_subs_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_maskz_subs_epi8(__mmask32 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	
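``Saturate8`` in the pseudo-code above clamps the difference to the signed 8-bit range instead of letting it wrap. The following scalar sketch shows one way to express that; ``ref_saturate8`` and ``ref_maskz_subs_epi8`` are hypothetical names, and ``__mmask32`` is modeled as ``uint32_t``.

.. code-block:: C

    #include <stdint.h>

    /* Clamp an int to the signed 8-bit range [-128, 127]. */
    static int8_t ref_saturate8(int v)
    {
        if (v > INT8_MAX) return INT8_MAX;
        if (v < INT8_MIN) return INT8_MIN;
        return (int8_t)v;
    }

    /* Hypothetical scalar model: 32 lanes of signed 8-bit integers. */
    static void ref_maskz_subs_epi8(int8_t dst[32], uint32_t k,
                                    const int8_t a[32], const int8_t b[32])
    {
        for (int j = 0; j < 32; j++) {
            /* Compute in int so the true difference is available to clamp. */
            int diff = (int)a[j] - (int)b[j];
            /* Zeromask: lanes whose mask bit is clear are set to 0. */
            dst[j] = ((k >> j) & 1u) ? ref_saturate8(diff) : 0;
        }
    }

In this model -128 - 1 stays at -128 and 127 - (-1) stays at 127, where a wrapping subtraction would flip the sign.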

_mm256_mask_subs_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mask_subs_epi16(__m256i src, __mmask16 k,
                                   __m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_subs_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_maskz_subs_epi16(__mmask16 k, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
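
The zeromask form differs from the writemask form only in what happens to inactive lanes. The sketch below models that in portable C; ``saturate16`` and ``maskz_subs_epi16_model`` are hypothetical names chosen for this example, and the code is a scalar approximation rather than the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: clamp a 32-bit intermediate to the int16_t range. */
    static int16_t saturate16(int32_t v)
    {
        if (v > INT16_MAX) return INT16_MAX;
        if (v < INT16_MIN) return INT16_MIN;
        return (int16_t)v;
    }

    /* Scalar model (not the intrinsic): with a zeromask there is no src
       operand; lanes whose mask bit is clear are written as 0. */
    static void maskz_subs_epi16_model(int16_t dst[16], uint16_t k,
                                       const int16_t a[16],
                                       const int16_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u)
                dst[j] = saturate16((int32_t)a[j] - (int32_t)b[j]);
            else
                dst[j] = 0;
        }
    }

Because inactive lanes are zeroed rather than preserved, this form needs no ``src`` argument.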
        	

_mm256_mask_subs_epu8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_subs_epu8(__m256i src, __mmask32 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
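
For unsigned lanes, saturation means a difference that would go below zero is clamped to 0. A minimal scalar sketch of that behaviour follows; ``mask_subs_epu8_model`` is a hypothetical name, and the code models the pseudo-code in portable C instead of calling the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (not the intrinsic): unsigned saturating subtraction
       clamps at 0 instead of wrapping; the writemask then selects between
       that result and src. */
    static void mask_subs_epu8_model(uint8_t dst[32], const uint8_t src[32],
                                     uint32_t k, const uint8_t a[32],
                                     const uint8_t b[32])
    {
        for (int j = 0; j < 32; j++) {
            uint8_t diff = (a[j] > b[j]) ? (uint8_t)(a[j] - b[j]) : 0;
            dst[j] = ((k >> j) & 1u) ? diff : src[j];
        }
    }

For example, ``10 - 20`` yields ``0`` here, whereas a wrapping subtraction would yield ``246``.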
        	

_mm256_maskz_subs_epu8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_subs_epu8(__mmask32 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_subs_epu16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_subs_epu16(__m256i src, __mmask16 k,
                                   __m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_subs_epu16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_maskz_subs_epu16(__mmask16 k, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_sub_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_sub_epi16(__m256i src, __mmask16 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i] - b[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
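
Unlike the ``subs`` variants, the plain subtraction keeps only the low 16 bits of each difference. The following scalar sketch illustrates that wrapping behaviour; ``mask_sub_epi16_model`` is a hypothetical name, and lanes are modelled as ``uint16_t`` so the wraparound is well defined in C.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (not the intrinsic): each active lane is a[j] - b[j]
       reduced modulo 2^16; inactive lanes are copied from src. */
    static void mask_sub_epi16_model(uint16_t dst[16], const uint16_t src[16],
                                     uint16_t k, const uint16_t a[16],
                                     const uint16_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u)
                dst[j] = (uint16_t)(a[j] - b[j]);
            else
                dst[j] = src[j];
        }
    }

In this model ``0 - 1`` produces ``0xFFFF``, which is the same bit pattern as ``-1`` in a signed 16-bit lane.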
        	

_mm256_maskz_sub_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_maskz_sub_epi16(__mmask16 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i] - b[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_reduce_add_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: short
:Param Types:
    __m256i a
:Param ETypes:
    SI16 a

.. code-block:: C

    short _mm256_reduce_add_epi16(__m256i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by addition. Returns the sum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[15:0] + src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] + src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_ADD(a, 16)
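
The recursive halving in the pseudo-code can be written as a loop that repeatedly folds the upper half of the lanes onto the lower half. The sketch below is a portable scalar model under the assumption that the 16-bit result wraps modulo 2^16, as the ``short`` return type suggests; ``reduce_add_epi16_model`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Scalar model (not the intrinsic): fold 16 -> 8 -> 4 -> 2 -> 1 lanes,
       adding lane j + len into lane j at each step. */
    static int16_t reduce_add_epi16_model(const int16_t a[16])
    {
        uint16_t lane[16];
        int16_t out;

        memcpy(lane, a, sizeof lane);  /* work on the raw bit patterns */
        for (int len = 8; len >= 1; len /= 2)
            for (int j = 0; j < len; j++)
                lane[j] = (uint16_t)(lane[j] + lane[j + len]);
        memcpy(&out, &lane[0], sizeof out);
        return out;
    }

Since addition modulo 2^16 is associative, a simple left-to-right sum gives the same value as this tree-shaped fold.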
        	

_mm256_mask_reduce_add_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: short
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    short _mm256_mask_reduce_add_epi16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[15:0] + src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] + src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := 0
        	FI
        ENDFOR
        dst[15:0] := REDUCE_ADD(tmp, 16)
        	

_mm256_reduce_add_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: char
:Param Types:
    __m256i a
:Param ETypes:
    SI8 a

.. code-block:: C

    char _mm256_reduce_add_epi8(__m256i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by addition. Returns the sum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[7:0] + src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] + src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_ADD(a, 32)
        	

_mm256_mask_reduce_add_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: char
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    char _mm256_mask_reduce_add_epi8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[7:0] + src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] + src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := 0
        	FI
        ENDFOR
        dst[7:0] := REDUCE_ADD(tmp, 32)
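
In the masked reduction, inactive lanes are first replaced by 0, the identity for addition, so they do not affect the sum. A minimal scalar sketch follows, assuming the 8-bit result wraps modulo 2^8; ``mask_reduce_add_epi8_model`` is a hypothetical name and the code does not call the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (not the intrinsic): sum only the lanes whose mask bit
       is set, keeping the low 8 bits of the running total. */
    static int8_t mask_reduce_add_epi8_model(uint32_t k, const int8_t a[32])
    {
        uint8_t sum = 0;
        for (int j = 0; j < 32; j++) {
            uint8_t lane = ((k >> j) & 1u) ? (uint8_t)a[j] : 0;
            sum = (uint8_t)(sum + lane);
        }
        /* Reinterpret the low 8 bits as a signed value. */
        return (sum < 128) ? (int8_t)sum : (int8_t)(sum - 256);
    }

With an all-zero mask the model returns 0, matching a sum over no elements.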
        	

_mm256_reduce_mul_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: short
:Param Types:
    __m256i a
:Param ETypes:
    SI16 a

.. code-block:: C

    short _mm256_reduce_mul_epi16(__m256i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by multiplication. Returns the product of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[15:0] * src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] * src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_MUL(a, 16)
        	

_mm256_mask_reduce_mul_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: short
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    short _mm256_mask_reduce_mul_epi16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[15:0] * src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] * src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := 1
        	FI
        ENDFOR
        dst[15:0] := REDUCE_MUL(tmp, 16)
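
For a multiplicative reduction the neutral value is 1 rather than 0, which is why the pseudo-code fills inactive lanes with 1. The scalar sketch below models this in portable C, assuming the result keeps only the low 16 bits of the product; ``mask_reduce_mul_epi16_model`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (not the intrinsic): multiply the active lanes together,
       treating inactive lanes as 1, and keep the low 16 bits. */
    static int16_t mask_reduce_mul_epi16_model(uint16_t k, const int16_t a[16])
    {
        uint16_t prod = 1;
        for (int j = 0; j < 16; j++) {
            uint16_t lane = ((k >> j) & 1u) ? (uint16_t)a[j] : 1;
            /* Widen to 32 bits so the multiply cannot overflow int. */
            prod = (uint16_t)((uint32_t)prod * lane);
        }
        /* Reinterpret the low 16 bits as a signed value. */
        return (prod < 0x8000u) ? (int16_t)prod
                                : (int16_t)((int32_t)prod - 0x10000);
    }

With an all-zero mask the model returns 1, the empty product.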
        	

_mm256_reduce_mul_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: char
:Param Types:
    __m256i a
:Param ETypes:
    SI8 a

.. code-block:: C

    char _mm256_reduce_mul_epi8(__m256i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by multiplication. Returns the product of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[7:0] * src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] * src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_MUL(a, 32)
        	

_mm256_mask_reduce_mul_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: char
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    char _mm256_mask_reduce_mul_epi8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[7:0] * src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] * src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := 1
        	FI
        ENDFOR
        dst[7:0] := REDUCE_MUL(tmp, 32)
        	

_mm256_reduce_or_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: short
:Param Types:
    __m256i a
:Param ETypes:
    SI16 a

.. code-block:: C

    short _mm256_reduce_or_epi16(__m256i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_OR(src, len) {
        	IF len == 2
        		RETURN src[15:0] OR src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] OR src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_OR(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_OR(a, 16)
        	

_mm256_mask_reduce_or_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: short
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    short _mm256_mask_reduce_or_epi16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_OR(src, len) {
        	IF len == 2
        		RETURN src[15:0] OR src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] OR src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_OR(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := 0
        	FI
        ENDFOR
        dst[15:0] := REDUCE_OR(tmp, 16)
        	

_mm256_reduce_or_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: char
:Param Types:
    __m256i a
:Param ETypes:
    SI8 a

.. code-block:: C

    char _mm256_reduce_or_epi8(__m256i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_OR(src, len) {
        	IF len == 2
        		RETURN src[7:0] OR src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] OR src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_OR(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_OR(a, 32)
        	

_mm256_mask_reduce_or_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: char
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    char _mm256_mask_reduce_or_epi8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_OR(src, len) {
        	IF len == 2
        		RETURN src[7:0] OR src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] OR src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_OR(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := 0
        	FI
        ENDFOR
        dst[7:0] := REDUCE_OR(tmp, 32)
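
The pseudo-code fills inactive lanes with 0 because 0 is the identity for OR. A minimal scalar sketch of that idea follows; ``mask_reduce_or_epi8_model`` is a hypothetical name, and the result is returned as ``uint8_t`` in this example purely to keep the bit pattern explicit.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (not the intrinsic): OR together the lanes whose mask
       bit is set; inactive lanes contribute 0 and so set no bits. */
    static uint8_t mask_reduce_or_epi8_model(uint32_t k, const int8_t a[32])
    {
        uint8_t acc = 0;
        for (int j = 0; j < 32; j++) {
            uint8_t lane = ((k >> j) & 1u) ? (uint8_t)a[j] : 0;
            acc |= lane;
        }
        return acc;
    }

A bit is set in the result if it is set in at least one active lane.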
        	

_mm256_reduce_and_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: short
:Param Types:
    __m256i a
:Param ETypes:
    SI16 a

.. code-block:: C

    short _mm256_reduce_and_epi16(__m256i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_AND(src, len) {
        	IF len == 2
        		RETURN src[15:0] AND src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] AND src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_AND(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_AND(a, 16)
        	

_mm256_mask_reduce_and_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: short
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    short _mm256_mask_reduce_and_epi16(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_AND(src, len) {
        	IF len == 2
        		RETURN src[15:0] AND src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] AND src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_AND(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := 0xFFFF
        	FI
        ENDFOR
        dst[15:0] := REDUCE_AND(tmp, 16)
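
For AND the identity is all ones, which is why inactive lanes become ``0xFFFF`` in the pseudo-code. The following portable scalar sketch illustrates this; ``mask_reduce_and_epi16_model`` is a hypothetical name, and the ``uint16_t`` return type is a choice made for this example so the bit pattern is easy to inspect.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (not the intrinsic): AND together the lanes whose mask
       bit is set; inactive lanes contribute 0xFFFF and so clear no bits. */
    static uint16_t mask_reduce_and_epi16_model(uint16_t k, const int16_t a[16])
    {
        uint16_t acc = 0xFFFF;
        for (int j = 0; j < 16; j++) {
            uint16_t lane = ((k >> j) & 1u) ? (uint16_t)a[j] : 0xFFFF;
            acc &= lane;
        }
        return acc;
    }

With an all-zero mask the model returns ``0xFFFF``, since no lane has cleared any bit.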
        	

_mm256_reduce_and_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: char
:Param Types:
    __m256i a
:Param ETypes:
    SI8 a

.. code-block:: C

    char _mm256_reduce_and_epi8(__m256i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_AND(src, len) {
        	IF len == 2
        		RETURN src[7:0] AND src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] AND src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_AND(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_AND(a, 32)
        	

_mm256_mask_reduce_and_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: char
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    char _mm256_mask_reduce_and_epi8(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_AND(src, len) {
        	IF len == 2
        		RETURN src[7:0] AND src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] AND src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_AND(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := 0xFF
        	FI
        ENDFOR
        dst[7:0] := REDUCE_AND(tmp, 32)
        	

_mm256_mask_mullo_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_mullo_epi64(__m256i src, __mmask8 k,
                                    __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		tmp[127:0] := a[i+63:i] * b[i+63:i]
        		dst[i+63:i] := tmp[63:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mullo_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_mullo_epi64(__mmask8 k, __m256i a,
                                     __m256i b);

.. admonition:: Intel Description

    Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		tmp[127:0] := a[i+63:i] * b[i+63:i]
        		dst[i+63:i] := tmp[63:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mullo_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mullo_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	tmp[127:0] := a[i+63:i] * b[i+63:i]
        	dst[i+63:i] := tmp[63:0]
        ENDFOR
        dst[MAX:256] := 0
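
Keeping the low 64 bits of a 128-bit product is the same as multiplying modulo 2^64, which is what unsigned 64-bit multiplication in C already does. The sketch below uses that fact to model the four lanes in portable C; ``mullo_epi64_model`` is a hypothetical name and the code is not the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (not the intrinsic): uint64_t multiplication wraps
       modulo 2^64, which equals the low 64 bits of the full product. */
    static void mullo_epi64_model(uint64_t dst[4], const uint64_t a[4],
                                  const uint64_t b[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = a[j] * b[j];
    }

For instance, ``2^32 * 2^32`` equals ``2^64``, whose low 64 bits are all zero, so that lane becomes 0 in this model.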
        	

_mm256_mask_add_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_add_pd(__m256d src, __mmask8 k,
                               __m256d a, __m256d b);

.. admonition:: Intel Description

    Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
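
A 256-bit register holds four double-precision lanes, so only the low four bits of the 8-bit mask are consulted in the pseudo-code. The scalar sketch below models that selection in portable C; ``mask_add_pd_model`` is a hypothetical name and the code does not call the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (not the intrinsic): lane j is a[j] + b[j] when bit j of
       k is set, otherwise it is copied from src. Bits 4..7 of k are unused
       because there are only four 64-bit lanes. */
    static void mask_add_pd_model(double dst[4], const double src[4],
                                  uint8_t k, const double a[4],
                                  const double b[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1u) ? a[j] + b[j] : src[j];
    }

With ``k = 0x5`` the model computes lanes 0 and 2 and leaves lanes 1 and 3 at their ``src`` values.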
        	

_mm256_maskz_add_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_add_pd(__mmask8 k, __m256d a,
                                __m256d b);

.. admonition:: Intel Description

    Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_add_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_add_ps(__m256 src, __mmask8 k, __m256 a,
                              __m256 b);

.. admonition:: Intel Description

    Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_add_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_add_ps(__mmask8 k, __m256 a, __m256 b);

.. admonition:: Intel Description

    Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_div_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_div_pd(__m256d src, __mmask8 k,
                               __m256d a, __m256d b);

.. admonition:: Intel Description

    Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] / b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
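
A write-masked divide lends itself to skipping lanes whose divisor is zero and substituting a fallback from ``src``. The following scalar sketch assumes that use case; ``nonzero_mask_pd``, ``model_mask_div_pd`` and ``check_model_mask_div_pd`` are hypothetical helpers, and ``double[4]`` arrays stand in for ``__m256d``.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper: bit j is set where b[j] is non-zero. */
    static uint8_t nonzero_mask_pd(const double b[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; ++j) {
            if (b[j] != 0.0) {
                k |= (uint8_t)(1u << j);
            }
        }
        return k;
    }

    /* Hypothetical scalar model of _mm256_mask_div_pd: the division is
       only evaluated for selected lanes; the others copy src[j]. */
    static void model_mask_div_pd(double dst[4], const double src[4], uint8_t k,
                                  const double a[4], const double b[4])
    {
        for (int j = 0; j < 4; ++j) {
            dst[j] = ((k >> j) & 1u) ? a[j] / b[j] : src[j];
        }
    }

    /* Self-check: lanes 1 and 3 have a zero divisor and take the fallback. */
    static void check_model_mask_div_pd(void)
    {
        const double a[4]        = {1.0, 2.0, 3.0, 4.0};
        const double b[4]        = {2.0, 0.0, 4.0, 0.0};
        const double fallback[4] = {-1.0, -1.0, -1.0, -1.0};
        double dst[4];

        uint8_t k = nonzero_mask_pd(b);
        assert(k == 0x05);
        model_mask_div_pd(dst, fallback, k, a, b);
        assert(dst[0] == 0.5 && dst[1] == -1.0);
        assert(dst[2] == 0.75 && dst[3] == -1.0);
    }

With the real intrinsics the mask would more likely come from a vector compare such as ``_mm256_cmp_pd_mask`` than from a scalar loop; the loop is used here only to keep the sketch portable.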
        	

_mm256_maskz_div_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_div_pd(__mmask8 k, __m256d a,
                                __m256d b);

.. admonition:: Intel Description

    Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] / b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_div_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_div_ps(__m256 src, __mmask8 k, __m256 a,
                              __m256 b);

.. admonition:: Intel Description

    Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] / b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_div_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_div_ps(__mmask8 k, __m256 a, __m256 b);

.. admonition:: Intel Description

    Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] / b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
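
The sketch below shows one way the intrinsic might be called from portable code. It assumes GCC or Clang, which predefine ``__AVX512F__`` and ``__AVX512VL__`` when those features are enabled at compile time; without them it falls back to a scalar loop that follows the pseudo-code. The wrapper names ``div8_zero_masked`` and ``check_div8_zero_masked`` are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #if defined(__AVX512F__) && defined(__AVX512VL__)
    #include <immintrin.h>
    #endif

    /* Hypothetical wrapper: out[j] = a[j] / b[j] where bit j of k is set,
       otherwise out[j] = 0. */
    static void div8_zero_masked(float out[8], uint8_t k,
                                 const float a[8], const float b[8])
    {
    #if defined(__AVX512F__) && defined(__AVX512VL__)
        /* Vector path: unaligned loads, zero-masked divide, unaligned store. */
        __m256 va = _mm256_loadu_ps(a);
        __m256 vb = _mm256_loadu_ps(b);
        _mm256_storeu_ps(out, _mm256_maskz_div_ps((__mmask8)k, va, vb));
    #else
        /* Scalar fallback mirroring the pseudo-code. */
        for (int j = 0; j < 8; ++j) {
            out[j] = ((k >> j) & 1u) ? a[j] / b[j] : 0.0f;
        }
    #endif
    }

    /* Self-check: k = 0xF0 selects the upper four lanes. */
    static void check_div8_zero_masked(void)
    {
        const float a[8] = {2, 4, 6, 8, 10, 12, 14, 16};
        const float b[8] = {2, 2, 2, 2, 2, 2, 2, 2};
        float out[8];

        div8_zero_masked(out, 0xF0, a, b);
        assert(out[0] == 0.0f && out[3] == 0.0f);
        assert(out[4] == 5.0f && out[7] == 8.0f);
    }

The compile-time guard only reflects the build flags; a binary built with AVX-512 enabled still needs a CPU that supports it at run time.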
        	

_mm256_mask3_fmadd_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m256d _mm256_mask3_fmadd_pd(__m256d a, __m256d b,
                                  __m256d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
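
Because unselected lanes pass ``c`` through, the ``mask3`` form fits an accumulator pattern in which ``c`` is both input and result. The scalar sketch below assumes that usage; ``model_mask3_fmadd_pd`` and ``check_model_mask3_fmadd_pd`` are hypothetical names. The model uses a separate multiply and add, whereas the hardware instruction is fused and rounds once, so results can differ in the last bit for inputs that are not exactly representable.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_mask3_fmadd_pd: selected lanes
       get a*b + c, the rest keep c. dst may alias c. */
    static void model_mask3_fmadd_pd(double dst[4], const double a[4],
                                     const double b[4], const double c[4],
                                     uint8_t k)
    {
        for (int j = 0; j < 4; ++j) {
            dst[j] = ((k >> j) & 1u) ? (a[j] * b[j]) + c[j] : c[j];
        }
    }

    /* Self-check: accumulate into lanes 0 and 1 only. */
    static void check_model_mask3_fmadd_pd(void)
    {
        const double a[4] = {1.0, 2.0, 3.0, 4.0};
        const double b[4] = {10.0, 10.0, 10.0, 10.0};
        double acc[4]     = {1.0, 1.0, 1.0, 1.0};

        model_mask3_fmadd_pd(acc, a, b, acc, 0x03);
        assert(acc[0] == 11.0 && acc[1] == 21.0);  /* updated */
        assert(acc[2] == 1.0 && acc[3] == 1.0);    /* untouched */
    }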
        	

_mm256_mask_fmadd_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __mmask8 k, 
    __m256d b, 
    __m256d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_mask_fmadd_pd(__m256d a, __mmask8 k,
                                 __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmadd_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    __m256d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_maskz_fmadd_pd(__mmask8 k, __m256d a,
                                  __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
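
The ``mask3``, ``mask`` and ``maskz`` forms of the double-precision fmadd differ only in what an unselected lane receives: ``c``, ``a`` or zero. The sketch below puts the three rules side by side in one scalar model; the enum ``fmadd_passthrough`` and the functions ``model_fmadd_pd`` and ``check_model_fmadd_pd`` are hypothetical and exist only for this comparison.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical selector for the value of an unselected lane. */
    enum fmadd_passthrough {
        PASS_A,    /* _mm256_mask_fmadd_pd  */
        PASS_C,    /* _mm256_mask3_fmadd_pd */
        PASS_ZERO  /* _mm256_maskz_fmadd_pd */
    };

    /* Hypothetical scalar model covering all three masking variants. */
    static void model_fmadd_pd(double dst[4], enum fmadd_passthrough p,
                               uint8_t k, const double a[4],
                               const double b[4], const double c[4])
    {
        for (int j = 0; j < 4; ++j) {
            if ((k >> j) & 1u) {
                dst[j] = (a[j] * b[j]) + c[j];
            } else if (p == PASS_A) {
                dst[j] = a[j];
            } else if (p == PASS_C) {
                dst[j] = c[j];
            } else {
                dst[j] = 0.0;
            }
        }
    }

    /* Self-check: lane 0 is selected, lane 1 shows the pass-through rule. */
    static void check_model_fmadd_pd(void)
    {
        const double a[4] = {2.0, 2.0, 2.0, 2.0};
        const double b[4] = {3.0, 3.0, 3.0, 3.0};
        const double c[4] = {1.0, 1.0, 1.0, 1.0};
        double d[4];

        model_fmadd_pd(d, PASS_A, 0x01, a, b, c);
        assert(d[0] == 7.0 && d[1] == 2.0);
        model_fmadd_pd(d, PASS_C, 0x01, a, b, c);
        assert(d[0] == 7.0 && d[1] == 1.0);
        model_fmadd_pd(d, PASS_ZERO, 0x01, a, b, c);
        assert(d[0] == 7.0 && d[1] == 0.0);
    }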
        	

_mm256_mask3_fmadd_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m256 _mm256_mask3_fmadd_ps(__m256 a, __m256 b, __m256 c,
                                 __mmask8 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_fmadd_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __mmask8 k, 
    __m256 b, 
    __m256 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_mask_fmadd_ps(__m256 a, __mmask8 k, __m256 b,
                                __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmadd_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    __m256 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_maskz_fmadd_ps(__mmask8 k, __m256 a, __m256 b,
                                 __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask3_fmaddsub_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m256d _mm256_mask3_fmaddsub_pd(__m256d a, __m256d b,
                                     __m256d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
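
In the pseudo-code above, even-indexed lanes subtract ``c`` and odd-indexed lanes add it. The scalar sketch below follows that lane rule directly; ``model_mask3_fmaddsub_pd`` and ``check_model_mask3_fmaddsub_pd`` are hypothetical names, and the multiply and add are not fused as they are in hardware.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_mask3_fmaddsub_pd:
       even lanes: a*b - c, odd lanes: a*b + c, unselected lanes: c. */
    static void model_mask3_fmaddsub_pd(double dst[4], const double a[4],
                                        const double b[4], const double c[4],
                                        uint8_t k)
    {
        for (int j = 0; j < 4; ++j) {
            if (!((k >> j) & 1u)) {
                dst[j] = c[j];
            } else if ((j & 1) == 0) {
                dst[j] = (a[j] * b[j]) - c[j];
            } else {
                dst[j] = (a[j] * b[j]) + c[j];
            }
        }
    }

    /* Self-check: all lanes selected, then only the low two. */
    static void check_model_mask3_fmaddsub_pd(void)
    {
        const double a[4] = {1.0, 2.0, 3.0, 4.0};
        const double b[4] = {10.0, 10.0, 10.0, 10.0};
        const double c[4] = {1.0, 1.0, 1.0, 1.0};
        double d[4];

        model_mask3_fmaddsub_pd(d, a, b, c, 0x0F);
        assert(d[0] == 9.0 && d[1] == 21.0 && d[2] == 29.0 && d[3] == 41.0);

        model_mask3_fmaddsub_pd(d, a, b, c, 0x03);
        assert(d[0] == 9.0 && d[1] == 21.0 && d[2] == 1.0 && d[3] == 1.0);
    }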
        	

_mm256_mask_fmaddsub_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __mmask8 k, 
    __m256d b, 
    __m256d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_mask_fmaddsub_pd(__m256d a, __mmask8 k,
                                    __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmaddsub_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    __m256d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_maskz_fmaddsub_pd(__mmask8 k, __m256d a,
                                     __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
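
The subtract-on-even, add-on-odd pattern matches the real and imaginary parts of a complex product, which is one commonly cited use of fmaddsub. The scalar sketch below assumes two complex numbers per vector laid out as ``{re0, im0, re1, im1}``; ``model_maskz_fmaddsub_pd``, ``complex_mul2`` and ``check_complex_mul2`` are hypothetical names, and the operand arrays that real code would build with shuffles are filled by a plain loop.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_maskz_fmaddsub_pd. */
    static void model_maskz_fmaddsub_pd(double dst[4], uint8_t k,
                                        const double a[4], const double b[4],
                                        const double c[4])
    {
        for (int j = 0; j < 4; ++j) {
            if (!((k >> j) & 1u)) {
                dst[j] = 0.0;
            } else if ((j & 1) == 0) {
                dst[j] = (a[j] * b[j]) - c[j];
            } else {
                dst[j] = (a[j] * b[j]) + c[j];
            }
        }
    }

    /* Hypothetical helper: out = x * y for two complex pairs, using
       re = xr*yr - xi*yi (even lane) and im = xr*yi + xi*yr (odd lane). */
    static void complex_mul2(double out[4], const double x[4],
                             const double y[4])
    {
        double a[4], c[4];
        for (int p = 0; p < 4; p += 2) {
            a[p]     = x[p];                /* xr      */
            a[p + 1] = x[p];                /* xr      */
            c[p]     = x[p + 1] * y[p + 1]; /* xi * yi */
            c[p + 1] = x[p + 1] * y[p];     /* xi * yr */
        }
        /* b is y itself: {yr, yi, yr, yi}. */
        model_maskz_fmaddsub_pd(out, 0x0F, a, y, c);
    }

    /* Self-check: (1+2i)(5+6i) = -7+16i and (3+4i)(0+1i) = -4+3i. */
    static void check_complex_mul2(void)
    {
        const double x[4] = {1.0, 2.0, 3.0, 4.0};
        const double y[4] = {5.0, 6.0, 0.0, 1.0};
        double out[4];

        complex_mul2(out, x, y);
        assert(out[0] == -7.0 && out[1] == 16.0);
        assert(out[2] == -4.0 && out[3] == 3.0);
    }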
        	

_mm256_mask3_fmaddsub_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m256 _mm256_mask3_fmaddsub_ps(__m256 a, __m256 b,
                                    __m256 c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_fmaddsub_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __mmask8 k, 
    __m256 b, 
    __m256 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_mask_fmaddsub_ps(__m256 a, __mmask8 k,
                                   __m256 b, __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmaddsub_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    __m256 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_maskz_fmaddsub_ps(__mmask8 k, __m256 a,
                                    __m256 b, __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask3_fmsub_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m256d _mm256_mask3_fmsub_pd(__m256d a, __m256d b,
                                  __m256d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
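
The operand order matters for fmsub: the result is ``a*b - c``, not ``c - a*b``. The scalar sketch below illustrates this for the ``mask3`` form; ``model_mask3_fmsub_pd`` and ``check_model_mask3_fmsub_pd`` are hypothetical names, and the unfused arithmetic is an approximation of the single-rounding hardware operation.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_mask3_fmsub_pd: selected lanes
       get a*b - c, the rest keep c. */
    static void model_mask3_fmsub_pd(double dst[4], const double a[4],
                                     const double b[4], const double c[4],
                                     uint8_t k)
    {
        for (int j = 0; j < 4; ++j) {
            dst[j] = ((k >> j) & 1u) ? (a[j] * b[j]) - c[j] : c[j];
        }
    }

    /* Self-check: k = 0x0A selects lanes 1 and 3. */
    static void check_model_mask3_fmsub_pd(void)
    {
        const double a[4] = {2.0, 2.0, 2.0, 2.0};
        const double b[4] = {5.0, 5.0, 5.0, 5.0};
        const double c[4] = {1.0, 2.0, 3.0, 4.0};
        double d[4];

        model_mask3_fmsub_pd(d, a, b, c, 0x0A);
        assert(d[0] == 1.0);  /* unselected: c[0]  */
        assert(d[1] == 8.0);  /* 2*5 - 2           */
        assert(d[2] == 3.0);  /* unselected: c[2]  */
        assert(d[3] == 6.0);  /* 2*5 - 4           */
    }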
        	

_mm256_mask_fmsub_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __mmask8 k, 
    __m256d b, 
    __m256d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_mask_fmsub_pd(__m256d a, __mmask8 k,
                                 __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmsub_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    __m256d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_maskz_fmsub_pd(__mmask8 k, __m256d a,
                                  __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask3_fmsub_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m256 _mm256_mask3_fmsub_ps(__m256 a, __m256 b, __m256 c,
                                 __mmask8 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_fmsub_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __mmask8 k, 
    __m256 b, 
    __m256 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_mask_fmsub_ps(__m256 a, __mmask8 k, __m256 b,
                                __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmsub_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    __m256 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_maskz_fmsub_ps(__mmask8 k, __m256 a, __m256 b,
                                 __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask3_fmsubadd_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m256d _mm256_mask3_fmsubadd_pd(__m256d a, __m256d b,
                                     __m256d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_fmsubadd_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __mmask8 k, 
    __m256d b, 
    __m256d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_mask_fmsubadd_pd(__m256d a, __mmask8 k,
                                    __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmsubadd_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    __m256d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_maskz_fmsubadd_pd(__mmask8 k, __m256d a,
                                     __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
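
For selected lanes, fmsubadd is fmaddsub with the sign of ``c`` flipped: even lanes add where fmaddsub subtracts, and odd lanes subtract where it adds. The scalar sketch below checks that relationship on small exact values; ``model_maskz_fmsubadd_pd``, ``model_maskz_fmaddsub_pd`` and ``check_fmsubadd_vs_fmaddsub`` are hypothetical names, and the comparison is only demonstrated for the zero-masked forms, where unselected lanes are 0 in both.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_maskz_fmsubadd_pd:
       even lanes: a*b + c, odd lanes: a*b - c, unselected lanes: 0. */
    static void model_maskz_fmsubadd_pd(double dst[4], uint8_t k,
                                        const double a[4], const double b[4],
                                        const double c[4])
    {
        for (int j = 0; j < 4; ++j) {
            if (!((k >> j) & 1u)) {
                dst[j] = 0.0;
            } else if ((j & 1) == 0) {
                dst[j] = (a[j] * b[j]) + c[j];
            } else {
                dst[j] = (a[j] * b[j]) - c[j];
            }
        }
    }

    /* Hypothetical scalar model of _mm256_maskz_fmaddsub_pd:
       even lanes: a*b - c, odd lanes: a*b + c, unselected lanes: 0. */
    static void model_maskz_fmaddsub_pd(double dst[4], uint8_t k,
                                        const double a[4], const double b[4],
                                        const double c[4])
    {
        for (int j = 0; j < 4; ++j) {
            if (!((k >> j) & 1u)) {
                dst[j] = 0.0;
            } else if ((j & 1) == 0) {
                dst[j] = (a[j] * b[j]) - c[j];
            } else {
                dst[j] = (a[j] * b[j]) + c[j];
            }
        }
    }

    /* Self-check: fmsubadd(a, b, c) matches fmaddsub(a, b, -c). */
    static void check_fmsubadd_vs_fmaddsub(void)
    {
        const double a[4]     = {1.0, 2.0, 3.0, 4.0};
        const double b[4]     = {2.0, 2.0, 2.0, 2.0};
        const double c[4]     = {1.0, 1.0, 1.0, 1.0};
        const double neg_c[4] = {-1.0, -1.0, -1.0, -1.0};
        double x[4], y[4];

        model_maskz_fmsubadd_pd(x, 0x0F, a, b, c);
        model_maskz_fmaddsub_pd(y, 0x0F, a, b, neg_c);
        assert(x[0] == 3.0 && x[1] == 3.0 && x[2] == 7.0 && x[3] == 7.0);
        for (int j = 0; j < 4; ++j) {
            assert(x[j] == y[j]);
        }
    }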
        	

_mm256_mask3_fmsubadd_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m256 _mm256_mask3_fmsubadd_ps(__m256 a, __m256 b,
                                    __m256 c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_fmsubadd_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __mmask8 k, 
    __m256 b, 
    __m256 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_mask_fmsubadd_ps(__m256 a, __mmask8 k,
                                   __m256 b, __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmsubadd_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    __m256 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_maskz_fmsubadd_ps(__mmask8 k, __m256 a,
                                    __m256 b, __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask3_fnmadd_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m256d _mm256_mask3_fnmadd_pd(__m256d a, __m256d b,
                                   __m256d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
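
The negated product in fnmadd amounts to ``c - a*b`` for each selected lane. The scalar sketch below follows the pseudo-code for the ``mask3`` form; ``model_mask3_fnmadd_pd`` and ``check_model_mask3_fnmadd_pd`` are hypothetical names, and the multiply and add are performed separately rather than fused.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_mask3_fnmadd_pd: selected lanes
       get -(a*b) + c, the rest keep c. */
    static void model_mask3_fnmadd_pd(double dst[4], const double a[4],
                                      const double b[4], const double c[4],
                                      uint8_t k)
    {
        for (int j = 0; j < 4; ++j) {
            dst[j] = ((k >> j) & 1u) ? -(a[j] * b[j]) + c[j] : c[j];
        }
    }

    /* Self-check: k = 0x05 selects lanes 0 and 2. */
    static void check_model_mask3_fnmadd_pd(void)
    {
        const double a[4] = {1.0, 2.0, 3.0, 4.0};
        const double b[4] = {2.0, 2.0, 2.0, 2.0};
        const double c[4] = {10.0, 10.0, 10.0, 10.0};
        double d[4];

        model_mask3_fnmadd_pd(d, a, b, c, 0x05);
        assert(d[0] == 8.0);   /* 10 - 1*2        */
        assert(d[1] == 10.0);  /* unselected: c   */
        assert(d[2] == 4.0);   /* 10 - 3*2        */
        assert(d[3] == 10.0);  /* unselected: c   */
    }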
        	

_mm256_mask_fnmadd_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __mmask8 k, 
    __m256d b, 
    __m256d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_mask_fnmadd_pd(__m256d a, __mmask8 k,
                                  __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:256] := 0
        	

_mm256_maskz_fnmadd_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    __m256d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_maskz_fnmadd_pd(__mmask8 k, __m256d a,
                                   __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR	
        dst[MAX:256] := 0
        	
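The three masked forms of the negated multiply-add above differ only in what an inactive lane receives: "a" for the ``mask`` form, "c" for the ``mask3`` form, and zero for the ``maskz`` form. The scalar C sketch below makes that explicit. The ``vec4d`` type, the ``inactive_src`` enum and the ``model_fnmadd_pd`` function are hypothetical names introduced for illustration, and the model rounds the product separately, unlike the fused instruction.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for a 256-bit vector of four doubles. */
    typedef struct { double lane[4]; } vec4d;

    /* Where an inactive lane (mask bit clear) takes its value from. */
    typedef enum { FROM_A, FROM_C, FROM_ZERO } inactive_src;

    /* Scalar model (assumption): active lanes compute -(a*b) + c. */
    static vec4d model_fnmadd_pd(vec4d a, vec4d b, vec4d c, uint8_t k,
                                 inactive_src inactive)
    {
        vec4d dst;
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1) {
                dst.lane[j] = -(a.lane[j] * b.lane[j]) + c.lane[j];
            } else if (inactive == FROM_A) {
                dst.lane[j] = a.lane[j];   /* mask form  */
            } else if (inactive == FROM_C) {
                dst.lane[j] = c.lane[j];   /* mask3 form */
            } else {
                dst.lane[j] = 0.0;         /* maskz form */
            }
        }
        return dst;
    }

For "a" = {1, 2, 3, 4}, "b" = {2, 2, 2, 2}, "c" = {10, 10, 10, 10} and "k" = ``0x5``, lanes 0 and 2 hold 8 and 4 in every form, while lanes 1 and 3 hold 2 and 4 (``FROM_A``), 10 and 10 (``FROM_C``), or 0 and 0 (``FROM_ZERO``).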

_mm256_mask3_fnmadd_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m256 _mm256_mask3_fnmadd_ps(__m256 a, __m256 b, __m256 c,
                                  __mmask8 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:256] := 0
        	

_mm256_mask_fnmadd_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __mmask8 k, 
    __m256 b, 
    __m256 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_mask_fnmadd_ps(__m256 a, __mmask8 k, __m256 b,
                                 __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:256] := 0
        	

_mm256_maskz_fnmadd_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    __m256 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_maskz_fnmadd_ps(__mmask8 k, __m256 a,
                                  __m256 b, __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR	
        dst[MAX:256] := 0
        	

_mm256_mask3_fnmsub_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m256d _mm256_mask3_fnmsub_pd(__m256d a, __m256d b,
                                   __m256d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:256] := 0
        	

_mm256_mask_fnmsub_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __mmask8 k, 
    __m256d b, 
    __m256d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_mask_fnmsub_pd(__m256d a, __mmask8 k,
                                  __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:256] := 0
        	

_mm256_maskz_fnmsub_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    __m256d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_maskz_fnmsub_pd(__mmask8 k, __m256d a,
                                   __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR	
        dst[MAX:256] := 0
        	

_mm256_mask3_fnmsub_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m256 _mm256_mask3_fnmsub_ps(__m256 a, __m256 b, __m256 c,
                                  __mmask8 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:256] := 0
        	

_mm256_mask_fnmsub_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __mmask8 k, 
    __m256 b, 
    __m256 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_mask_fnmsub_ps(__m256 a, __mmask8 k, __m256 b,
                                 __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:256] := 0
        	

_mm256_maskz_fnmsub_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    __m256 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_maskz_fnmsub_ps(__mmask8 k, __m256 a,
                                  __m256 b, __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR	
        dst[MAX:256] := 0
        	

_mm256_mask_max_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_max_pd(__m256d src, __mmask8 k,
                               __m256d a, __m256d b);

.. admonition:: Intel Description

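    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.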

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_max_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_max_pd(__mmask8 k, __m256d a,
                                __m256d b);

.. admonition:: Intel Description

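    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.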

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
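The ``MAX`` in the pseudo-code above is not the IEEE 754 maximum. On x86 the packed maximum returns the second operand whenever the comparison "a" > "b" is false, which includes a NaN in either input and a pair of zeros of either sign. The scalar C sketch below models that rule for the zero-masked form; ``vec4d``, ``x86_max`` and ``model_maskz_max_pd`` are hypothetical names, not library functions.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for a 256-bit vector of four doubles. */
    typedef struct { double lane[4]; } vec4d;

    /* x86-style maximum: returns b unless a > b holds, so a NaN in
       either operand, or two zeros of any sign, yields b. */
    static double x86_max(double a, double b)
    {
        return (a > b) ? a : b;
    }

    /* Scalar model (assumption) of the zero-masked packed maximum. */
    static vec4d model_maskz_max_pd(uint8_t k, vec4d a, vec4d b)
    {
        vec4d dst;
        for (int j = 0; j < 4; j++) {
            dst.lane[j] = ((k >> j) & 1) ? x86_max(a.lane[j], b.lane[j]) : 0.0;
        }
        return dst;
    }

Under this model the operand order matters: a lane with "a" = NaN and "b" = 1 gives 1, while "a" = 1 and "b" = NaN gives NaN. The C library function ``fmax`` returns 1 in both cases, so it is not a drop-in reference for these intrinsics.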

_mm256_mask_max_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_max_ps(__m256 src, __mmask8 k, __m256 a,
                              __m256 b);

.. admonition:: Intel Description

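    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.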

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_max_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_max_ps(__mmask8 k, __m256 a, __m256 b);

.. admonition:: Intel Description

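    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.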

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_min_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_min_pd(__m256d src, __mmask8 k,
                               __m256d a, __m256d b);

.. admonition:: Intel Description

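    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.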

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_min_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_min_pd(__mmask8 k, __m256d a,
                                __m256d b);

.. admonition:: Intel Description

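    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.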

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_min_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_min_ps(__m256 src, __mmask8 k, __m256 a,
                              __m256 b);

.. admonition:: Intel Description

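    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.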

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_min_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_min_ps(__mmask8 k, __m256 a, __m256 b);

.. admonition:: Intel Description

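    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Note: "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.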

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_mul_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_mul_pd(__m256d src, __mmask8 k,
                               __m256d a, __m256d b);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] * b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mul_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_mul_pd(__mmask8 k, __m256d a,
                                __m256d b);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] * b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_mul_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_mul_ps(__m256 src, __mmask8 k, __m256 a,
                              __m256 b);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mul_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_mul_ps(__mmask8 k, __m256 a, __m256 b);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_abs_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m256i _mm256_mask_abs_epi32(__m256i src, __mmask8 k,
                                  __m256i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ABS(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_abs_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m256i _mm256_maskz_abs_epi32(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ABS(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_abs_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m256i _mm256_abs_epi64(__m256i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ABS(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_abs_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m256i _mm256_mask_abs_epi64(__m256i src, __mmask8 k,
                                  __m256i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ABS(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_abs_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m256i _mm256_maskz_abs_epi64(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ABS(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
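The phrase "unsigned results" in the 64-bit absolute-value descriptions above matters for one input: the most negative value has no positive counterpart in 64-bit two's complement, so its absolute value is the bit pattern ``0x8000000000000000``, which is 2^63 when read as unsigned. The scalar C sketch below models the write-masked form on raw bit patterns. ``vec4u64``, ``abs_bits64`` and ``model_mask_abs_epi64`` are hypothetical names introduced for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for a 256-bit vector of four 64-bit lanes,
       stored as raw bit patterns. */
    typedef struct { uint64_t lane[4]; } vec4u64;

    /* Two's-complement absolute value on the bit pattern. Unsigned
       negation avoids the signed-overflow problem for INT64_MIN. */
    static uint64_t abs_bits64(uint64_t bits)
    {
        return (bits >> 63) ? (UINT64_C(0) - bits) : bits;
    }

    /* Scalar model (assumption) of the write-masked absolute value. */
    static vec4u64 model_mask_abs_epi64(vec4u64 src, uint8_t k, vec4u64 a)
    {
        vec4u64 dst;
        for (int j = 0; j < 4; j++) {
            dst.lane[j] = ((k >> j) & 1) ? abs_bits64(a.lane[j]) : src.lane[j];
        }
        return dst;
    }

Only the low four bits of "k" are consulted because a 256-bit vector holds four 64-bit lanes. Code that reinterprets the result as signed will see the most negative input unchanged.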

_mm256_mask_add_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_add_epi32(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_add_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_add_epi32(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Add packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
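The masked 32-bit additions above keep only the low 32 bits of each sum, so results wrap modulo 2^32, and the zero-masked form behaves like the write-masked form with an all-zero "src". The scalar C sketch below models both. ``vec8u32``, ``model_mask_add_epi32`` and ``model_maskz_add_epi32`` are hypothetical names, not part of ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for a 256-bit vector of eight 32-bit lanes. */
    typedef struct { uint32_t lane[8]; } vec8u32;

    /* Scalar model (assumption) of the write-masked add: active lanes
       hold a + b truncated to 32 bits, inactive lanes keep src. */
    static vec8u32 model_mask_add_epi32(vec8u32 src, uint8_t k,
                                        vec8u32 a, vec8u32 b)
    {
        vec8u32 dst;
        for (int j = 0; j < 8; j++) {
            dst.lane[j] = ((k >> j) & 1)
                              ? (uint32_t)(a.lane[j] + b.lane[j])
                              : src.lane[j];
        }
        return dst;
    }

    /* The zero-masked form expressed as the write-masked form with a
       zeroed src. */
    static vec8u32 model_maskz_add_epi32(uint8_t k, vec8u32 a, vec8u32 b)
    {
        vec8u32 zero = {{0}};
        return model_mask_add_epi32(zero, k, a, b);
    }

Adding 1 to a lane holding ``0xFFFFFFFF`` therefore produces 0 rather than saturating. The same bit pattern results whether the lanes are interpreted as signed or unsigned.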

_mm256_mask_add_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_add_epi64(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_add_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_add_epi64(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Add packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_max_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_mask_max_epi32(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
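With 32-bit elements a 256-bit vector holds eight lanes, so all eight bits of the mask are used. The following scalar sketch mirrors the pseudo-code above under that reading; ``ref_mask_max_epi32`` is an illustrative name, not a real intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: eight signed 32-bit lanes with a writemask. */
    void ref_mask_max_epi32(int32_t dst[8], const int32_t src[8], uint8_t k,
                            const int32_t a[8], const int32_t b[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u)
                dst[j] = (a[j] > b[j]) ? a[j] : b[j];  /* signed comparison */
            else
                dst[j] = src[j];                       /* lane kept from src */
        }
    }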

_mm256_maskz_max_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_maskz_max_epi32(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_max_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __m256i _mm256_mask_max_epi64(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_max_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __m256i _mm256_maskz_max_epi64(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_max_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m256i _mm256_max_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_max_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_max_epu32(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_max_epu32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_max_epu32(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_max_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_max_epu64(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_max_epu64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_max_epu64(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_max_epu64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_max_epu64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
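The ``MAX`` in the pseudo-code above is an unsigned comparison, which matters whenever the top bit of an element is set. The scalar sketch below contrasts it with a signed maximum over four lanes; both function names are hypothetical and exist only for this illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the unsigned 64-bit maximum. */
    void ref_max_epu64(uint64_t dst[4], const uint64_t a[4], const uint64_t b[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = (a[j] > b[j]) ? a[j] : b[j];
    }

    /* Signed counterpart, shown only for contrast. */
    void ref_max_epi64(int64_t dst[4], const int64_t a[4], const int64_t b[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = (a[j] > b[j]) ? a[j] : b[j];
    }

An all-ones element is the largest possible value in the unsigned model, while the same bit pattern reads as -1 in the signed model and loses to any non-negative element.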

_mm256_mask_min_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_mask_min_epi32(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_min_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_maskz_min_epi32(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_min_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __m256i _mm256_mask_min_epi64(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_min_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __m256i _mm256_maskz_min_epi64(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_min_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m256i _mm256_min_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
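One common use of an element-wise signed minimum is clamping, where it is paired with a signed maximum. The scalar sketch below illustrates that idea over four 64-bit lanes; it is an illustrative assumption about usage rather than anything stated in the Intel description, and all helper names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Per-lane helpers mirroring MIN and MAX from the pseudo-code. */
    int64_t lane_min_i64(int64_t a, int64_t b) { return (a < b) ? a : b; }
    int64_t lane_max_i64(int64_t a, int64_t b) { return (a > b) ? a : b; }

    /* Hypothetical clamp: dst[j] = min(max(x[j], lo[j]), hi[j]).
     * Assumes lo[j] <= hi[j] in every lane. */
    void ref_clamp_epi64(int64_t dst[4], const int64_t x[4],
                         const int64_t lo[4], const int64_t hi[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = lane_min_i64(lane_max_i64(x[j], lo[j]), hi[j]);
    }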

_mm256_mask_min_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_min_epu32(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_min_epu32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_min_epu32(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_min_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_min_epu64(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_min_epu64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_min_epu64(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_min_epu64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_min_epu64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_mul_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI64 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_mask_mul_epi32(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
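The pseudo-code above reads only bits 31:0 of each 64-bit element, sign-extends them, and ignores the upper half. A scalar sketch of that behaviour follows; ``ref_mask_mul_epi32`` is a hypothetical name, and the narrowing casts assume the two's-complement conversion that mainstream compilers provide.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model. a and b hold raw 64-bit element bit patterns;
     * only the low 32 bits of each element take part in the multiply. */
    void ref_mask_mul_epi32(int64_t dst[4], const int64_t src[4], uint8_t k,
                            const uint64_t a[4], const uint64_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                int32_t lo_a = (int32_t)(uint32_t)a[j];  /* a[i+31:i] */
                int32_t lo_b = (int32_t)(uint32_t)b[j];  /* b[i+31:i] */
                dst[j] = (int64_t)lo_a * (int64_t)lo_b;  /* SignExtend64 * SignExtend64 */
            } else {
                dst[j] = src[j];
            }
        }
    }

The product of two 32-bit signed values always fits in 64 bits, so the multiplication in this model cannot overflow.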

_mm256_maskz_mul_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_maskz_mul_epi32(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_mullo_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_mullo_epi32(__m256i src, __mmask8 k,
                                    __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		tmp[63:0] := a[i+31:i] * b[i+31:i]
        		dst[i+31:i] := tmp[31:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
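The two-step pseudo-code above (form a 64-bit intermediate, keep its low 32 bits) can be sketched in scalar C as shown below. The helper name ``ref_mask_mullo_epi32`` is an assumption for illustration. Unsigned arithmetic is used because the low 32 bits of a product are the same whether the inputs are read as signed or unsigned.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: eight 32-bit lanes with a writemask. */
    void ref_mask_mullo_epi32(uint32_t dst[8], const uint32_t src[8], uint8_t k,
                              const uint32_t a[8], const uint32_t b[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u) {
                uint64_t tmp = (uint64_t)a[j] * (uint64_t)b[j];  /* tmp[63:0] */
                dst[j] = (uint32_t)tmp;                          /* tmp[31:0] */
            } else {
                dst[j] = src[j];
            }
        }
    }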

_mm256_maskz_mullo_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_mullo_epi32(__mmask8 k, __m256i a,
                                     __m256i b);

.. admonition:: Intel Description

    Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		tmp[63:0] := a[i+31:i] * b[i+31:i]
        		dst[i+31:i] := tmp[31:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_mul_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_mul_epu32(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mul_epu32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_mul_epu32(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
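In the unsigned form above, the low 32 bits of each 64-bit element are treated as an unsigned value and the upper 32 bits are ignored. A scalar sketch of the zeromask pseudo-code, using the hypothetical helper name ``ref_maskz_mul_epu32``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model. a and b hold raw 64-bit element bit patterns. */
    void ref_maskz_mul_epu32(uint64_t dst[4], uint8_t k,
                             const uint64_t a[4], const uint64_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                uint64_t lo_a = a[j] & UINT64_C(0xFFFFFFFF);  /* a[i+31:i] */
                uint64_t lo_b = b[j] & UINT64_C(0xFFFFFFFF);  /* b[i+31:i] */
                dst[j] = lo_a * lo_b;   /* full 64-bit product, cannot overflow */
            } else {
                dst[j] = 0;             /* mask bit clear: lane zeroed */
            }
        }
    }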

_mm256_mask_sub_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_sub_epi32(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	
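The subtraction in the pseudo-code above is performed on fixed-width 32-bit elements, so in the scalar sketch below results wrap modulo 2^32 rather than saturating. ``ref_mask_sub_epi32`` is a hypothetical helper used only to illustrate the lane and mask handling.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: eight 32-bit lanes with a writemask. */
    void ref_mask_sub_epi32(uint32_t dst[8], const uint32_t src[8], uint8_t k,
                            const uint32_t a[8], const uint32_t b[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u)
                dst[j] = a[j] - b[j];   /* wraps: 0 - 1 yields 0xFFFFFFFF */
            else
                dst[j] = src[j];
        }
    }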

_mm256_maskz_sub_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_sub_epi32(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_sub_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_sub_epi64(__m256i src, __mmask8 k,
                                  __m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sub_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_sub_epi64(__mmask8 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_rcp14_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_mask_rcp14_pd(__m256d src, __mmask8 k,
                                 __m256d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_rcp14_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_maskz_rcp14_pd(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rcp14_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_rcp14_pd(__m256d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := (1.0 / a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
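The pseudo-code above shows an exact division, while the description states that the result is an approximation with relative error below 2^-14. The sketch below shows one way a test harness might check a candidate result against that bound. The helper ``within_rcp14_bound`` is hypothetical, and it assumes ``a`` is finite and non-zero.

.. code-block:: C

    /* Hypothetical check: is approx within the documented relative error
     * bound of the exact reciprocal 1.0 / a? Assumes a is finite and non-zero. */
    int within_rcp14_bound(double a, double approx)
    {
        const double bound = 1.0 / 16384.0;       /* 2^-14 */
        double exact = 1.0 / a;
        double err = (approx - exact) / exact;    /* signed relative error */
        if (err < 0.0)
            err = -err;
        return err < bound;
    }

A caller that needs full double precision should not rely on this approximation alone, since the bound only guarantees roughly 14 correct bits.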

_mm256_mask_rcp14_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_rcp14_ps(__m256 src, __mmask8 k,
                                __m256 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_rcp14_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_rcp14_ps(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rcp14_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_rcp14_ps(__m256 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := (1.0 / a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rsqrt14_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_rsqrt14_pd(__m256d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
        ENDFOR
        dst[MAX:256] := 0
        	
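As with the reciprocal, the 2^-14 bound is often tightened with a Newton-Raphson step. The scalar sketch below is an illustration, not Intel code: the hypothetical helper ``refine_rsqrt`` models one lane and computes ``y1 = y0 * (1.5 - 0.5 * a * y0 * y0)`` from an estimate supplied by the caller.

.. code-block:: C

    /* Hypothetical scalar helper (not an Intel intrinsic): one Newton-Raphson
     * step for 1/sqrt(a), starting from an approximation y0. */
    double refine_rsqrt(double a, double y0)
    {
        /* If y0 = (1/sqrt(a)) * (1 + e), the new relative error has a
         * magnitude of about 1.5 * e * e. */
        return y0 * (1.5 - 0.5 * a * y0 * y0);
    }

For example, with ``a = 4.0`` and an estimate that is off by a relative 2^-15, one step brings the result to within roughly 2^-29 of the exact value 0.5.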

_mm256_mask_rsqrt14_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_mask_rsqrt14_pd(__m256d src, __mmask8 k,
                                   __m256d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_rsqrt14_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_maskz_rsqrt14_pd(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_rsqrt14_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_rsqrt14_ps(__m256 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_rsqrt14_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_rsqrt14_ps(__m256 src, __mmask8 k,
                                  __m256 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_rsqrt14_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_rsqrt14_ps(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_sub_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_sub_pd(__m256d src, __mmask8 k,
                               __m256d a, __m256d b);

.. admonition:: Intel Description

    Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sub_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_sub_pd(__mmask8 k, __m256d a,
                                __m256d b);

.. admonition:: Intel Description

    Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_sub_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_sub_ps(__m256 src, __mmask8 k, __m256 a,
                              __m256 b);

.. admonition:: Intel Description

    Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_sub_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_sub_ps(__mmask8 k, __m256 a, __m256 b);

.. admonition:: Intel Description

    Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
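The writemask and zeromask forms differ only in what an inactive lane receives. The scalar sketch below restates the pseudo-code for the 8-lane single-precision subtract in plain C; ``masked_sub_ps_model`` is a hypothetical name, and passing ``src = NULL`` to select the zeromask behaviour is a convention of this sketch, not of the intrinsics.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the 8-lane masked subtract; not Intel code.
     * Pass src = NULL to model the zeromask ("maskz") form. */
    void masked_sub_ps_model(float dst[8], const float *src, uint8_t k,
                             const float a[8], const float b[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u) {
                dst[j] = a[j] - b[j];          /* mask bit set: compute */
            } else {
                dst[j] = src ? src[j] : 0.0f;  /* writemask: copy src; zeromask: 0 */
            }
        }
    }

With ``k = 0x0F`` the low four lanes hold ``a - b``, and the high four lanes hold either the ``src`` values or zero.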

_mm256_madd52lo_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m256i _mm256_madd52lo_epu64(__m256i a, __m256i b,
                                  __m256i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        	dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
        ENDFOR
        dst[MAX:256] := 0
        	
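A scalar restatement of one 64-bit lane can make the bit ranges easier to check. The sketch below is an illustration under stated assumptions, not Intel code: ``madd52lo_model`` and ``MASK52`` are hypothetical names. It relies on the fact that the low 52 bits of the 104-bit product are the same as the low 52 bits of the product taken modulo 2^64, so no 128-bit type is needed.

.. code-block:: C

    #include <stdint.h>

    #define MASK52 ((UINT64_C(1) << 52) - 1)

    /* Hypothetical scalar model of one 64-bit lane; not Intel code. */
    uint64_t madd52lo_model(uint64_t a, uint64_t b, uint64_t c)
    {
        /* Only the low 52 bits of b and c take part in the multiply. */
        uint64_t prod_low64 = (b & MASK52) * (c & MASK52); /* wraps mod 2^64 */

        /* Bits 51:0 of the 104-bit product equal bits 51:0 of the wrapped one. */
        return a + (prod_low64 & MASK52);                  /* wrapping 64-bit add */
    }

Bits 63:52 of ``b`` and ``c`` are ignored, while ``a`` is used as a full 64-bit accumulator.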

_mm256_mask_madd52lo_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask8 k, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m256i _mm256_mask_madd52lo_epu64(__m256i a, __mmask8 k,
                                       __m256i b, __m256i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        		dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_madd52lo_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m256i _mm256_maskz_madd52lo_epu64(__mmask8 k, __m256i a,
                                        __m256i b, __m256i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        		dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_madd52hi_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m256i _mm256_madd52hi_epu64(__m256i a, __m256i b,
                                  __m256i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        	dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
        ENDFOR
        dst[MAX:256] := 0
        	
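Extracting bits 103:52 of the product needs more than 64 bits of intermediate precision. The scalar sketch below models one lane in portable C by splitting each 52-bit operand into 26-bit halves; ``madd52hi_model``, ``MASK52`` and ``MASK26`` are hypothetical names, and the splitting scheme is one possible approach rather than how the hardware computes the result.

.. code-block:: C

    #include <stdint.h>

    #define MASK52 ((UINT64_C(1) << 52) - 1)
    #define MASK26 ((UINT64_C(1) << 26) - 1)

    /* Hypothetical scalar model of one 64-bit lane; not Intel code. */
    uint64_t madd52hi_model(uint64_t a, uint64_t b, uint64_t c)
    {
        /* Split each 52-bit operand into 26-bit halves so that every
         * partial product fits in 64 bits. */
        uint64_t bl = b & MASK26, bh = (b & MASK52) >> 26;
        uint64_t cl = c & MASK26, ch = (c & MASK52) >> 26;

        uint64_t ll  = bl * cl;            /* weight 2^0,  below 2^52 */
        uint64_t mid = bh * cl + bl * ch;  /* weight 2^26, below 2^53 */
        uint64_t hh  = bh * ch;            /* weight 2^52, below 2^52 */

        /* Bits 103:52 of the product: hh plus the carry out of the low 52 bits. */
        uint64_t hi = hh + ((mid + (ll >> 26)) >> 26);
        return a + hi;                     /* wrapping 64-bit add */
    }

For instance, multiplying 2^26 by 2^26 gives exactly 2^52, so the high part is 1 and the low part is 0.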

_mm256_mask_madd52hi_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask8 k, 
    __m256i b, 
    __m256i c
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m256i _mm256_mask_madd52hi_epu64(__m256i a, __mmask8 k,
                                       __m256i b, __m256i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        		dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_madd52hi_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    __m256i c
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m256i _mm256_maskz_madd52hi_epu64(__mmask8 k, __m256i a,
                                        __m256i b, __m256i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        		dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_dpbf16_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __m256bh a, 
    __m256bh b
:Param ETypes:
    FP32 src, 
    BF16 a, 
    BF16 b

.. code-block:: C

    __m256 _mm256_dpbf16_ps(__m256 src, __m256bh a, __m256bh b);

.. admonition:: Intel Description

    Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE make_fp32(x[15:0]) {
        	y.fp32  := 0.0
        	y[31:16] := x[15:0]
        	RETURN y
        }
        dst := src
        FOR j := 0 to 7
        	dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
        	dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
        ENDFOR
        dst[MAX:256] := 0
        	
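The ``make_fp32`` step in the pseudo-code works because a BF16 value is the upper half of a single-precision bit pattern. The scalar sketch below models one output lane in plain C; ``bf16_to_fp32`` and ``dpbf16_lane_model`` are hypothetical names, BF16 inputs are passed as raw ``uint16_t`` bit patterns, and the rounding and denormal handling of the real instruction are not modelled.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* BF16 is the upper 16 bits of an IEEE-754 binary32 value. */
    float bf16_to_fp32(uint16_t x)
    {
        uint32_t bits = (uint32_t)x << 16;
        float y;
        memcpy(&y, &bits, sizeof y);   /* reinterpret the bits as a float */
        return y;
    }

    /* Hypothetical scalar model of one 32-bit lane j; not Intel code.
     * a_pair and b_pair hold BF16 elements 2*j+0 and 2*j+1 of the inputs. */
    float dpbf16_lane_model(float src, const uint16_t a_pair[2],
                            const uint16_t b_pair[2])
    {
        float acc = src;
        acc += bf16_to_fp32(a_pair[1]) * bf16_to_fp32(b_pair[1]);
        acc += bf16_to_fp32(a_pair[0]) * bf16_to_fp32(b_pair[0]);
        return acc;
    }

Here ``0x3F80``, ``0x4000`` and ``0x4040`` are the BF16 encodings of 1.0, 2.0 and 3.0, which makes small hand-checked inputs easy to build.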

_mm256_mask_dpbf16_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256bh a, 
    __m256bh b
:Param ETypes:
    FP32 src, 
    MASK k, 
    BF16 a, 
    BF16 b

.. code-block:: C

    __m256 _mm256_mask_dpbf16_ps(__m256 src, __mmask8 k,
                                 __m256bh a, __m256bh b);

.. admonition:: Intel Description

    Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE make_fp32(x[15:0]) {
        	y.fp32  := 0.0
        	y[31:16] := x[15:0]
        	RETURN y
        }
        dst := src
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
        		dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_dpbf16_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 src, 
    __m256bh a, 
    __m256bh b
:Param ETypes:
    MASK k, 
    FP32 src, 
    BF16 a, 
    BF16 b

.. code-block:: C

    __m256 _mm256_maskz_dpbf16_ps(__mmask8 k, __m256 src,
                                  __m256bh a, __m256bh b);

.. admonition:: Intel Description

    Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE make_fp32(x[15:0]) {
        	y.fp32  := 0.0
        	y[31:16] := x[15:0]
        	RETURN y
        }
        dst := src
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
        		dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_add_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_add_ph(__m256h a, __m256h b);

.. admonition:: Intel Description

    Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.fp16[j] := a.fp16[j] + b.fp16[j]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_add_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mask_add_ph(__m256h src, __mmask16 k,
                               __m256h a, __m256h b);

.. admonition:: Intel Description

    Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] + b.fp16[j]
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
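With sixteen half-precision lanes, bit ``j`` of the 16-bit mask controls lane ``j``. A common way to use the masked form is to process a tail of fewer than sixteen elements. The sketch below builds such a mask in plain C; ``first_n_lanes_mask16`` is a hypothetical helper, and it assumes the result is later converted to ``__mmask16``, which common compilers define as a 16-bit unsigned integer.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper (not an Intel intrinsic): mask with the low n bits
     * set, for n in 0..16; larger n is clamped to all 16 lanes. */
    uint16_t first_n_lanes_mask16(unsigned n)
    {
        if (n >= 16) {
            return 0xFFFFu;
        }
        return (uint16_t)((1u << n) - 1u);
    }

The explicit check for ``n >= 16`` keeps the clamping visible rather than relying on the truncating cast.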

_mm256_maskz_add_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_maskz_add_ph(__mmask16 k, __m256h a,
                                __m256h b);

.. admonition:: Intel Description

    Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] + b.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_div_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_div_ph(__m256h a, __m256h b);

.. admonition:: Intel Description

    Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.fp16[j] := a.fp16[j] / b.fp16[j]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_div_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mask_div_ph(__m256h src, __mmask16 k,
                               __m256h a, __m256h b);

.. admonition:: Intel Description

    Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] / b.fp16[j]
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_div_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_maskz_div_ph(__mmask16 k, __m256h a,
                                __m256h b);

.. admonition:: Intel Description

    Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] / b.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fmadd_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_fmadd_ph(__m256h a, __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_fmadd_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __mmask16 k, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_mask_fmadd_ph(__m256h a, __mmask16 k,
                                 __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask3_fmadd_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c, 
    __mmask16 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m256h _mm256_mask3_fmadd_ph(__m256h a, __m256h b,
                                  __m256h c, __mmask16 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmadd_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_maskz_fmadd_ph(__mmask16 k, __m256h a,
                                  __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
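The three masked fused multiply-add forms compute the same value in active lanes and differ only in what an inactive lane receives: ``a`` for the ``mask`` form, ``c`` for ``mask3``, and zero for ``maskz``. The scalar sketch below models a single lane to make that explicit. All names are hypothetical, and ``float`` stands in for FP16, so the rounding does not match the real instruction.

.. code-block:: C

    #include <stdint.h>

    /* Which value an inactive lane receives in each masked form. */
    enum fmadd_mask_kind { FMADD_MASK, FMADD_MASK3, FMADD_MASKZ };

    /* Hypothetical scalar model of lane j (0..15); not Intel code.
     * float stands in for FP16, so rounding differs from the hardware. */
    float fmadd_lane_model(enum fmadd_mask_kind kind, uint16_t k, int j,
                           float a, float b, float c)
    {
        if ((k >> j) & 1u) {
            return (a * b) + c;            /* active lane */
        }
        switch (kind) {
        case FMADD_MASK:  return a;        /* mask form:  copy from a */
        case FMADD_MASK3: return c;        /* mask3 form: copy from c */
        default:          return 0.0f;     /* maskz form: zero */
        }
    }

Choosing between the ``mask`` and ``mask3`` forms is therefore a question of which operand should survive in the lanes that are masked off.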

_mm256_fnmadd_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_fnmadd_ph(__m256h a, __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        ENDFOR
        dst[MAX:256] := 0
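
The sign placement is easy to misread: only the product is negated, and "c" is added unchanged. A scalar sketch of the loop above, assuming ``float`` in place of FP16 and a hypothetical helper name ``ref_fnmadd_ph``:

.. code-block:: C

    #include <assert.h>

    /* Hypothetical scalar model of _mm256_fnmadd_ph.
       Assumption: float stands in for FP16; rounding is not modeled. */
    void ref_fnmadd_ph(const float a[16], const float b[16],
                       const float c[16], float dst[16])
    {
        assert(a && b && c && dst);
        for (int j = 0; j < 16; j++)
            dst[j] = -(a[j] * b[j]) + c[j];  /* negate the product, then add c */
    }

With ``a[j] = j``, ``b[j] = 2`` and ``c[j] = 10``, element 5 of the result is 0.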
        	

_mm256_mask_fnmadd_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __mmask16 k, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_mask_fnmadd_ph(__m256h a, __mmask16 k,
                                  __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask3_fnmadd_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c, 
    __mmask16 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m256h _mm256_mask3_fnmadd_ph(__m256h a, __m256h b,
                                   __m256h c, __mmask16 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fnmadd_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_maskz_fnmadd_ph(__mmask16 k, __m256h a,
                                   __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
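
With a zeromask, elements whose mask bit is clear become 0 instead of being copied from an operand. A scalar sketch under the assumption that ``float`` stands in for FP16; ``ref_maskz_fnmadd_ph`` is a hypothetical name, not part of any header:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_maskz_fnmadd_ph.
       Assumption: float stands in for FP16; rounding is not modeled. */
    void ref_maskz_fnmadd_ph(uint16_t k, const float a[16], const float b[16],
                             const float c[16], float dst[16])
    {
        assert(a && b && c && dst);
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u)
                dst[j] = -(a[j] * b[j]) + c[j];  /* mask bit set: compute */
            else
                dst[j] = 0.0f;                   /* mask bit clear: zero */
        }
    }
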
        	

_mm256_fmsub_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_fmsub_ph(__m256h a, __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_fmsub_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __mmask16 k, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_mask_fmsub_ph(__m256h a, __mmask16 k,
                                 __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
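
In the "mask" form, "a" is both a multiplicand and the fallback value for elements whose mask bit is clear. The following scalar sketch mirrors the pseudo-code above; it assumes ``float`` in place of FP16, and ``ref_mask_fmsub_ph`` is a hypothetical helper name:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_mask_fmsub_ph.
       Assumption: float stands in for FP16; rounding is not modeled. */
    void ref_mask_fmsub_ph(const float a[16], uint16_t k, const float b[16],
                           const float c[16], float dst[16])
    {
        assert(a && b && c && dst);
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u)
                dst[j] = a[j] * b[j] - c[j];  /* mask bit set: multiply-subtract */
            else
                dst[j] = a[j];                /* mask bit clear: copy from a */
        }
    }
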
        	

_mm256_mask3_fmsub_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c, 
    __mmask16 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m256h _mm256_mask3_fmsub_ph(__m256h a, __m256h b,
                                  __m256h c, __mmask16 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmsub_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_maskz_fmsub_ph(__mmask16 k, __m256h a,
                                  __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fnmsub_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_fnmsub_ph(__m256h a, __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        ENDFOR
        dst[MAX:256] := 0
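
Here both terms end up negative: the product is negated and "c" is then subtracted. A scalar sketch of that loop, assuming ``float`` as a stand-in for FP16 and using the hypothetical name ``ref_fnmsub_ph``:

.. code-block:: C

    #include <assert.h>

    /* Hypothetical scalar model of _mm256_fnmsub_ph.
       Assumption: float stands in for FP16; rounding is not modeled. */
    void ref_fnmsub_ph(const float a[16], const float b[16],
                       const float c[16], float dst[16])
    {
        assert(a && b && c && dst);
        for (int j = 0; j < 16; j++)
            dst[j] = -(a[j] * b[j]) - c[j];  /* equals -((a*b) + c) */
    }
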
        	

_mm256_mask_fnmsub_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __mmask16 k, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_mask_fnmsub_ph(__m256h a, __mmask16 k,
                                  __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask3_fnmsub_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c, 
    __mmask16 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m256h _mm256_mask3_fnmsub_ph(__m256h a, __m256h b,
                                   __m256h c, __mmask16 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fnmsub_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_maskz_fnmsub_ph(__mmask16 k, __m256h a,
                                   __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fmaddsub_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_fmaddsub_ph(__m256h a, __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtract in even-indexed elements, add in odd-indexed elements), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF ((j & 1) == 0)
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
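
The even/odd rule in the pseudo-code above can be sketched in scalar form: even-indexed elements subtract "c", odd-indexed elements add it. This assumes ``float`` in place of FP16, and ``ref_fmaddsub_ph`` is a hypothetical helper name:

.. code-block:: C

    #include <assert.h>

    /* Hypothetical scalar model of _mm256_fmaddsub_ph.
       Assumption: float stands in for FP16; rounding is not modeled. */
    void ref_fmaddsub_ph(const float a[16], const float b[16],
                         const float c[16], float dst[16])
    {
        assert(a && b && c && dst);
        for (int j = 0; j < 16; j++) {
            float prod = a[j] * b[j];
            if ((j & 1) == 0)
                dst[j] = prod - c[j];  /* even index: subtract */
            else
                dst[j] = prod + c[j];  /* odd index: add */
        }
    }
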
        	

_mm256_mask_fmaddsub_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __mmask16 k, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_mask_fmaddsub_ph(__m256h a, __mmask16 k,
                                    __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtract in even-indexed elements, add in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask3_fmaddsub_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c, 
    __mmask16 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m256h _mm256_mask3_fmaddsub_ph(__m256h a, __m256h b,
                                     __m256h c, __mmask16 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtract in even-indexed elements, add in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmaddsub_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_maskz_fmaddsub_ph(__mmask16 k, __m256h a,
                                     __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtract in even-indexed elements, add in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fmsubadd_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_fmsubadd_ph(__m256h a, __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (add in even-indexed elements, subtract in odd-indexed elements), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF ((j & 1) == 0)
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_fmsubadd_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __mmask16 k, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_mask_fmsubadd_ph(__m256h a, __mmask16 k,
                                    __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (add in even-indexed elements, subtract in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask3_fmsubadd_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c, 
    __mmask16 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m256h _mm256_mask3_fmsubadd_ph(__m256h a, __m256h b,
                                     __m256h c, __mmask16 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (add in even-indexed elements, subtract in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmsubadd_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_maskz_fmsubadd_ph(__mmask16 k, __m256h a,
                                     __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (add in even-indexed elements, subtract in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
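
Two selections are nested here: the mask bit decides whether an element is computed at all, and the index parity decides between add and subtract. A scalar sketch of the pseudo-code above, assuming ``float`` in place of FP16; ``ref_maskz_fmsubadd_ph`` is a hypothetical name:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_maskz_fmsubadd_ph.
       Assumption: float stands in for FP16; rounding is not modeled. */
    void ref_maskz_fmsubadd_ph(uint16_t k, const float a[16], const float b[16],
                               const float c[16], float dst[16])
    {
        assert(a && b && c && dst);
        for (int j = 0; j < 16; j++) {
            if (!((k >> j) & 1u)) {
                dst[j] = 0.0f;                /* mask bit clear: zero */
            } else if ((j & 1) == 0) {
                dst[j] = a[j] * b[j] + c[j];  /* even index: add */
            } else {
                dst[j] = a[j] * b[j] - c[j];  /* odd index: subtract */
            }
        }
    }
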
        	

_mm256_sub_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_sub_ph(__m256h a, __m256h b);

.. admonition:: Intel Description

    Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.fp16[j] := a.fp16[j] - b.fp16[j]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_sub_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mask_sub_ph(__m256h src, __mmask16 k,
                               __m256h a, __m256h b);

.. admonition:: Intel Description

    Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] - b.fp16[j]
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
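
Unlike the fused multiply-add forms, this writemask takes its fallback values from a separate "src" operand. A scalar sketch of the loop above, assuming ``float`` as a stand-in for FP16 and a hypothetical helper name ``ref_mask_sub_ph``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_mask_sub_ph.
       Assumption: float stands in for FP16; rounding is not modeled. */
    void ref_mask_sub_ph(const float src[16], uint16_t k,
                         const float a[16], const float b[16], float dst[16])
    {
        assert(src && a && b && dst);
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u)
                dst[j] = a[j] - b[j];  /* mask bit set: subtract */
            else
                dst[j] = src[j];       /* mask bit clear: pass src through */
        }
    }
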
        	

_mm256_maskz_sub_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_maskz_sub_ph(__mmask16 k, __m256h a,
                                __m256h b);

.. admonition:: Intel Description

    Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] - b.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mul_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mul_ph(__m256h a, __m256h b);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 TO 15
        	dst.fp16[i] := a.fp16[i] * b.fp16[i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_mul_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mask_mul_ph(__m256h src, __mmask16 k,
                               __m256h a, __m256h b);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 TO 15
        	IF k[i]
        		dst.fp16[i] := a.fp16[i] * b.fp16[i]
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_mul_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_maskz_mul_ph(__mmask16 k, __m256h a,
                                __m256h b);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 TO 15
        	IF k[i]
        		dst.fp16[i] := a.fp16[i] * b.fp16[i]
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fmul_pch
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_fmul_pch(__m256h a, __m256h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:256] := 0
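
The 16 FP16 elements are treated as 8 interleaved (real, imaginary) pairs. A scalar sketch of the complex product from the pseudo-code above, assuming ``float`` in place of FP16; the name ``ref_fmul_pch`` is hypothetical:

.. code-block:: C

    #include <assert.h>

    /* Hypothetical scalar model of _mm256_fmul_pch.
       Assumption: float stands in for FP16; rounding is not modeled.
       Element 2*i is the real part and 2*i+1 the imaginary part. */
    void ref_fmul_pch(const float a[16], const float b[16], float dst[16])
    {
        assert(a && b && dst);
        for (int i = 0; i < 8; i++) {
            float ar = a[2 * i], ai = a[2 * i + 1];
            float br = b[2 * i], bi = b[2 * i + 1];
            dst[2 * i]     = ar * br - ai * bi;  /* real part */
            dst[2 * i + 1] = ai * br + ar * bi;  /* imaginary part */
        }
    }

For example, the pair (1, 2) times the pair (3, 4), that is (1 + 2i)(3 + 4i), yields (-5, 10).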
        	

_mm256_mul_pch
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mul_pch(__m256h a, __m256h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_fmul_pch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask8 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mask_fmul_pch(__m256h src, __mmask8 k,
                                 __m256h a, __m256h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:256] := 0
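
Note that the mask is only 8 bits wide: each bit governs one complex number, so it covers two adjacent FP16 elements. A scalar sketch of this behavior, assuming ``float`` as a stand-in for FP16 and a hypothetical helper name ``ref_mask_fmul_pch``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_mask_fmul_pch.
       Assumption: float stands in for FP16; rounding is not modeled.
       Bit i of k selects the complex pair at elements 2*i and 2*i+1. */
    void ref_mask_fmul_pch(const float src[16], uint8_t k,
                           const float a[16], const float b[16], float dst[16])
    {
        assert(src && a && b && dst);
        for (int i = 0; i < 8; i++) {
            if ((k >> i) & 1u) {
                float ar = a[2 * i], ai = a[2 * i + 1];
                float br = b[2 * i], bi = b[2 * i + 1];
                dst[2 * i]     = ar * br - ai * bi;  /* real part */
                dst[2 * i + 1] = ai * br + ar * bi;  /* imaginary part */
            } else {
                dst[2 * i]     = src[2 * i];         /* keep both halves of src */
                dst[2 * i + 1] = src[2 * i + 1];
            }
        }
    }
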
        	

_mm256_mask_mul_pch
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask8 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mask_mul_pch(__m256h src, __mmask8 k,
                                __m256h a, __m256h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fmul_pch
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask8 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_maskz_fmul_pch(__mmask8 k, __m256h a,
                                  __m256h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
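
In the pseudo-code above each bit of ``k`` governs a whole complex pair, that is, two adjacent 16-bit elements. A small sketch of that correspondence follows; ``pair_mask_to_element_mask`` is a hypothetical helper and not part of any Intel header.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: expand an 8-bit per-pair mask into a 16-bit
     * per-element mask. Bit i of k controls elements 2*i and 2*i+1. */
    uint16_t pair_mask_to_element_mask(uint8_t k)
    {
        uint16_t m = 0;
        for (unsigned i = 0; i < 8; ++i) {
            if ((k >> i) & 1u) {
                /* Set the two element bits that belong to pair i. */
                m |= (uint16_t)(3u << (2 * i));
            }
        }
        return m;
    }

For example, ``k = 0x05`` selects pairs 0 and 2, which covers elements 0, 1, 4 and 5.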
        	

_mm256_maskz_mul_pch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask8 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_maskz_mul_pch(__mmask8 k, __m256h a,
                                 __m256h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fcmul_pch
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_fcmul_pch(__m256h a, __m256h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:256] := 0
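
The conjugate multiply above is the ordinary complex product with the sign of the second operand's imaginary part flipped. The sketch below checks that relationship in scalar C. Both function names are hypothetical, and ``float`` is assumed in place of the 16-bit element type.

.. code-block:: C

    #include <stddef.h>

    /* Plain complex multiply of one pair: out = x * y. */
    void ref_cmul1(float out[2], const float x[2], const float y[2])
    {
        out[0] = x[0] * y[0] - x[1] * y[1];
        out[1] = x[1] * y[0] + x[0] * y[1];
    }

    /* Hypothetical scalar model of the conjugate multiply:
     * dst = a * conj(b) for each complex pair. */
    void ref_fcmul_pch(float *dst, const float *a, const float *b,
                       size_t n_pairs)
    {
        for (size_t i = 0; i < n_pairs; ++i) {
            float re = a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1];
            float im = a[2 * i + 1] * b[2 * i] - a[2 * i] * b[2 * i + 1];
            dst[2 * i] = re;
            dst[2 * i + 1] = im;
        }
    }

Calling ``ref_cmul1`` with the second operand conjugated by hand gives the same pair as ``ref_fcmul_pch`` on the original operands.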
        	

_mm256_cmul_pch
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_cmul_pch(__m256h a, __m256h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_fcmul_pch
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask8 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mask_fcmul_pch(__m256h src, __mmask8 k,
                                  __m256h a, __m256h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cmul_pch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask8 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mask_cmul_pch(__m256h src, __mmask8 k,
                                 __m256h a, __m256h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fcmul_pch
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask8 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_maskz_fcmul_pch(__mmask8 k, __m256h a,
                                   __m256h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cmul_pch
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask8 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_maskz_cmul_pch(__mmask8 k, __m256h a,
                                  __m256h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fmadd_pch
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_fmadd_pch(__m256h a, __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        ENDFOR
        dst[MAX:256] := 0
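
A scalar sketch of the complex multiply-accumulate above may help when checking results by hand. ``ref_fmadd_pch`` is a hypothetical name, and ``float`` is assumed in place of the 16-bit element type. The products and sums are rounded separately here, which need not match the hardware instruction.

.. code-block:: C

    #include <stddef.h>

    /* Hypothetical scalar model: dst = a * b + c for each complex pair.
     * Element 2*i is the real part, element 2*i+1 the imaginary part. */
    void ref_fmadd_pch(float *dst, const float *a, const float *b,
                       const float *c, size_t n_pairs)
    {
        for (size_t i = 0; i < n_pairs; ++i) {
            /* Compute both parts before storing so dst may alias c. */
            float re = a[2 * i] * b[2 * i] - a[2 * i + 1] * b[2 * i + 1]
                     + c[2 * i];
            float im = a[2 * i + 1] * b[2 * i] + a[2 * i] * b[2 * i + 1]
                     + c[2 * i + 1];
            dst[2 * i] = re;
            dst[2 * i + 1] = im;
        }
    }

Passing the same array as ``c`` and ``dst`` accumulates the products in place.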
        	

_mm256_mask_fmadd_pch
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __mmask8 k, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_mask_fmadd_pch(__m256h a, __mmask8 k,
                                  __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := a.fp16[2*i+0]
        		dst.fp16[2*i+1] := a.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask3_fmadd_pch
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m256h _mm256_mask3_fmadd_pch(__m256h a, __m256h b,
                                   __m256h c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := c.fp16[2*i+0]
        		dst.fp16[2*i+1] := c.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:256] := 0
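
The ``mask`` and ``mask3`` forms of the complex multiply-accumulate differ only in where inactive pairs come from: ``a`` for ``mask``, ``c`` for ``mask3``. The sketch below models both with one hypothetical function; the ``keep_c`` flag is an illustration device and has no counterpart in the intrinsics, and ``float`` is assumed in place of the 16-bit element type.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the two write-masked fmadd forms.
     * keep_c == 0 models the "mask" form (inactive pairs copied from a);
     * keep_c != 0 models the "mask3" form (inactive pairs copied from c). */
    void ref_masked_fmadd_pch(float *dst, const float *a, const float *b,
                              const float *c, uint8_t k, size_t n_pairs,
                              int keep_c)
    {
        for (size_t i = 0; i < n_pairs; ++i) {
            float re, im;
            if ((k >> i) & 1u) {
                re = a[2 * i] * b[2 * i] - a[2 * i + 1] * b[2 * i + 1]
                   + c[2 * i];
                im = a[2 * i + 1] * b[2 * i] + a[2 * i] * b[2 * i + 1]
                   + c[2 * i + 1];
            } else {
                const float *fallback = keep_c ? c : a;
                re = fallback[2 * i];
                im = fallback[2 * i + 1];
            }
            dst[2 * i] = re;
            dst[2 * i + 1] = im;
        }
    }

The ``mask3`` form suits accumulation loops, since an inactive pair keeps its running total from ``c``.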
        	

_mm256_maskz_fmadd_pch
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask8 k, 
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_maskz_fmadd_pch(__mmask8 k, __m256h a,
                                   __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fcmadd_pch
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_fcmadd_pch(__m256h a, __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        ENDFOR
        dst[MAX:256] := 0
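
One place a conjugate multiply-accumulate shows up is a complex inner product, the sum of ``a[i] * conj(b[i])``. The sketch below applies the formula above repeatedly in scalar C. Unlike the intrinsic, which accumulates lane by lane, this hypothetical ``ref_conj_dot`` folds every pair into a single accumulator, and it assumes ``float`` in place of the 16-bit element type.

.. code-block:: C

    #include <stddef.h>

    /* Hypothetical scalar sketch: acc += sum over i of a[i] * conj(b[i]).
     * acc[0] is the real part of the running total, acc[1] the imaginary. */
    void ref_conj_dot(float acc[2], const float *a, const float *b,
                      size_t n_pairs)
    {
        for (size_t i = 0; i < n_pairs; ++i) {
            float re = a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1]
                     + acc[0];
            float im = a[2 * i + 1] * b[2 * i] - a[2 * i] * b[2 * i + 1]
                     + acc[1];
            acc[0] = re;
            acc[1] = im;
        }
    }

A vectorised version would keep one partial sum per lane and combine the lanes at the end.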
        	

_mm256_mask_fcmadd_pch
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __mmask8 k, 
    __m256h b, 
    __m256h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_mask_fcmadd_pch(__m256h a, __mmask8 k,
                                   __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := a.fp16[2*i+0]
        		dst.fp16[2*i+1] := a.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask3_fcmadd_pch
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b, 
    __m256h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m256h _mm256_mask3_fcmadd_pch(__m256h a, __m256h b,
                                    __m256h c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := c.fp16[2*i+0]
        		dst.fp16[2*i+1] := c.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fcmadd_pch
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask8 k, 
    __m256h a, 
    __m256h b, 
    __m256h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m256h _mm256_maskz_fcmadd_pch(__mmask8 k, __m256h a,
                                    __m256h b, __m256h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_reduce_add_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: _Float16
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm256_reduce_add_ph(__m256h a);

.. admonition:: Intel Description

    Reduce the packed half-precision (16-bit) floating-point elements in "a" by addition. Returns the sum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        FOR i := 0 to 7
        	tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+8]
        ENDFOR
        FOR i := 0 to 3
        	tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+4]
        ENDFOR
        FOR i := 0 to 1
        	tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+2]
        ENDFOR
        dst.fp16[0] := tmp.fp16[0] + tmp.fp16[1]
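
The pseudo-code above sums the elements as a tree: the upper half is folded onto the lower half until one value remains. Because floating-point addition is not associative, this order can give a different result from a left-to-right loop. A scalar sketch of the same order follows, with the hypothetical name ``ref_reduce_add_ph16`` and ``float`` assumed in place of the 16-bit element type.

.. code-block:: C

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical scalar model of the tree-shaped sum over 16 elements. */
    float ref_reduce_add_ph16(const float a[16])
    {
        float tmp[16];
        memcpy(tmp, a, sizeof tmp);
        /* Halve the active width each round: 8, 4, 2, then 1. */
        for (size_t width = 8; width >= 1; width /= 2) {
            for (size_t i = 0; i < width; ++i) {
                tmp[i] = tmp[i] + tmp[i + width];
            }
        }
        return tmp[0];
    }

The final round with ``width == 1`` corresponds to the last line of the pseudo-code, ``tmp.fp16[0] + tmp.fp16[1]``.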
        	

_mm256_reduce_mul_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: _Float16
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm256_reduce_mul_ph(__m256h a);

.. admonition:: Intel Description

    Reduce the packed half-precision (16-bit) floating-point elements in "a" by multiplication. Returns the product of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        FOR i := 0 to 7
        	tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+8]
        ENDFOR
        FOR i := 0 to 3
        	tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+4]
        ENDFOR
        FOR i := 0 to 1
        	tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+2]
        ENDFOR
        dst.fp16[0] := tmp.fp16[0] * tmp.fp16[1]
        	

_mm256_reduce_max_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: _Float16
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm256_reduce_max_ph(__m256h a);

.. admonition:: Intel Description

    Reduce the packed half-precision (16-bit) floating-point elements in "a" by maximum. Returns the maximum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        FOR i := 0 to 7
        	tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+8] ? tmp.fp16[i] : tmp.fp16[i+8])
        ENDFOR
        FOR i := 0 to 3
        	tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+4] ? tmp.fp16[i] : tmp.fp16[i+4])
        ENDFOR
        FOR i := 0 to 1
        	tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+2] ? tmp.fp16[i] : tmp.fp16[i+2])
        ENDFOR
        dst.fp16[0] := (tmp.fp16[0] > tmp.fp16[1] ? tmp.fp16[0] : tmp.fp16[1])
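
Read literally, the selection ``x > y ? x : y`` in the pseudo-code returns its second operand whenever either input is NaN, because every ordered comparison with NaN is false. The sketch below shows only that property of the expression. ``ref_max_select`` is a hypothetical name, ``float`` is assumed in place of the 16-bit element type, and how a particular compiler expands the intrinsic for NaN inputs is not stated in this document.

.. code-block:: C

    /* The selection used at each step of the pseudo-code.
     * If x or y is NaN the comparison is false and y is returned. */
    float ref_max_select(float x, float y)
    {
        return x > y ? x : y;
    }

The result for NaN inputs therefore depends on operand order, so inputs that may contain NaN are best filtered beforehand.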
        	

_mm256_reduce_min_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: _Float16
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm256_reduce_min_ph(__m256h a);

.. admonition:: Intel Description

    Reduce the packed half-precision (16-bit) floating-point elements in "a" by minimum. Returns the minimum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        FOR i := 0 to 7
        	tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+8] ? tmp.fp16[i] : tmp.fp16[i+8])
        ENDFOR
        FOR i := 0 to 3
        	tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+4] ? tmp.fp16[i] : tmp.fp16[i+4])
        ENDFOR
        FOR i := 0 to 1
        	tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+2] ? tmp.fp16[i] : tmp.fp16[i+2])
        ENDFOR
        dst.fp16[0] := (tmp.fp16[0] < tmp.fp16[1] ? tmp.fp16[0] : tmp.fp16[1])
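
The add, multiply, maximum and minimum reductions in this section share one tree shape and differ only in the binary operation. A generic scalar sketch of that shape follows. The callback type ``binop_fn`` and all function names are hypothetical, and ``float`` is assumed in place of the 16-bit element type.

.. code-block:: C

    #include <stddef.h>
    #include <string.h>

    typedef float (*binop_fn)(float, float);  /* hypothetical callback type */

    float op_mul(float x, float y) { return x * y; }
    float op_min(float x, float y) { return x < y ? x : y; }

    /* Fold the upper half onto the lower half until one value remains. */
    float ref_tree_reduce16(const float a[16], binop_fn op)
    {
        float tmp[16];
        memcpy(tmp, a, sizeof tmp);
        for (size_t width = 8; width >= 1; width /= 2) {
            for (size_t i = 0; i < width; ++i) {
                tmp[i] = op(tmp[i], tmp[i + width]);
            }
        }
        return tmp[0];
    }

Swapping the callback is enough to move between the product and the minimum of the same input.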
        	

_mm256_abs_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h v2
:Param ETypes:
    FP16 v2

.. code-block:: C

    __m256h _mm256_abs_ph(__m256h v2);

.. admonition:: Intel Description

    Finds the absolute value of each packed half-precision (16-bit) floating-point element in "v2", storing the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.fp16[j] := ABS(v2.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
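
For an IEEE 754 binary16 value the absolute value can be taken by clearing the sign bit, bit 15. The sketch below works on raw 16-bit patterns so that it compiles without FP16 support. ``ref_abs_ph_bits`` is a hypothetical name, and this is one way to realise ``ABS``, not necessarily what a given compiler emits.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical sketch operating on raw binary16 bit patterns.
     * Bit 15 is the sign bit, so clearing it yields the absolute value. */
    void ref_abs_ph_bits(uint16_t *dst, const uint16_t *v2, size_t n)
    {
        for (size_t j = 0; j < n; ++j) {
            dst[j] = (uint16_t)(v2[j] & 0x7FFFu);
        }
    }

For instance, the pattern ``0xBC00`` (-1.0) becomes ``0x3C00`` (1.0), and ``0x8000`` (-0.0) becomes ``0x0000``.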
        	

_mm256_conj_pch
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_conj_pch(__m256h a);

.. admonition:: Intel Description

    Compute the complex conjugates of complex numbers in "a", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
        ENDFOR
        dst[MAX:256] := 0
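
The pseudo-code above treats each complex number as one 32-bit lane and XORs it with the bit pattern of ``FP32(-0.0)``, which has only bit 31 set. That bit is the sign of the upper 16-bit element, the imaginary part, so the real part is left untouched. A sketch on raw lane patterns follows; ``ref_conj_pch_bits`` is a hypothetical name.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical sketch on raw 32-bit lanes.
     * Bits 15:0 hold the real part and bits 31:16 the imaginary part,
     * both as binary16 patterns. Flipping bit 31 negates the imaginary part. */
    void ref_conj_pch_bits(uint32_t *dst, const uint32_t *a, size_t n_pairs)
    {
        for (size_t j = 0; j < n_pairs; ++j) {
            dst[j] = a[j] ^ UINT32_C(0x80000000);
        }
    }

For instance, the lane ``0x3C004000`` (2.0 + 1.0i) becomes ``0xBC004000`` (2.0 - 1.0i).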
        	

_mm256_mask_conj_pch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask8 k, 
    __m256h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256h _mm256_mask_conj_pch(__m256h src, __mmask8 k,
                                 __m256h a);

.. admonition:: Intel Description

    Compute the complex conjugates of complex numbers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_conj_pch
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask8 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256h _mm256_maskz_conj_pch(__mmask8 k, __m256h a);

.. admonition:: Intel Description

    Compute the complex conjugates of complex numbers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_dpwssds_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_maskz_dpwssds_epi32(__mmask8 k, __m256i src,
                                       __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        		tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        		dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_dpwssds_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mask_dpwssds_epi32(__m256i src, __mmask8 k,
                                      __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        		tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        		dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_dpwssds_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_dpwssds_epi32(__m256i src, __m256i a,
                                 __m256i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_dpwssd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_maskz_dpwssd_epi32(__mmask8 k, __m256i src,
                                      __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        		tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        		dst.dword[j] := src.dword[j] + tmp1 + tmp2
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_dpwssd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mask_dpwssd_epi32(__m256i src, __mmask8 k,
                                     __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        		tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        		dst.dword[j] := src.dword[j] + tmp1 + tmp2
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_dpwssd_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_dpwssd_epi32(__m256i src, __m256i a,
                                __m256i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:256] := 0
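
Unlike the saturating variants, the pseudo-code above adds into a 32-bit lane with no clamp, so an overflowing sum wraps. A minimal sketch of one lane follows; ``ref_dpwssd_lane`` is a hypothetical name, and the final conversion back to ``int32_t`` assumes the usual two's-complement behaviour (implementation-defined in C).

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of a single 32-bit lane.
       Unsigned arithmetic is used so that wraparound is well defined. */
    static int32_t ref_dpwssd_lane(int32_t src, int16_t a0, int16_t b0,
                                   int16_t a1, int16_t b1) {
        /* Each 16x16 product fits in a signed 32-bit integer. */
        uint32_t t1 = (uint32_t)((int32_t)a0 * (int32_t)b0);
        uint32_t t2 = (uint32_t)((int32_t)a1 * (int32_t)b1);
        /* Assumes two's-complement conversion back to signed. */
        return (int32_t)((uint32_t)src + t1 + t2);
    }

For example, under that assumption, accumulating a product of 1 onto ``INT32_MAX`` yields ``INT32_MIN`` rather than a clamped value.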
        	

_mm256_maskz_dpbusds_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_maskz_dpbusds_epi32(__mmask8 k, __m256i src,
                                       __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        		tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        		tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        		tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        		dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_dpbusds_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_mask_dpbusds_epi32(__m256i src, __mmask8 k,
                                      __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        		tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        		tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        		tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        		dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_dpbusds_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_dpbusds_epi32(__m256i src, __m256i a,
                                 __m256i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:256] := 0
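
The mixed signedness is the easy part to get wrong: bytes of "a" are zero-extended and bytes of "b" are sign-extended before multiplying. A minimal scalar sketch of one lane, assuming hypothetical helper names ``saturate32`` and ``ref_dpbusds_lane``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: clamp a wide sum to the signed 32-bit range. */
    static int32_t saturate32(int64_t v) {
        if (v > INT32_MAX) return INT32_MAX;
        if (v < INT32_MIN) return INT32_MIN;
        return (int32_t)v;
    }

    /* Hypothetical scalar model of one 32-bit lane: four unsigned bytes
       of a times four signed bytes of b, accumulated onto src. */
    static int32_t ref_dpbusds_lane(int32_t src, const uint8_t a[4],
                                    const int8_t b[4]) {
        int64_t sum = src;
        for (int i = 0; i < 4; ++i) {
            /* uint8_t zero-extends, int8_t sign-extends. */
            sum += (int32_t)a[i] * (int32_t)b[i];
        }
        return saturate32(sum);
    }

Each product lies between -32640 and 32385, so it fits the signed 16-bit intermediate described above.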
        	

_mm256_maskz_dpbusd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_maskz_dpbusd_epi32(__mmask8 k, __m256i src,
                                      __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        		tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        		tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        		tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        		dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
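
To illustrate the zeromask behaviour described above, here is a minimal scalar sketch over all eight lanes. ``ref_maskz_dpbusd`` is a hypothetical name, vectors are modelled as plain arrays, and the conversion back to ``int32_t`` assumes two's-complement behaviour.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: a holds 32 unsigned bytes, b holds 32
       signed bytes, src and dst hold 8 signed dwords. */
    static void ref_maskz_dpbusd(int32_t dst[8], uint8_t k, const int32_t src[8],
                                 const uint8_t a[32], const int8_t b[32]) {
        for (int j = 0; j < 8; ++j) {
            if ((k >> j) & 1) {
                /* Accumulate in unsigned so that overflow wraps. */
                uint32_t acc = (uint32_t)src[j];
                for (int i = 0; i < 4; ++i) {
                    acc += (uint32_t)((int32_t)a[4 * j + i] * (int32_t)b[4 * j + i]);
                }
                dst[j] = (int32_t)acc;
            } else {
                dst[j] = 0; /* mask bit clear: lane is zeroed, src ignored */
            }
        }
    }

Note that a cleared mask bit discards the accumulator value in "src" for that lane instead of passing it through.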
        	

_mm256_mask_dpbusd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_mask_dpbusd_epi32(__m256i src, __mmask8 k,
                                     __m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        		tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        		tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        		tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        		dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_dpbusd_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_dpbusd_epi32(__m256i src, __m256i a,
                                __m256i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_mask_abs_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a

.. code-block:: C

    __m128i _mm_mask_abs_epi8(__m128i src, __mmask16 k,
                              __m128i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := ABS(a[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
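
Because the result is stored as an unsigned byte, the input -128 maps to 128 (bit pattern 0x80) rather than overflowing. A minimal scalar sketch of the write-masked form, with ``ref_mask_abs_epi8`` as a hypothetical name and arrays standing in for the 128-bit vectors:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the 16-lane write-masked absolute value. */
    static void ref_mask_abs_epi8(uint8_t dst[16], const uint8_t src[16],
                                  uint16_t k, const int8_t a[16]) {
        for (int j = 0; j < 16; ++j) {
            if ((k >> j) & 1) {
                int v = a[j];                       /* widen before negating */
                dst[j] = (uint8_t)(v < 0 ? -v : v); /* -128 becomes 128 */
            } else {
                dst[j] = src[j];                    /* copy from src */
            }
        }
    }

Widening to ``int`` before negating avoids the overflow that negating an ``int8_t`` value of -128 in place would cause.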
        	

_mm_maskz_abs_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    __m128i _mm_maskz_abs_epi8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := ABS(a[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_abs_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m128i _mm_mask_abs_epi16(__m128i src, __mmask8 k,
                               __m128i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ABS(a[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_abs_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m128i _mm_maskz_abs_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := ABS(a[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_add_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_add_epi8(__m128i src, __mmask16 k,
                              __m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i] + b[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_add_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_add_epi8(__mmask16 k, __m128i a,
                               __m128i b);

.. admonition:: Intel Description

    Add packed 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i] + b[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
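
The non-saturating add keeps only the low 8 bits of each sum, and lanes whose mask bit is clear become zero. A minimal scalar sketch, assuming the hypothetical name ``ref_maskz_add_epi8`` and arrays in place of vectors:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the 16-lane zero-masked byte add. */
    static void ref_maskz_add_epi8(uint8_t dst[16], uint16_t k,
                                   const uint8_t a[16], const uint8_t b[16]) {
        for (int j = 0; j < 16; ++j) {
            if ((k >> j) & 1) {
                dst[j] = (uint8_t)(a[j] + b[j]); /* wraps modulo 256 */
            } else {
                dst[j] = 0;                      /* zeroed lane */
            }
        }
    }

The same bit pattern results whether the bytes are read as signed or unsigned, which is why a single intrinsic covers both interpretations.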
        	

_mm_mask_adds_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_mask_adds_epi8(__m128i src, __mmask16 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
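
Signed saturation clamps each sum to the range -128 to 127 instead of wrapping. The sketch below models that behaviour in scalar C; ``saturate8`` and ``ref_mask_adds_epi8`` are hypothetical names, and every lane is treated as ``int8_t`` for simplicity.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: clamp to the signed 8-bit range. */
    static int8_t saturate8(int v) {
        if (v > INT8_MAX) return INT8_MAX;
        if (v < INT8_MIN) return INT8_MIN;
        return (int8_t)v;
    }

    /* Hypothetical scalar model of the 16-lane write-masked saturating add. */
    static void ref_mask_adds_epi8(int8_t dst[16], const int8_t src[16],
                                   uint16_t k, const int8_t a[16],
                                   const int8_t b[16]) {
        for (int j = 0; j < 16; ++j) {
            dst[j] = ((k >> j) & 1) ? saturate8(a[j] + b[j]) : src[j];
        }
    }

The operands are promoted to ``int`` before the addition, so the intermediate sum cannot overflow.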
        	

_mm_maskz_adds_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_maskz_adds_epi8(__mmask16 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_adds_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mask_adds_epi16(__m128i src, __mmask8 k,
                                __m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_adds_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_maskz_adds_epi16(__mmask8 k, __m128i a,
                                 __m128i b);

.. admonition:: Intel Description

    Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_adds_epu8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_adds_epu8(__m128i src, __mmask16 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_adds_epu8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_adds_epu8(__mmask16 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_adds_epu16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_adds_epu16(__m128i src, __mmask8 k,
                                __m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
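
Unsigned saturation clamps at 65535 and never needs a lower bound, since the sum of two unsigned values cannot be negative. A minimal scalar sketch with the hypothetical name ``ref_mask_adds_epu16``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the 8-lane write-masked unsigned
       saturating add on 16-bit lanes. */
    static void ref_mask_adds_epu16(uint16_t dst[8], const uint16_t src[8],
                                    uint8_t k, const uint16_t a[8],
                                    const uint16_t b[8]) {
        for (int j = 0; j < 8; ++j) {
            if ((k >> j) & 1) {
                uint32_t sum = (uint32_t)a[j] + b[j];            /* widen first */
                dst[j] = (uint16_t)(sum > UINT16_MAX ? UINT16_MAX : sum);
            } else {
                dst[j] = src[j];                                 /* copy from src */
            }
        }
    }
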
        	

_mm_maskz_adds_epu16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_maskz_adds_epu16(__mmask8 k, __m128i a,
                                 __m128i b);

.. admonition:: Intel Description

    Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_add_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_add_epi16(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i] + b[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_add_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_maskz_add_epi16(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Add packed 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i] + b[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_avg_epu8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_avg_epu8(__m128i src, __mmask16 k,
                              __m128i a, __m128i b);

.. admonition:: Intel Description

    Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
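
The ``+ 1`` in the pseudo-code above makes the average round half up, and the sum needs more than 8 bits before the shift. A minimal scalar sketch, assuming the hypothetical names ``avg_u8`` and ``ref_mask_avg_epu8``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: rounding average of two unsigned bytes.
       The operands promote to int, so a + b + 1 (at most 511) cannot overflow. */
    static uint8_t avg_u8(uint8_t a, uint8_t b) {
        return (uint8_t)((a + b + 1) >> 1);
    }

    /* Hypothetical scalar model of the 16-lane write-masked form. */
    static void ref_mask_avg_epu8(uint8_t dst[16], const uint8_t src[16],
                                  uint16_t k, const uint8_t a[16],
                                  const uint8_t b[16]) {
        for (int j = 0; j < 16; ++j) {
            dst[j] = ((k >> j) & 1) ? avg_u8(a[j], b[j]) : src[j];
        }
    }

With this rounding, the average of 1 and 2 is 2, not 1.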
        	

_mm_maskz_avg_epu8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_avg_epu8(__mmask16 k, __m128i a,
                               __m128i b);

.. admonition:: Intel Description

    Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_avg_epu16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_avg_epu16(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_avg_epu16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_maskz_avg_epu16(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_maddubs_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 src, 
    MASK k, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_mask_maddubs_epi16(__m128i src, __mmask8 k,
                                   __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
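The mixed-sign multiply and saturating pair-add described above can be sketched as a scalar model. The helper names ``saturate16`` and ``ref_mask_maddubs_epi16`` are hypothetical, and the vectors are modeled as plain arrays: 16 unsigned bytes for ``a``, 16 signed bytes for ``b``, and eight ``int16_t`` lanes for ``src`` and ``dst``.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a 32-bit value to the signed 16-bit range. */
    static int16_t saturate16(int32_t v)
    {
        if (v > INT16_MAX) return INT16_MAX;
        if (v < INT16_MIN) return INT16_MIN;
        return (int16_t)v;
    }

    /* Scalar model (hypothetical helper): output lane j combines
       byte pair 2j and 2j+1 of the inputs. */
    void ref_mask_maddubs_epi16(int16_t dst[8], const int16_t src[8], uint8_t k,
                                const uint8_t a[16], const int8_t b[16])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                int32_t lo = (int32_t)a[2 * j] * (int32_t)b[2 * j];
                int32_t hi = (int32_t)a[2 * j + 1] * (int32_t)b[2 * j + 1];
                dst[j] = saturate16(lo + hi);
            } else {
                dst[j] = src[j];
            }
        }
    }

Saturation matters at the extremes: two products of 255 * 127 sum to 64770, which clamps to 32767, and two products of 255 * -128 sum to -65280, which clamps to -32768.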

_mm_maskz_maddubs_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_maskz_maddubs_epi16(__mmask8 k, __m128i a,
                                    __m128i b);

.. admonition:: Intel Description

    Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_madd_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mask_madd_epi16(__m128i src, __mmask8 k,
                                __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_madd_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_maskz_madd_epi16(__mmask8 k, __m128i a,
                                 __m128i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
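A scalar sketch of the zero-masked multiply-add above may help show how eight 16-bit inputs collapse into four 32-bit outputs. The helper ``ref_maskz_madd_epi16`` and its array parameters are assumptions for illustration only.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (hypothetical helper): lane j is the sum of two adjacent
       16x16 -> 32-bit signed products, or zero when bit j of k is clear. */
    void ref_maskz_madd_epi16(int32_t dst[4], uint8_t k,
                              const int16_t a[8], const int16_t b[8])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1) {
                int32_t lo = (int32_t)a[2 * j] * (int32_t)b[2 * j];
                int32_t hi = (int32_t)a[2 * j + 1] * (int32_t)b[2 * j + 1];
                /* Add in unsigned arithmetic to avoid signed overflow in C;
                   this assumes the usual two's complement conversion back. */
                dst[j] = (int32_t)((uint32_t)lo + (uint32_t)hi);
            } else {
                dst[j] = 0;
            }
        }
    }

Because the pseudo-code loops over ``j`` from 0 to 3, only the low four bits of ``k`` are consulted even though the mask type is ``__mmask8``.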

_mm_mask_max_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_mask_max_epi8(__m128i src, __mmask16 k,
                              __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_max_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_maskz_max_epi8(__mmask16 k, __m128i a,
                               __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_max_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mask_max_epi16(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_max_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_maskz_max_epi16(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_max_epu8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_max_epu8(__m128i src, __mmask16 k,
                              __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
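The pseudo-code for the signed and unsigned maximum looks identical, so the difference lies entirely in how each 8-bit lane is interpreted. The following scalar sketch illustrates this; ``ref_mask_max_epu8`` and ``ref_max_epi8_lane`` are hypothetical helpers, and the 16 byte lanes are modeled as arrays with ``k`` as a ``uint16_t``.

.. code-block:: C

    #include <stdint.h>

    /* Signed comparison of one lane, shown only for contrast
       (hypothetical helper). */
    int8_t ref_max_epi8_lane(int8_t a, int8_t b)
    {
        return a > b ? a : b;
    }

    /* Scalar model (hypothetical helper) of the unsigned, write-masked form:
       k carries one bit per byte lane, so it is 16 bits wide. */
    void ref_mask_max_epu8(uint8_t dst[16], const uint8_t src[16], uint16_t k,
                           const uint8_t a[16], const uint8_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1)
                dst[j] = a[j] > b[j] ? a[j] : b[j];
            else
                dst[j] = src[j];
        }
    }

For the bit patterns 0x80 and 0x01, the unsigned maximum is 0x80 (128), whereas the signed maximum is 0x01 because 0x80 reads as -128.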

_mm_maskz_max_epu8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_max_epu8(__mmask16 k, __m128i a,
                               __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_max_epu16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_max_epu16(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_max_epu16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_maskz_max_epu16(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_min_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_mask_min_epi8(__m128i src, __mmask16 k,
                              __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_min_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_maskz_min_epi8(__mmask16 k, __m128i a,
                               __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
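A scalar sketch of the zero-masked signed minimum above, assuming the 16 byte lanes are modeled as ``int8_t`` arrays and ``k`` as a ``uint16_t``. The helper name ``ref_maskz_min_epi8`` is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (hypothetical helper): lane j receives the signed
       minimum when bit j of k is set, and zero otherwise. */
    void ref_maskz_min_epi8(int8_t dst[16], uint16_t k,
                            const int8_t a[16], const int8_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1)
                dst[j] = a[j] < b[j] ? a[j] : b[j];
            else
                dst[j] = 0;
        }
    }

One consequence of zero-masking is that a masked-off lane cannot be told apart from a lane whose genuine minimum is 0; the write-masked form with an explicit "src" avoids that ambiguity.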

_mm_mask_min_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mask_min_epi16(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_min_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_maskz_min_epi16(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_min_epu8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_min_epu8(__m128i src, __mmask16 k,
                              __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_min_epu8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_min_epu8(__mmask16 k, __m128i a,
                               __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_min_epu16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_min_epu16(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_min_epu16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_maskz_min_epu16(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mulhrs_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mask_mulhrs_epi16(__m128i src, __mmask8 k,
                                  __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
        		dst[i+15:i] := tmp[16:1]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
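The shift, add and bit-select steps in the pseudo-code above amount to a rounded fixed-point multiply. A scalar sketch follows; the helpers ``ref_mulhrs_lane`` and ``ref_mask_mulhrs_epi16`` are hypothetical, and the code assumes the compiler performs an arithmetic right shift on negative values, which C leaves implementation-defined.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (hypothetical helper) of one lane: multiply, shift right
       by 14, add 1, then keep bits [16:1] as in the pseudo-code. */
    int16_t ref_mulhrs_lane(int16_t a, int16_t b)
    {
        int32_t prod = (int32_t)a * (int32_t)b;
        /* Assumes >> on a negative int32_t is an arithmetic shift. */
        int32_t tmp = (prod >> 14) + 1;
        return (int16_t)(tmp >> 1);
    }

    /* Write-masked wrapper over eight lanes (hypothetical helper). */
    void ref_mask_mulhrs_epi16(int16_t dst[8], const int16_t src[8], uint8_t k,
                               const int16_t a[8], const int16_t b[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? ref_mulhrs_lane(a[j], b[j]) : src[j];
    }

If the inputs are read as Q15 fractions, 16384 stands for 0.5, and ``ref_mulhrs_lane(16384, 16384)`` yields 8192, which is 0.25 in the same format.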

_mm_maskz_mulhrs_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_maskz_mulhrs_epi16(__mmask8 k, __m128i a,
                                   __m128i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
        		dst[i+15:i] := tmp[16:1]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mulhi_epu16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_mulhi_epu16(__m128i src, __mmask8 k,
                                 __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[31:0] := a[i+15:i] * b[i+15:i]
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mulhi_epu16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_maskz_mulhi_epu16(__mmask8 k, __m128i a,
                                  __m128i b);

.. admonition:: Intel Description

    Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[31:0] := a[i+15:i] * b[i+15:i]
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mulhi_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mask_mulhi_epi16(__m128i src, __mmask8 k,
                                 __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
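The signed and unsigned high-half multiplies differ only in whether the 16-bit inputs are sign-extended before the 32-bit product is formed. The per-lane sketch below contrasts the two; both helper names are hypothetical, and masking is omitted because it follows the same ``IF k[j]`` pattern as the pseudo-code above.

.. code-block:: C

    #include <stdint.h>

    /* Unsigned lane (hypothetical helper): zero-extend, multiply,
       keep bits [31:16]. */
    uint16_t ref_mulhi_epu16_lane(uint16_t a, uint16_t b)
    {
        uint32_t tmp = (uint32_t)a * (uint32_t)b;
        return (uint16_t)(tmp >> 16);
    }

    /* Signed lane (hypothetical helper): sign-extend, multiply,
       keep bits [31:16]. */
    int16_t ref_mulhi_epi16_lane(int16_t a, int16_t b)
    {
        int32_t tmp = (int32_t)a * (int32_t)b;
        /* Shift the unsigned bit pattern so the shift is well defined;
           the final conversion assumes two's complement. */
        return (int16_t)(uint16_t)((uint32_t)tmp >> 16);
    }

With the bit patterns 0xFFFF and 0x0002, the unsigned product is 0x0001FFFE and its high half is 1, while the signed product is -2 (0xFFFFFFFE) and its high half is -1.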

_mm_maskz_mulhi_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_maskz_mulhi_epi16(__mmask8 k, __m128i a,
                                  __m128i b);

.. admonition:: Intel Description

    Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        		dst[i+15:i] := tmp[31:16]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mullo_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_mullo_epi16(__m128i src, __mmask8 k,
                                 __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        		dst[i+15:i] := tmp[15:0]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
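A scalar sketch of the low-half multiply above, using a hypothetical helper ``ref_mask_mullo_epi16`` with eight ``uint16_t`` lanes per vector. The pseudo-code sign-extends its inputs, but the low 16 bits of a product do not depend on how the operands are extended, so this sketch uses unsigned arithmetic to stay clear of signed overflow in C.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (hypothetical helper): lane j receives the low 16 bits
       of a[j] * b[j] when bit j of k is set, else the value from src. */
    void ref_mask_mullo_epi16(uint16_t dst[8], const uint16_t src[8], uint8_t k,
                              const uint16_t a[8], const uint16_t b[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                uint32_t tmp = (uint32_t)a[j] * (uint32_t)b[j];
                dst[j] = (uint16_t)(tmp & 0xFFFFu);
            } else {
                dst[j] = src[j];
            }
        }
    }

For instance, 300 * 300 = 90000 does not fit in 16 bits, and the stored lane is 90000 - 65536 = 24464.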

_mm_maskz_mullo_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_maskz_mullo_epi16(__mmask8 k, __m128i a,
                                  __m128i b);

.. admonition:: Intel Description

    Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        		dst[i+15:i] := tmp[15:0]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_sub_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_sub_epi8(__m128i src, __mmask16 k,
                              __m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i] - b[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	
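The plain subtraction above keeps only the low 8 bits of each difference, so results wrap rather than clamp. A minimal scalar sketch, assuming byte lanes modeled as ``uint8_t`` arrays and a hypothetical helper name ``ref_mask_sub_epi8``:

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (hypothetical helper): lane j receives a[j] - b[j]
       modulo 256 when bit j of k is set, else the value from src. */
    void ref_mask_sub_epi8(uint8_t dst[16], const uint8_t src[16], uint16_t k,
                           const uint8_t a[16], const uint8_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1)
                dst[j] = (uint8_t)(a[j] - b[j]);  /* wraps modulo 256 */
            else
                dst[j] = src[j];
        }
    }

Here 0 - 1 produces 0xFF, which reads as 255 when treated as unsigned and as -1 when treated as signed; the bit pattern is the same either way.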

_mm_maskz_sub_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_sub_epi8(__mmask16 k, __m128i a,
                               __m128i b);

.. admonition:: Intel Description

    Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[i+7:i] - b[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_subs_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_mask_subs_epi8(__m128i src, __mmask16 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	
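The ``Saturate8`` step in the pseudo-code above clamps each signed difference to the range -128 to 127 instead of letting it wrap. A scalar sketch, with hypothetical helpers ``saturate8`` and ``ref_mask_subs_epi8`` and the byte lanes modeled as ``int8_t`` arrays:

.. code-block:: C

    #include <stdint.h>

    /* Clamp an int to the signed 8-bit range. */
    static int8_t saturate8(int v)
    {
        if (v > INT8_MAX) return INT8_MAX;
        if (v < INT8_MIN) return INT8_MIN;
        return (int8_t)v;
    }

    /* Scalar model (hypothetical helper): lane j receives the saturated
       difference when bit j of k is set, else the value from src. */
    void ref_mask_subs_epi8(int8_t dst[16], const int8_t src[16], uint16_t k,
                            const int8_t a[16], const int8_t b[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1)
                dst[j] = saturate8((int)a[j] - (int)b[j]);
            else
                dst[j] = src[j];
        }
    }

For example, -128 - 1 stays at -128 and 127 - (-1) stays at 127, whereas a wrapping subtraction would produce 127 and -128 respectively.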

_mm_maskz_subs_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_maskz_subs_epi8(__mmask16 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_subs_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mask_subs_epi16(__m128i src, __mmask8 k,
                                __m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_subs_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_maskz_subs_epi16(__mmask8 k, __m128i a,
                                 __m128i b);

.. admonition:: Intel Description

    Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_subs_epu8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_subs_epu8(__m128i src, __mmask16 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_subs_epu8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_subs_epu8(__mmask16 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
        	ELSE
        		dst[i+7:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_subs_epu16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_subs_epu16(__m128i src, __mmask8 k,
                                __m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
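
Unsigned saturation behaves differently from the signed case: a difference that would go below zero is clamped to 0. The scalar sketch below models that under the assumption that ``SaturateU16`` clamps to the range 0 to 65535; ``saturate_u16`` and ``mask_subs_u16x8`` are hypothetical helper names, not part of any header.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a wider intermediate result to the unsigned 16-bit range. */
    static uint16_t saturate_u16(int32_t v)
    {
        if (v < 0) return 0;
        if (v > UINT16_MAX) return UINT16_MAX;
        return (uint16_t)v;
    }

    /* Hypothetical scalar model: 8 lanes of 16 bits with a writemask. */
    static void mask_subs_u16x8(uint16_t dst[8], const uint16_t src[8],
                                uint8_t k,
                                const uint16_t a[8], const uint16_t b[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u)
                dst[j] = saturate_u16((int32_t)a[j] - (int32_t)b[j]);
            else
                dst[j] = src[j];
        }
    }

In this model ``5 - 10`` yields 0, while a lane whose mask bit is clear keeps the value supplied in ``src``.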
        	

_mm_maskz_subs_epu16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_maskz_subs_epu16(__mmask8 k, __m128i a,
                                 __m128i b);

.. admonition:: Intel Description

    Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_sub_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_sub_epi16(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i] - b[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sub_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_maskz_sub_epi16(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[i+15:i] - b[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_reduce_add_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    short _mm_reduce_add_epi16(__m128i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by addition. Returns the sum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[15:0] + src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] + src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_ADD(a, 8)
        	

_mm_mask_reduce_add_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    short _mm_mask_reduce_add_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[15:0] + src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] + src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := 0
        	FI
        ENDFOR
        dst[15:0] := REDUCE_ADD(tmp, 8)
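
The masked reduction above can be sketched in scalar C: inactive lanes are replaced by the additive identity 0, and the remaining values are folded in halves just as ``REDUCE_ADD`` does. The function name ``mask_reduce_add_i16x8`` is hypothetical, and the sketch assumes the 16-bit result wraps modulo 2^16.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of a masked 8-lane, 16-bit add reduction. */
    static int16_t mask_reduce_add_i16x8(uint8_t k, const int16_t a[8])
    {
        uint16_t tmp[8];

        /* Inactive lanes contribute 0, the identity for addition. */
        for (int j = 0; j < 8; j++)
            tmp[j] = ((k >> j) & 1u) ? (uint16_t)a[j] : 0;

        /* Fold the upper half onto the lower half: 8 -> 4 -> 2 -> 1. */
        for (int len = 4; len >= 1; len /= 2)
            for (int j = 0; j < len; j++)
                tmp[j] = (uint16_t)(tmp[j] + tmp[j + len]);

        /* Reinterpret the low 16 bits as a signed value. */
        return (int16_t)(tmp[0] >= 0x8000 ? (int)tmp[0] - 0x10000
                                          : (int)tmp[0]);
    }

Because wrapping addition is associative and commutative, the tree order gives the same result as a simple left-to-right sum; the tree is kept here only to mirror the pseudo-code.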
        	

_mm_reduce_add_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: char
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    char _mm_reduce_add_epi8(__m128i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by addition. Returns the sum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[7:0] + src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] + src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_ADD(a, 16)
        	

_mm_mask_reduce_add_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: char
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    char _mm_mask_reduce_add_epi8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_ADD(src, len) {
        	IF len == 2
        		RETURN src[7:0] + src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] + src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_ADD(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := 0
        	FI
        ENDFOR
        dst[7:0] := REDUCE_ADD(tmp, 16)
        	

_mm_reduce_mul_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    short _mm_reduce_mul_epi16(__m128i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by multiplication. Returns the product of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[15:0] * src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] * src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_MUL(a, 8)
        	

_mm_mask_reduce_mul_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    short _mm_mask_reduce_mul_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[15:0] * src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] * src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := 1
        	FI
        ENDFOR
        dst[15:0] := REDUCE_MUL(tmp, 8)
        	

_mm_reduce_mul_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: char
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    char _mm_reduce_mul_epi8(__m128i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by multiplication. Returns the product of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[7:0] * src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] * src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_MUL(a, 16)
        	

_mm_mask_reduce_mul_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: char
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    char _mm_mask_reduce_mul_epi8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_MUL(src, len) {
        	IF len == 2
        		RETURN src[7:0] * src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] * src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_MUL(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := 1
        	FI
        ENDFOR
        dst[7:0] := REDUCE_MUL(tmp, 16)
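
In the pseudo-code above, inactive lanes are replaced by 1, the multiplicative identity, so they do not affect the result. A scalar sketch of that idea follows; ``mask_reduce_mul_i8x16`` is a hypothetical name, and the model assumes only the low 8 bits of each intermediate product are kept.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of a masked 16-lane, 8-bit multiply
       reduction. */
    static int8_t mask_reduce_mul_i8x16(uint16_t k, const int8_t a[16])
    {
        uint8_t tmp[16];

        /* Inactive lanes contribute 1, the identity for multiplication. */
        for (int j = 0; j < 16; j++)
            tmp[j] = ((k >> j) & 1u) ? (uint8_t)a[j] : 1;

        /* Fold the upper half onto the lower half: 16 -> 8 -> 4 -> 2 -> 1,
           keeping only the low 8 bits of each product. */
        for (int len = 8; len >= 1; len /= 2)
            for (int j = 0; j < len; j++)
                tmp[j] = (uint8_t)(tmp[j] * tmp[j + len]);

        /* Reinterpret the low 8 bits as a signed value. */
        return (int8_t)(tmp[0] >= 0x80 ? (int)tmp[0] - 0x100 : (int)tmp[0]);
    }

With an all-zero mask the model returns 1. An 8-bit result overflows quickly: multiplying eight lanes that each hold 2 already gives 256, whose low 8 bits are 0.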
        	

_mm_reduce_or_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    short _mm_reduce_or_epi16(__m128i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_OR(src, len) {
        	IF len == 2
        		RETURN src[15:0] OR src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] OR src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_OR(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_OR(a, 8)
        	

_mm_mask_reduce_or_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    short _mm_mask_reduce_or_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_OR(src, len) {
        	IF len == 2
        		RETURN src[15:0] OR src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] OR src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_OR(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := 0
        	FI
        ENDFOR
        dst[15:0] := REDUCE_OR(tmp, 8)
        	

_mm_reduce_or_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: char
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    char _mm_reduce_or_epi8(__m128i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_OR(src, len) {
        	IF len == 2
        		RETURN src[7:0] OR src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] OR src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_OR(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_OR(a, 16)
        	

_mm_mask_reduce_or_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: char
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    char _mm_mask_reduce_or_epi8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_OR(src, len) {
        	IF len == 2
        		RETURN src[7:0] OR src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] OR src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_OR(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := 0
        	FI
        ENDFOR
        dst[7:0] := REDUCE_OR(tmp, 16)
        	

_mm_reduce_and_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    short _mm_reduce_and_epi16(__m128i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_AND(src, len) {
        	IF len == 2
        		RETURN src[15:0] AND src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] AND src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_AND(src[16*len-1:0], len)
        }
        dst[15:0] := REDUCE_AND(a, 8)
        	

_mm_mask_reduce_and_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    short _mm_mask_reduce_and_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed 16-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_AND(src, len) {
        	IF len == 2
        		RETURN src[15:0] AND src[31:16]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*16
        		src[i+15:i] := src[i+15:i] AND src[i+16*len+15:i+16*len]
        	ENDFOR
        	RETURN REDUCE_AND(src[16*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		tmp[i+15:i] := a[i+15:i]
        	ELSE
        		tmp[i+15:i] := 0xFFFF
        	FI
        ENDFOR
        dst[15:0] := REDUCE_AND(tmp, 8)
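
The pseudo-code above fills inactive lanes with ``0xFFFF`` because all-ones is the identity for bitwise AND. The scalar sketch below models the same result with a plain loop instead of the halving tree, which is equivalent since AND is associative and commutative. The name ``mask_reduce_and_i16x8`` is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of a masked 8-lane, 16-bit AND reduction. */
    static int16_t mask_reduce_and_i16x8(uint8_t k, const int16_t a[8])
    {
        /* Start from all-ones so that skipped lanes leave acc unchanged. */
        uint16_t acc = 0xFFFF;

        for (int j = 0; j < 8; j++)
            if ((k >> j) & 1u)
                acc = (uint16_t)(acc & (uint16_t)a[j]);

        /* Reinterpret the low 16 bits as a signed value. */
        return (int16_t)(acc >= 0x8000 ? (int)acc - 0x10000 : (int)acc);
    }

With an all-zero mask the model returns ``-1`` (all bits set), which is worth keeping in mind when the mask may be empty.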
        	

_mm_reduce_and_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: char
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    char _mm_reduce_and_epi8(__m128i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_AND(src, len) {
        	IF len == 2
        		RETURN src[7:0] AND src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] AND src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_AND(src[8*len-1:0], len)
        }
        dst[7:0] := REDUCE_AND(a, 16)
        	

_mm_mask_reduce_and_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: char
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    char _mm_mask_reduce_and_epi8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Reduce the packed 8-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE REDUCE_AND(src, len) {
        	IF len == 2
        		RETURN src[7:0] AND src[15:8]
        	FI
        	len := len / 2
        	FOR j:= 0 to (len-1)
        		i := j*8
        		src[i+7:i] := src[i+7:i] AND src[i+8*len+7:i+8*len]
        	ENDFOR
        	RETURN REDUCE_AND(src[8*len-1:0], len)
        }
        tmp := a
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		tmp[i+7:i] := a[i+7:i]
        	ELSE
        		tmp[i+7:i] := 0xFF
        	FI
        ENDFOR
        dst[7:0] := REDUCE_AND(tmp, 16)
        	

_mm_mask_mullo_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_mullo_epi64(__m128i src, __mmask8 k,
                                 __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		tmp[127:0] := a[i+63:i] * b[i+63:i]
        		dst[i+63:i] := tmp[63:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mullo_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_mullo_epi64(__mmask8 k, __m128i a,
                                  __m128i b);

.. admonition:: Intel Description

    Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		tmp[127:0] := a[i+63:i] * b[i+63:i]
        		dst[i+63:i] := tmp[63:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mullo_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mullo_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	tmp[127:0] := a[i+63:i] * b[i+63:i]
        	dst[i+63:i] := tmp[63:0]
        ENDFOR
        dst[MAX:128] := 0
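
Keeping only the low 64 bits of a 128-bit product is the same as 64-bit multiplication modulo 2^64, which is what unsigned multiplication in C already does. The sketch below models the two lanes that way; ``mullo_u64x2`` is a hypothetical helper name used only for this example.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: two 64-bit lanes, each holding the low
       64 bits of the full product. Unsigned overflow in C wraps modulo
       2^64, so no wider intermediate type is needed. */
    static void mullo_u64x2(uint64_t dst[2],
                            const uint64_t a[2], const uint64_t b[2])
    {
        for (int j = 0; j < 2; j++)
            dst[j] = a[j] * b[j];
    }

The low 64 bits are identical whether the inputs are interpreted as signed or unsigned, so the same model applies to both interpretations.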
        	

_mm_mask_add_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_add_pd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

    Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_add_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_add_pd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_add_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_add_ps(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

    Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
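
As the loop bounds above show, only the low four bits of the 8-bit mask select lanes for the four single-precision elements. A scalar sketch of the writemask behaviour follows; ``mask_add_f32x4`` is a hypothetical name, and the arrays stand in for the lanes of the vector arguments.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: four 32-bit float lanes with a writemask.
       Bits 4..7 of k are ignored because there are only four lanes. */
    static void mask_add_f32x4(float dst[4], const float src[4], uint8_t k,
                               const float a[4], const float b[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u)
                dst[j] = a[j] + b[j];
            else
                dst[j] = src[j];
        }
    }

With ``k = 0x05`` lanes 0 and 2 receive the sums, and lanes 1 and 3 are copied from ``src``.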
        	

_mm_maskz_add_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_add_ps(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_div_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_div_pd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

    Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] / b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_div_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_div_pd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] / b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
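
The zeromask variant differs from the writemask variant only in what unselected lanes receive. A minimal scalar sketch of ``_mm_maskz_div_pd`` follows; ``ref_maskz_div_pd`` and ``demo_maskz_div_pd`` are hypothetical names, and ``double[2]`` arrays stand in for ``__m128d``.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_maskz_div_pd: lane j becomes
       a[j] / b[j] when bit j of k is set, otherwise it is zeroed. */
    void ref_maskz_div_pd(double dst[2], uint8_t k,
                          const double a[2], const double b[2])
    {
        for (int j = 0; j < 2; j++) {
            dst[j] = ((k >> j) & 1) ? a[j] / b[j] : 0.0;
        }
    }

    /* Self-check: mask 0x2 selects only lane 1. */
    int demo_maskz_div_pd(void)
    {
        const double a[2] = {1.0, 9.0};
        const double b[2] = {4.0, 3.0};
        double dst[2];

        ref_maskz_div_pd(dst, 0x2, a, b);
        assert(dst[0] == 0.0);  /* bit 0 clear: zeroed */
        assert(dst[1] == 3.0);  /* 9 / 3 */
        return 0;
    }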
        	

_mm_mask_div_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_div_ps(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

    Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] / b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_div_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_div_ps(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] / b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fmadd_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m128d _mm_mask3_fmadd_pd(__m128d a, __m128d b, __m128d c,
                               __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fmadd_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_mask_fmadd_pd(__m128d a, __mmask8 k, __m128d b,
                              __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
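
In the ``mask`` form of the fused multiply-add, unselected lanes are copied from the first operand ``a``. A minimal scalar sketch of that behaviour follows; ``ref_mask_fmadd_pd`` and ``demo_mask_fmadd_pd`` are hypothetical names, and ``double[2]`` arrays stand in for ``__m128d``. The sketch assumes that plain ``a * b + c`` is an acceptable stand-in for the fused operation.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_fmadd_pd: lane j becomes
       a[j] * b[j] + c[j] when bit j of k is set, otherwise it keeps a[j].
       Assumption: a * b + c may round twice here, whereas the hardware
       instruction is fused and rounds once; fma() from <math.h> would
       model that more closely. */
    void ref_mask_fmadd_pd(double dst[2], const double a[2], uint8_t k,
                           const double b[2], const double c[2])
    {
        for (int j = 0; j < 2; j++) {
            dst[j] = ((k >> j) & 1) ? a[j] * b[j] + c[j] : a[j];
        }
    }

    /* Self-check: mask 0x1 selects only lane 0. */
    int demo_mask_fmadd_pd(void)
    {
        const double a[2] = {2.0, 3.0};
        const double b[2] = {4.0, 5.0};
        const double c[2] = {1.0, 1.0};
        double dst[2];

        ref_mask_fmadd_pd(dst, a, 0x1, b, c);
        assert(dst[0] == 9.0);  /* 2 * 4 + 1 */
        assert(dst[1] == 3.0);  /* bit 1 clear: copied from a */
        return 0;
    }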
        	

_mm_maskz_fmadd_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_maskz_fmadd_pd(__mmask8 k, __m128d a, __m128d b,
                               __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fmadd_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m128 _mm_mask3_fmadd_ps(__m128 a, __m128 b, __m128 c,
                              __mmask8 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fmadd_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_mask_fmadd_ps(__m128 a, __mmask8 k, __m128 b,
                             __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fmadd_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_maskz_fmadd_ps(__mmask8 k, __m128 a, __m128 b,
                              __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fmaddsub_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m128d _mm_mask3_fmaddsub_pd(__m128d a, __m128d b,
                                  __m128d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
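
The alternating pattern is easy to misread: per the pseudo-code, even-indexed lanes subtract ``c`` and odd-indexed lanes add it. A minimal scalar sketch of ``_mm_mask3_fmaddsub_pd`` follows; ``ref_mask3_fmaddsub_pd`` and ``demo_mask3_fmaddsub_pd`` are hypothetical names, ``double[2]`` arrays stand in for ``__m128d``, and a plain multiply followed by an add or subtract is assumed as a stand-in for the fused operation.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask3_fmaddsub_pd. For selected
       lanes, even j computes a*b - c and odd j computes a*b + c;
       unselected lanes keep c[j]. Assumption: the product is rounded
       before the add/subtract here, unlike the fused instruction. */
    void ref_mask3_fmaddsub_pd(double dst[2], const double a[2],
                               const double b[2], const double c[2],
                               uint8_t k)
    {
        for (int j = 0; j < 2; j++) {
            if ((k >> j) & 1) {
                double prod = a[j] * b[j];
                dst[j] = ((j & 1) == 0) ? prod - c[j] : prod + c[j];
            } else {
                dst[j] = c[j];
            }
        }
    }

    /* Self-check: both lanes selected. */
    int demo_mask3_fmaddsub_pd(void)
    {
        const double a[2] = {2.0, 3.0};
        const double b[2] = {4.0, 5.0};
        const double c[2] = {1.0, 1.0};
        double dst[2];

        ref_mask3_fmaddsub_pd(dst, a, b, c, 0x3);
        assert(dst[0] == 7.0);   /* even lane: 2 * 4 - 1 */
        assert(dst[1] == 16.0);  /* odd lane:  3 * 5 + 1 */
        return 0;
    }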
        	

_mm_mask_fmaddsub_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_mask_fmaddsub_pd(__m128d a, __mmask8 k,
                                 __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fmaddsub_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_maskz_fmaddsub_pd(__mmask8 k, __m128d a,
                                  __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fmaddsub_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m128 _mm_mask3_fmaddsub_ps(__m128 a, __m128 b, __m128 c,
                                 __mmask8 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fmaddsub_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_mask_fmaddsub_ps(__m128 a, __mmask8 k, __m128 b,
                                __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fmaddsub_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_maskz_fmaddsub_ps(__mmask8 k, __m128 a, __m128 b,
                                 __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fmsub_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m128d _mm_mask3_fmsub_pd(__m128d a, __m128d b, __m128d c,
                               __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fmsub_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_mask_fmsub_pd(__m128d a, __mmask8 k, __m128d b,
                              __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fmsub_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_maskz_fmsub_pd(__mmask8 k, __m128d a, __m128d b,
                               __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fmsub_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m128 _mm_mask3_fmsub_ps(__m128 a, __m128 b, __m128 c,
                              __mmask8 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fmsub_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_mask_fmsub_ps(__m128 a, __mmask8 k, __m128 b,
                             __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fmsub_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_maskz_fmsub_ps(__mmask8 k, __m128 a, __m128 b,
                              __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
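
A minimal scalar sketch of ``_mm_maskz_fmsub_ps`` may help when checking which lanes a given mask selects. ``ref_maskz_fmsub_ps`` and ``demo_maskz_fmsub_ps`` are hypothetical names, ``float[4]`` arrays stand in for ``__m128``, and plain ``a * b - c`` is assumed as a stand-in for the fused operation.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_maskz_fmsub_ps: lane j becomes
       a[j] * b[j] - c[j] when bit j of k is set, otherwise it is zeroed.
       Assumption: a * b - c may round twice here, whereas the hardware
       instruction is fused and rounds once. */
    void ref_maskz_fmsub_ps(float dst[4], uint8_t k, const float a[4],
                            const float b[4], const float c[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = ((k >> j) & 1) ? a[j] * b[j] - c[j] : 0.0f;
        }
    }

    /* Self-check: mask 0xA (binary 1010) selects lanes 1 and 3. */
    int demo_maskz_fmsub_ps(void)
    {
        const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        const float b[4] = {2.0f, 2.0f, 2.0f, 2.0f};
        const float c[4] = {1.0f, 1.0f, 1.0f, 1.0f};
        float dst[4];

        ref_maskz_fmsub_ps(dst, 0xA, a, b, c);
        assert(dst[0] == 0.0f && dst[1] == 3.0f);
        assert(dst[2] == 0.0f && dst[3] == 7.0f);
        return 0;
    }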
        	

_mm_mask3_fmsubadd_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m128d _mm_mask3_fmsubadd_pd(__m128d a, __m128d b,
                                  __m128d c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fmsubadd_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_mask_fmsubadd_pd(__m128d a, __mmask8 k,
                                 __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1 
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fmsubadd_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_maskz_fmsubadd_pd(__mmask8 k, __m128d a,
                                  __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        		ELSE
        			dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        		FI
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fmsubadd_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m128 _mm_mask3_fmsubadd_ps(__m128 a, __m128 b, __m128 c,
                                 __mmask8 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
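
For ``fmsubadd`` the pattern is the mirror image of ``fmaddsub``: per the pseudo-code, even-indexed lanes add ``c`` and odd-indexed lanes subtract it. A minimal scalar sketch of ``_mm_mask3_fmsubadd_ps`` follows; ``ref_mask3_fmsubadd_ps`` and ``demo_mask3_fmsubadd_ps`` are hypothetical names, ``float[4]`` arrays stand in for ``__m128``, and an unfused multiply is assumed as a stand-in for the fused operation.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask3_fmsubadd_ps. For selected
       lanes, even j computes a*b + c and odd j computes a*b - c;
       unselected lanes keep c[j]. Assumption: the product is rounded
       before the add/subtract here, unlike the fused instruction. */
    void ref_mask3_fmsubadd_ps(float dst[4], const float a[4],
                               const float b[4], const float c[4],
                               uint8_t k)
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1) {
                float prod = a[j] * b[j];
                dst[j] = ((j & 1) == 0) ? prod + c[j] : prod - c[j];
            } else {
                dst[j] = c[j];
            }
        }
    }

    /* Self-check: mask 0x7 selects lanes 0, 1 and 2. */
    int demo_mask3_fmsubadd_ps(void)
    {
        const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        const float b[4] = {2.0f, 2.0f, 2.0f, 2.0f};
        const float c[4] = {1.0f, 1.0f, 1.0f, 5.0f};
        float dst[4];

        ref_mask3_fmsubadd_ps(dst, a, b, c, 0x7);
        assert(dst[0] == 3.0f);  /* even lane: 1 * 2 + 1 */
        assert(dst[1] == 3.0f);  /* odd lane:  2 * 2 - 1 */
        assert(dst[2] == 7.0f);  /* even lane: 3 * 2 + 1 */
        assert(dst[3] == 5.0f);  /* bit 3 clear: copied from c */
        return 0;
    }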
        	

_mm_mask_fmsubadd_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_mask_fmsubadd_ps(__m128 a, __mmask8 k, __m128 b,
                                __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fmsubadd_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_maskz_fmsubadd_ps(__mmask8 k, __m128 a, __m128 b,
                                 __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		IF ((j & 1) == 0) 
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        		ELSE
        			dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        		FI
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fnmadd_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m128d _mm_mask3_fnmadd_pd(__m128d a, __m128d b, __m128d c,
                                __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_mask_fnmadd_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_mask_fnmadd_pd(__m128d a, __mmask8 k, __m128d b,
                               __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_maskz_fnmadd_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_maskz_fnmadd_pd(__mmask8 k, __m128d a,
                                __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR	
        dst[MAX:128] := 0
        	
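The ``mask3``, ``mask`` and ``maskz`` forms of ``fnmadd_pd`` compute the same value in selected lanes and differ only in what an unselected lane receives ("c", "a", or zero). A minimal scalar sketch of that difference follows; the helper ``model_fnmadd_pd`` and its explicit ``fallback`` parameter are hypothetical and exist only for this illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model. `fallback` supplies the value for lanes
     * whose mask bit is clear: pass c for mask3, a for mask, zeros for maskz. */
    void model_fnmadd_pd(double dst[2], uint8_t k,
                         const double a[2], const double b[2],
                         const double c[2], const double fallback[2])
    {
        for (int j = 0; j < 2; j++) {
            /* Only mask bits 0 and 1 are consulted for two 64-bit lanes. */
            dst[j] = ((k >> j) & 1) ? -(a[j] * b[j]) + c[j] : fallback[j];
        }
    }

Because the loop runs over two lanes, mask bits 2 through 7 have no effect in this model, matching the ``FOR j := 0 to 1`` range in the pseudo-code.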

_mm_mask3_fnmadd_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m128 _mm_mask3_fnmadd_ps(__m128 a, __m128 b, __m128 c,
                               __mmask8 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_mask_fnmadd_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_mask_fnmadd_ps(__m128 a, __mmask8 k, __m128 b,
                              __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_maskz_fnmadd_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_maskz_fnmadd_ps(__mmask8 k, __m128 a, __m128 b,
                               __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_mask3_fnmsub_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m128d _mm_mask3_fnmsub_pd(__m128d a, __m128d b, __m128d c,
                                __mmask8 k);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := c[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_mask_fnmsub_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_mask_fnmsub_pd(__m128d a, __mmask8 k, __m128d b,
                               __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_maskz_fnmsub_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_maskz_fnmsub_pd(__mmask8 k, __m128d a,
                                __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_mask3_fnmsub_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m128 _mm_mask3_fnmsub_ps(__m128 a, __m128 b, __m128 c,
                               __mmask8 k);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := c[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_mask_fnmsub_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_mask_fnmsub_ps(__m128 a, __mmask8 k, __m128 b,
                              __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR	
        dst[MAX:128] := 0
        	
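As a rough illustration of the writemask form above, the sketch below evaluates ``-(a*b) - c`` per lane and passes "a" through where the mask bit is clear. It is a portable scalar model under the assumption that each lane is an array element; ``model_mask_fnmsub_ps`` is a hypothetical name, not part of ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the writemask fnmsub pseudo-code. */
    void model_mask_fnmsub_ps(float dst[4], const float a[4], uint8_t k,
                              const float b[4], const float c[4])
    {
        for (int j = 0; j < 4; j++) {
            /* Selected lanes: negate the product, then subtract c.
             * Unselected lanes: copy the element of a unchanged. */
            dst[j] = ((k >> j) & 1) ? -(a[j] * b[j]) - c[j] : a[j];
        }
    }

With ``k = 0x5`` only lanes 0 and 2 are recomputed; lanes 1 and 3 keep the values from "a".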

_mm_maskz_fnmsub_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_maskz_fnmsub_ps(__mmask8 k, __m128 a, __m128 b,
                               __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_mask_max_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_max_pd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

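    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.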

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_max_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_max_pd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

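    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.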

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
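The ``MAX`` in the pseudo-code is not an IEEE 754 maximum. The sketch below assumes the usual x86 rule, where the second operand is returned unless the first compares strictly greater, so a NaN in either input or a pair of zeros yields the element of "b". The helper names ``lane_max`` and ``model_maskz_max_pd`` are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Assumed per-lane rule: return a only when a > b holds, otherwise b.
     * Comparisons with NaN are false, so NaN inputs select b. */
    double lane_max(double a, double b)
    {
        return (a > b) ? a : b;
    }

    /* Hypothetical scalar model of the zeromask max pseudo-code. */
    void model_maskz_max_pd(double dst[2], uint8_t k,
                            const double a[2], const double b[2])
    {
        for (int j = 0; j < 2; j++) {
            dst[j] = ((k >> j) & 1) ? lane_max(a[j], b[j]) : 0.0;
        }
    }

Under this rule the operation is not commutative: swapping "a" and "b" can change the result when a NaN or zeros of opposite sign are involved.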

_mm_mask_max_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_max_ps(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

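    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.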

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_max_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_max_ps(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

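    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.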

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_min_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_min_pd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

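    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.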

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_min_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_min_pd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

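    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.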

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_min_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_min_ps(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

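    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.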

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
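A scalar sketch of the writemask minimum follows. It assumes the x86 convention that the element of "b" is returned unless the element of "a" compares strictly less, which is why a NaN in "b" propagates to the result. ``model_mask_min_ps`` is a hypothetical helper, not a library function.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the writemask min pseudo-code. */
    void model_mask_min_ps(float dst[4], const float src[4], uint8_t k,
                           const float a[4], const float b[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1) {
                /* Assumed rule: a only when a < b holds, otherwise b. */
                dst[j] = (a[j] < b[j]) ? a[j] : b[j];
            } else {
                dst[j] = src[j]; /* writemask: keep the element of src */
            }
        }
    }

Lanes with a clear mask bit are never compared, so a NaN in an unselected lane of "a" or "b" cannot reach "dst".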

_mm_maskz_min_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_min_ps(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

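    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.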

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mul_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_mul_pd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] * b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mul_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_mul_pd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] * b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mul_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_mul_ps(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mul_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_mul_ps(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
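Comparing the two multiply pseudo-code listings, the zeromask form behaves like the writemask form with an all-zero "src". The sketch below expresses that relationship with two hypothetical scalar helpers, ``model_mask_mul_ps`` and ``model_maskz_mul_ps``; it models the listings only and says nothing about how a compiler lowers the intrinsics.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the writemask multiply. */
    void model_mask_mul_ps(float dst[4], const float src[4], uint8_t k,
                           const float a[4], const float b[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = ((k >> j) & 1) ? a[j] * b[j] : src[j];
        }
    }

    /* Zeromask form, written as the writemask form with src = 0. */
    void model_maskz_mul_ps(float dst[4], uint8_t k,
                            const float a[4], const float b[4])
    {
        const float zero[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        model_mask_mul_ps(dst, zero, k, a, b);
    }
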

_mm_mask_abs_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm_mask_abs_epi32(__m128i src, __mmask8 k,
                               __m128i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ABS(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_abs_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm_maskz_abs_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ABS(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
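The description says the results are stored as unsigned values, which matters for the most negative input: its magnitude does not fit in a signed 32-bit integer but does fit in an unsigned one. A minimal scalar sketch, with the hypothetical helper ``model_maskz_abs_epi32`` and unsigned arithmetic chosen to avoid signed overflow in C:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zeromask abs pseudo-code. */
    void model_maskz_abs_epi32(uint32_t dst[4], uint8_t k, const int32_t a[4])
    {
        for (int j = 0; j < 4; j++) {
            uint32_t bits = (uint32_t)a[j];
            /* Negate in unsigned arithmetic so INT32_MIN is well defined. */
            uint32_t mag = (a[j] < 0) ? (0u - bits) : bits;
            dst[j] = ((k >> j) & 1) ? mag : 0u;
        }
    }

In this model ``INT32_MIN`` maps to ``0x80000000``, which reads as 2147483648 when interpreted as unsigned.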

_mm_abs_epi64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128i _mm_abs_epi64(__m128i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ABS(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	
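For the unmasked 64-bit form, every lane is processed. The following scalar sketch is an illustration only; ``model_abs_epi64`` is a hypothetical name, and the two array elements stand in for the two 64-bit lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the unmasked 64-bit abs pseudo-code. */
    void model_abs_epi64(uint64_t dst[2], const int64_t a[2])
    {
        for (int j = 0; j < 2; j++) {
            uint64_t bits = (uint64_t)a[j];
            /* Unsigned negation keeps INT64_MIN well defined in C. */
            dst[j] = (a[j] < 0) ? (UINT64_C(0) - bits) : bits;
        }
    }
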

_mm_mask_abs_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm_mask_abs_epi64(__m128i src, __mmask8 k,
                               __m128i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ABS(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_abs_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm_maskz_abs_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ABS(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_add_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_add_epi32(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_add_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_add_epi32(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Add packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
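Since each result is stored into a 32-bit field, the addition wraps modulo 2^32 rather than saturating. A minimal scalar sketch of the zeromask form, using ``uint32_t`` so that wraparound is well defined in C; ``model_maskz_add_epi32`` is a hypothetical helper.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zeromask 32-bit add pseudo-code. */
    void model_maskz_add_epi32(uint32_t dst[4], uint8_t k,
                               const uint32_t a[4], const uint32_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            /* Unsigned addition wraps modulo 2^32. */
            dst[j] = ((k >> j) & 1) ? (uint32_t)(a[j] + b[j]) : 0u;
        }
    }
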

_mm_mask_add_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_add_epi64(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Add packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
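The same bit pattern results whether the 64-bit lanes are read as signed or unsigned, because two's-complement addition is identical for both. The sketch below models the writemask form on ``uint64_t`` lanes; ``model_mask_add_epi64`` is a hypothetical helper for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the writemask 64-bit add pseudo-code. */
    void model_mask_add_epi64(uint64_t dst[2], const uint64_t src[2],
                              uint8_t k, const uint64_t a[2],
                              const uint64_t b[2])
    {
        for (int j = 0; j < 2; j++) {
            /* Wraps modulo 2^64; unselected lanes keep src. */
            dst[j] = ((k >> j) & 1) ? a[j] + b[j] : src[j];
        }
    }

For instance, the lane holding the bit pattern of -3 plus 5 yields 2, and ``UINT64_MAX`` plus 1 wraps to 0.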

_mm_maskz_add_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_add_epi64(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Add packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_max_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_mask_max_epi32(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_max_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_maskz_max_epi32(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
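A zeromask differs from a writemask only in what happens to unselected lanes. The scalar sketch below models the pseudo-code above under stated assumptions: ``maskz_max_epi32_model`` is a hypothetical name, and a four-element ``int32_t`` array stands in for the 128-bit vector.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_maskz_max_epi32 over four signed
       32-bit lanes (lane 0 = bits 31:0). */
    void maskz_max_epi32_model(int32_t dst[4], uint8_t k,
                               const int32_t a[4], const int32_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                dst[j] = (a[j] > b[j]) ? a[j] : b[j];  /* signed comparison */
            } else {
                dst[j] = 0;                            /* mask bit clear: zero the lane */
            }
        }
    }

With ``k = 0x5`` only lanes 0 and 2 are computed; lanes 1 and 3 are zeroed regardless of their inputs.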

_mm_mask_max_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __m128i _mm_mask_max_epi64(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_max_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __m128i _mm_maskz_max_epi64(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_max_epi64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m128i _mm_max_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_max_epu32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_max_epu32(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_max_epu32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_max_epu32(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_max_epu64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_max_epu64(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_max_epu64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_max_epu64(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_max_epu64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_max_epu64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	
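The ``MAX`` in the pseudo-code is written identically for the signed (``epi64``) and unsigned (``epu64``) variants; the difference lies in how each lane's bits are interpreted. The following per-lane sketch makes that explicit. The helper names are hypothetical, and the unsigned-to-signed cast assumes the two's-complement conversion that mainstream compilers provide.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical one-lane model of _mm_max_epi64: compare as signed. */
    uint64_t max_epi64_lane(uint64_t a, uint64_t b)
    {
        /* Reinterpret the raw lane bits as two's-complement values
           (implementation-defined before C23, universal in practice). */
        return ((int64_t)a > (int64_t)b) ? a : b;
    }

    /* Hypothetical one-lane model of _mm_max_epu64: compare as unsigned. */
    uint64_t max_epu64_lane(uint64_t a, uint64_t b)
    {
        return (a > b) ? a : b;
    }

For the lane bits ``0xFFFFFFFFFFFFFFFF`` and ``1``, the signed model returns ``1`` (the first operand is read as -1), while the unsigned model returns ``0xFFFFFFFFFFFFFFFF``.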

_mm_mask_min_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_mask_min_epi32(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_min_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_maskz_min_epi32(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_min_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __m128i _mm_mask_min_epi64(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_min_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __m128i _mm_maskz_min_epi64(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_min_epi64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m128i _mm_min_epi64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_min_epu32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_min_epu32(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
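As a rough illustration of the unsigned minimum combined with a writemask, the pseudo-code above can be sketched in scalar C. The name ``mask_min_epu32_model`` and the array-based vector representation are assumptions for this example only.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_min_epu32 over four unsigned
       32-bit lanes (lane 0 = bits 31:0). */
    void mask_min_epu32_model(uint32_t dst[4], const uint32_t src[4],
                              uint8_t k, const uint32_t a[4],
                              const uint32_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                dst[j] = (a[j] < b[j]) ? a[j] : b[j];  /* unsigned comparison */
            } else {
                dst[j] = src[j];                       /* mask bit clear: keep src */
            }
        }
    }

Because the comparison is unsigned, a lane holding ``0xFFFFFFFF`` is treated as the largest 32-bit value rather than as -1, so it never wins a minimum against a smaller bit pattern.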

_mm_maskz_min_epu32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_min_epu32(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_min_epu64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_min_epu64(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_min_epu64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_min_epu64(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_min_epu64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_min_epu64(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mul_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI64 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_mask_mul_epi32(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
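The widening step is the part of this operation that is easiest to misread: only the low 32 bits of each 64-bit lane take part, and they are sign-extended before multiplying. A minimal scalar sketch follows, assuming a hypothetical ``mask_mul_epi32_model`` and arrays of two 64-bit lanes in place of the vector types.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_mul_epi32. Inputs are raw
       64-bit lanes; the upper 32 bits of each input lane are ignored. */
    void mask_mul_epi32_model(int64_t dst[2], const int64_t src[2],
                              uint8_t k, const uint64_t a[2],
                              const uint64_t b[2])
    {
        for (int j = 0; j < 2; j++) {
            if ((k >> j) & 1u) {
                /* Truncate to the low 32 bits, then sign-extend to 64 bits */
                int64_t lo_a = (int64_t)(int32_t)(uint32_t)a[j];
                int64_t lo_b = (int64_t)(int32_t)(uint32_t)b[j];
                dst[j] = lo_a * lo_b;   /* a 32x32-bit product always fits in 64 bits */
            } else {
                dst[j] = src[j];        /* mask bit clear: keep the src lane */
            }
        }
    }

A lane of ``a`` holding ``0x00000001FFFFFFFF`` contributes -1 to the product, because only its low half ``0xFFFFFFFF`` is read and sign-extended.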

_mm_maskz_mul_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_maskz_mul_epi32(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mullo_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_mullo_epi32(__m128i src, __mmask8 k,
                                 __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		tmp[63:0] := a[i+31:i] * b[i+31:i]
        		dst[i+31:i] := tmp[31:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
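The "intermediate 64-bit product, keep the low 32 bits" step can be sketched in scalar C as shown here. The function name ``mask_mullo_epi32_model`` and the array representation are assumptions for illustration; lanes are modelled as unsigned because the low 32 bits of the product are the same for signed and unsigned operands.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_mullo_epi32 over four 32-bit
       lanes (lane 0 = bits 31:0). */
    void mask_mullo_epi32_model(uint32_t dst[4], const uint32_t src[4],
                                uint8_t k, const uint32_t a[4],
                                const uint32_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                uint64_t tmp = (uint64_t)a[j] * (uint64_t)b[j]; /* full 64-bit product */
                dst[j] = (uint32_t)tmp;                         /* keep tmp[31:0] only */
            } else {
                dst[j] = src[j];                                /* mask bit clear: keep src */
            }
        }
    }

For example, ``0x10000 * 0x10000`` produces the intermediate ``0x100000000``, whose low 32 bits are zero, so the stored lane is ``0``.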

_mm_maskz_mullo_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_mullo_epi32(__mmask8 k, __m128i a,
                                  __m128i b);

.. admonition:: Intel Description

    Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		tmp[63:0] := a[i+31:i] * b[i+31:i]
        		dst[i+31:i] := tmp[31:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mul_epu32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_mul_epu32(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mul_epu32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_mul_epu32(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+31:i] * b[i+31:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
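In the unsigned variant the low 32 bits of each 64-bit lane are zero-extended rather than sign-extended before the multiply. A scalar sketch of the pseudo-code above, assuming a hypothetical ``maskz_mul_epu32_model`` and arrays of two 64-bit lanes:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_maskz_mul_epu32. The upper 32 bits
       of each input lane are ignored. */
    void maskz_mul_epu32_model(uint64_t dst[2], uint8_t k,
                               const uint64_t a[2], const uint64_t b[2])
    {
        for (int j = 0; j < 2; j++) {
            if ((k >> j) & 1u) {
                /* Truncate to 32 bits, zero-extend, then multiply in 64 bits */
                dst[j] = (uint64_t)(uint32_t)a[j] * (uint64_t)(uint32_t)b[j];
            } else {
                dst[j] = 0;   /* mask bit clear: zero the lane */
            }
        }
    }

The largest possible product, ``0xFFFFFFFF * 0xFFFFFFFF``, is ``0xFFFFFFFE00000001``, which still fits in the 64-bit destination lane.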

_mm_mask_sub_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_sub_epi32(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sub_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_sub_epi32(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	
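The subtraction in the pseudo-code operates on fixed-width lanes, so results wrap rather than saturate. The scalar sketch below illustrates this together with the zeromask; ``maskz_sub_epi32_model`` is a hypothetical name, and lanes are modelled as ``uint32_t`` so that the wraparound is well defined in C.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_maskz_sub_epi32 over four 32-bit
       lanes (lane 0 = bits 31:0). */
    void maskz_sub_epi32_model(uint32_t dst[4], uint8_t k,
                               const uint32_t a[4], const uint32_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                dst[j] = a[j] - b[j];   /* wraps modulo 2^32 */
            } else {
                dst[j] = 0;             /* mask bit clear: zero the lane */
            }
        }
    }

Subtracting ``1`` from ``0`` yields the bit pattern ``0xFFFFFFFF``, which reads as -1 when the lane is interpreted as a signed integer.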

_mm_mask_sub_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_sub_epi64(__m128i src, __mmask8 k,
                               __m128i a, __m128i b);

.. admonition:: Intel Description

    Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sub_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_sub_epi64(__mmask8 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_rcp14_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_mask_rcp14_pd(__m128d src, __mmask8 k,
                              __m128d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_rcp14_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_maskz_rcp14_pd(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rcp14_pd
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_rcp14_pd(__m128d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (1.0 / a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	
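The pseudo-code writes the exact quotient ``1.0 / a``, but the description states that the stored value is an approximation with relative error below 2^-14. One way to read that bound is as the check sketched here. The helper ``rcp14_within_bound`` is hypothetical and assumes a finite, nonzero ``a``; it does not reproduce the hardware's approximation, it only tests a candidate value against the documented bound.

.. code-block:: C

    #include <stdbool.h>

    /* Hypothetical helper: returns true when `approx` satisfies
       |approx - 1/a| / |1/a| < 2^-14 for a finite, nonzero `a`. */
    bool rcp14_within_bound(double approx, double a)
    {
        const double bound = 1.0 / 16384.0;          /* 2^-14 */
        double exact = 1.0 / a;
        double err = approx - exact;
        if (err < 0.0) {
            err = -err;                              /* absolute error */
        }
        double mag = (exact < 0.0) ? -exact : exact; /* |1/a| */
        return err < bound * mag;
    }

For ``a = 4.0``, a candidate of ``0.25 * (1.0 + 1.0 / 32768.0)`` has relative error 2^-15 and passes, while ``0.25 * (1.0 + 1.0 / 8192.0)`` has relative error 2^-13 and fails.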

_mm_mask_rcp14_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_mask_rcp14_ps(__m128 src, __mmask8 k, __m128 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_rcp14_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_maskz_rcp14_ps(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rcp14_ps
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_rcp14_ps(__m128 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := (1.0 / a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rsqrt14_pd
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_rsqrt14_pd(__m128d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
        ENDFOR
        dst[MAX:128] := 0
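
As with the reciprocal, an approximate reciprocal square root can be refined in software. The following minimal scalar sketch, which is not part of Intel's documentation, uses the hypothetical helper ``refine_rsqrt`` to apply one Newton-Raphson step, ``y1 = y0 * (1.5 - 0.5 * a * y0 * y0)``. For an estimate with relative error ``e``, the step leaves an error of about ``1.5 * e * e``.

.. code-block:: C

    /* One Newton-Raphson step for a reciprocal-square-root estimate.
     * a  : input value (assumed positive and finite)
     * y0 : an approximation of 1/sqrt(a), e.g. one lane of an rsqrt14 result
     * Hypothetical helper for illustration; not an Intel intrinsic. */
    static double refine_rsqrt(double a, double y0)
    {
        return y0 * (1.5 - 0.5 * a * y0 * y0);
    }
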
        	

_mm_mask_rsqrt14_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_mask_rsqrt14_pd(__m128d src, __mmask8 k,
                                __m128d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_rsqrt14_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_maskz_rsqrt14_pd(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_rsqrt14_ps
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_rsqrt14_ps(__m128 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_rsqrt14_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_mask_rsqrt14_ps(__m128 src, __mmask8 k,
                               __m128 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_rsqrt14_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_maskz_rsqrt14_ps(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_sub_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_sub_pd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

    Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sub_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_sub_pd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
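
The zeromask form differs from the writemask form only in what an unselected lane receives. A minimal scalar model of the pseudo-code above, using the hypothetical name ``model_maskz_sub_pd`` and plain C arrays in place of ``__m128d``, might look like this:

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of the zeromask pseudo-code (2 double lanes).
     * Hypothetical helper for illustration; not an Intel intrinsic. */
    static void model_maskz_sub_pd(double dst[2], uint8_t k,
                                   const double a[2], const double b[2])
    {
        for (int j = 0; j < 2; ++j) {
            /* mask bit set: subtract; mask bit clear: write zero */
            dst[j] = ((k >> j) & 1u) ? a[j] - b[j] : 0.0;
        }
    }

With ``k = 0x2`` only lane 1 is computed; lane 0 is zeroed regardless of ``a`` and ``b``.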
        	

_mm_mask_sub_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_sub_ps(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

    Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sub_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_sub_ps(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_add_round_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_add_round_sd(__m128d a, __m128d b,
                             int rounding);

.. admonition:: Intel Description

    Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
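    Rounding is done according to the "rounding" parameter, which can be _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code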

    .. code-block:: text

        
        dst[63:0] := a[63:0] + b[63:0]
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
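
The ``rounding`` argument is an integer constant built from the ``_MM_FROUND_*`` macros. As a hedged sketch, the block below mirrors the numeric values of those macros as local constants so that it compiles without ``immintrin.h``; the ``MODEL_`` names and the helper ``model_is_listed_rounding`` are hypothetical. The helper reports whether a value is one of the five combinations Intel lists for this parameter: a direction combined with ``_MM_FROUND_NO_EXC``, or ``_MM_FROUND_CUR_DIRECTION`` alone.

.. code-block:: C

    #include <stdbool.h>

    /* Local mirrors of the _MM_FROUND_* macro values (illustration only). */
    enum {
        MODEL_FROUND_TO_NEAREST_INT = 0x00,
        MODEL_FROUND_TO_NEG_INF     = 0x01,
        MODEL_FROUND_TO_POS_INF     = 0x02,
        MODEL_FROUND_TO_ZERO        = 0x03,
        MODEL_FROUND_CUR_DIRECTION  = 0x04,
        MODEL_FROUND_NO_EXC         = 0x08
    };

    /* Returns true when `rounding` is one of the five listed combinations. */
    static bool model_is_listed_rounding(int rounding)
    {
        switch (rounding) {
        case MODEL_FROUND_TO_NEAREST_INT | MODEL_FROUND_NO_EXC:
        case MODEL_FROUND_TO_NEG_INF | MODEL_FROUND_NO_EXC:
        case MODEL_FROUND_TO_POS_INF | MODEL_FROUND_NO_EXC:
        case MODEL_FROUND_TO_ZERO | MODEL_FROUND_NO_EXC:
        case MODEL_FROUND_CUR_DIRECTION:
            return true;
        default:
            return false;
        }
    }

In real code the macros from ``immintrin.h`` would be passed directly, for example ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC``.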
        	

_mm_mask_add_round_sd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask_add_round_sd(__m128d src, __mmask8 k,
                                  __m128d a, __m128d b,
                                  int rounding);

.. admonition:: Intel Description

    Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
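    Rounding is done according to the "rounding" parameter, which can be _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code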

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] + b[63:0]
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_add_sd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_add_sd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

    Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] + b[63:0]
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
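
For the scalar (``_sd``) forms only mask bit 0 is consulted, and the upper lane always comes from ``a``. A minimal scalar model of the pseudo-code above, with the hypothetical name ``model_mask_add_sd`` and arrays standing in for ``__m128d``, could be written as:

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of the masked lower-lane add (2 double lanes).
     * Hypothetical helper for illustration; not an Intel intrinsic. */
    static void model_mask_add_sd(double dst[2], const double src[2],
                                  uint8_t k, const double a[2],
                                  const double b[2])
    {
        dst[0] = (k & 1u) ? a[0] + b[0] : src[0]; /* only bit 0 of k is used */
        dst[1] = a[1];                            /* upper lane copied from a */
    }

Note that ``src`` contributes at most its lower lane; its upper lane never reaches ``dst``.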
        	

_mm_maskz_add_round_sd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_maskz_add_round_sd(__mmask8 k, __m128d a,
                                   __m128d b, int rounding);

.. admonition:: Intel Description

    Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
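    Rounding is done according to the "rounding" parameter, which can be _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code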

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] + b[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_add_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_add_sd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] + b[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_add_round_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_add_round_ss(__m128 a, __m128 b, int rounding);

.. admonition:: Intel Description

    Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
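    Rounding is done according to the "rounding" parameter, which can be _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code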

    .. code-block:: text

        
        dst[31:0] := a[31:0] + b[31:0]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_add_round_ss
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask_add_round_ss(__m128 src, __mmask8 k,
                                 __m128 a, __m128 b,
                                 int rounding);

.. admonition:: Intel Description

    Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". 
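    Rounding is done according to the "rounding" parameter, which can be _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code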

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] + b[31:0]
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_add_ss
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_add_ss(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

    Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] + b[31:0]
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_add_round_ss
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_maskz_add_round_ss(__mmask8 k, __m128 a,
                                  __m128 b, int rounding);

.. admonition:: Intel Description

    Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
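    Rounding is done according to the "rounding" parameter, which can be _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code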

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] + b[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_add_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_add_ss(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] + b[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_div_round_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_div_round_sd(__m128d a, __m128d b,
                             int rounding);

.. admonition:: Intel Description

    Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
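    Rounding is done according to the "rounding" parameter, which can be _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code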

    .. code-block:: text

        
        dst[63:0] := a[63:0] / b[63:0]
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_div_round_sd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask_div_round_sd(__m128d src, __mmask8 k,
                                  __m128d a, __m128d b,
                                  int rounding);

.. admonition:: Intel Description

    Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". 
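    Rounding is done according to the "rounding" parameter, which can be _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code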

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] / b[63:0]
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_div_sd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_div_sd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

    Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] / b[63:0]
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_div_round_sd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_maskz_div_round_sd(__mmask8 k, __m128d a,
                                   __m128d b, int rounding);

.. admonition:: Intel Description

    Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
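    Rounding is done according to the "rounding" parameter, which can be _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code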

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] / b[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_div_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_div_sd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] / b[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_div_round_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_div_round_ss(__m128 a, __m128 b, int rounding);

.. admonition:: Intel Description

    Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
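    Rounding is done according to the "rounding" parameter, which can be _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code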

    .. code-block:: text

        
        dst[31:0] := a[31:0] / b[31:0]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_div_round_ss
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask_div_round_ss(__m128 src, __mmask8 k,
                                 __m128 a, __m128 b,
                                 int rounding);

.. admonition:: Intel Description

    Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". 
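    Rounding is done according to the "rounding" parameter, which can be _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code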

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] / b[31:0]
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_div_ss
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_div_ss(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

    Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] / b[31:0]
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_div_round_ss
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_maskz_div_round_ss(__mmask8 k, __m128 a,
                                  __m128 b, int rounding);

.. admonition:: Intel Description

    Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
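    Rounding is done according to the "rounding" parameter, which can be _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code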

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] / b[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_div_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_div_ss(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] / b[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
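
A minimal scalar model of the pseudo-code above is sketched here, using the hypothetical name ``model_maskz_div_ss`` and arrays standing in for ``__m128``. It shows that only the lower lane is divided, that only bit 0 of ``k`` matters, and that the three upper lanes are passed through from ``a`` untouched. In this model the quotient is not evaluated at all when the mask bit is clear; the sketch makes no claim about hardware exception behaviour.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of the zero-masked lower-lane divide (4 float lanes).
     * Hypothetical helper for illustration; not an Intel intrinsic. */
    static void model_maskz_div_ss(float dst[4], uint8_t k,
                                   const float a[4], const float b[4])
    {
        dst[0] = (k & 1u) ? a[0] / b[0] : 0.0f; /* bit 0 clear: write zero */
        for (int j = 1; j < 4; ++j)
            dst[j] = a[j];                      /* upper lanes copied from a */
    }
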
        	

_mm_fmadd_round_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m128d _mm_fmadd_round_sd(__m128d a, __m128d b, __m128d c,
                               int rounding);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
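    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.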

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
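
The "rounding" argument of the ``_round`` variants is an immediate built from the ``_MM_FROUND_*`` macros. The sketch below mirrors the usual header values under hypothetical ``DOC_``-prefixed names, so that it builds without AVX-512 support, and checks whether a value is one of the combinations normally passed for this argument. Treat the accepted set as an assumption and confirm it against your compiler's headers.

.. code-block:: C

    #include <assert.h>

    /* Local mirrors of the rounding macros; DOC_ names are hypothetical. */
    enum {
        DOC_FROUND_TO_NEAREST_INT = 0x00,
        DOC_FROUND_TO_NEG_INF     = 0x01,
        DOC_FROUND_TO_POS_INF     = 0x02,
        DOC_FROUND_TO_ZERO        = 0x03,
        DOC_FROUND_CUR_DIRECTION  = 0x04,
        DOC_FROUND_NO_EXC         = 0x08
    };

    /* Returns 1 for CUR_DIRECTION alone, or for one of the four explicit
       directions combined with NO_EXC; returns 0 otherwise. */
    int doc_is_valid_rounding(int rounding)
    {
        if (rounding == DOC_FROUND_CUR_DIRECTION)
            return 1;
        return (rounding & ~0x03) == DOC_FROUND_NO_EXC;
    }

    /* Self-check covering one accepted and one rejected combination. */
    void check_rounding_values(void)
    {
        assert(doc_is_valid_rounding(DOC_FROUND_TO_ZERO | DOC_FROUND_NO_EXC));
        assert(!doc_is_valid_rounding(DOC_FROUND_TO_ZERO));
    }

In real code the immediate must be a compile-time constant, so a check like this is useful mainly as documentation of which combinations are meaningful.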
        	

_mm_mask3_fmadd_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask3_fmadd_round_sd(__m128d a, __m128d b,
                                     __m128d c, __mmask8 k,
                                     int rounding);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".
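    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.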

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
        ELSE
        	dst[63:0] := c[63:0]
        FI
        dst[127:64] := c[127:64]
        dst[MAX:128] := 0
        	

_mm_mask3_fmadd_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m128d _mm_mask3_fmadd_sd(__m128d a, __m128d b, __m128d c,
                               __mmask8 k);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
        ELSE
        	dst[63:0] := c[63:0]
        FI
        dst[127:64] := c[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_fmadd_round_sd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask_fmadd_round_sd(__m128d a, __mmask8 k,
                                    __m128d b, __m128d c,
                                    int rounding);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
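    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.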

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
        ELSE
        	dst[63:0] := a[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_fmadd_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_mask_fmadd_sd(__m128d a, __mmask8 k, __m128d b,
                              __m128d c);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
        ELSE
        	dst[63:0] := a[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_fmadd_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m128d _mm_maskz_fmadd_round_sd(__mmask8 k, __m128d a,
                                     __m128d b, __m128d c,
                                     int rounding);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
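    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.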

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_fmadd_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_maskz_fmadd_sd(__mmask8 k, __m128d a, __m128d b,
                               __m128d c);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
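
The ``mask``, ``mask3`` and ``maskz`` forms of the scalar double-precision multiply-add differ only in where the result comes from when mask bit 0 is clear and in which operand supplies the upper element. A minimal scalar sketch of that difference follows; ``vec2d``, ``passthrough`` and ``model_fmadd_sd`` are hypothetical names, and the single-rounding property of a fused multiply-add is not modelled.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical stand-in for __m128d: two double lanes, lane 0 lowest. */
    typedef struct { double lane[2]; } vec2d;

    typedef enum { PASS_FROM_A, PASS_FROM_C, PASS_ZERO } passthrough;

    /* PASS_FROM_A models the mask form, PASS_FROM_C the mask3 form,
       PASS_ZERO the maskz form. */
    vec2d model_fmadd_sd(vec2d a, vec2d b, vec2d c, uint8_t k, passthrough mode)
    {
        vec2d dst;
        double fallback = (mode == PASS_FROM_A) ? a.lane[0]
                        : (mode == PASS_FROM_C) ? c.lane[0] : 0.0;
        dst.lane[0] = (k & 1u) ? a.lane[0] * b.lane[0] + c.lane[0] : fallback;
        /* Only the mask3 form takes the upper element from c. */
        dst.lane[1] = (mode == PASS_FROM_C) ? c.lane[1] : a.lane[1];
        return dst;
    }

    /* Self-check: with k[0] clear, each form falls back differently. */
    void check_fmadd_sd_flavours(void)
    {
        vec2d a = {{ 2.0, 10.0 }}, b = {{ 3.0, 20.0 }}, c = {{ 4.0, 30.0 }};
        assert(model_fmadd_sd(a, b, c, 0, PASS_FROM_A).lane[0] == 2.0);
        assert(model_fmadd_sd(a, b, c, 0, PASS_FROM_C).lane[0] == 4.0);
        assert(model_fmadd_sd(a, b, c, 0, PASS_ZERO).lane[0] == 0.0);
    }

With ``k[0]`` set, all three forms produce the same lower element; only the upper element still depends on the form.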
        	

_mm_mask3_fmadd_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask3_fmadd_round_ss(__m128 a, __m128 b,
                                    __m128 c, __mmask8 k,
                                    int rounding);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".
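    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.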

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
        ELSE
        	dst[31:0] := c[31:0]
        FI
        dst[127:32] := c[127:32]
        dst[MAX:128] := 0
        	

_mm_mask3_fmadd_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m128 _mm_mask3_fmadd_ss(__m128 a, __m128 b, __m128 c,
                              __mmask8 k);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
        ELSE
        	dst[31:0] := c[31:0]
        FI
        dst[127:32] := c[127:32]
        dst[MAX:128] := 0
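
Because the ``mask3`` form passes "c" through when mask bit 0 is clear, it fits accumulation patterns where "c" is a running total. The following sketch models only the lower lane and uses it for a dot product that skips selected terms; ``model_mask3_fmadd_lane0`` and ``doc_selective_dot`` are hypothetical helpers, and this use case is an illustration rather than something the source prescribes.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Lower-lane model of the mask3 form: c is returned when k[0] is clear. */
    float model_mask3_fmadd_lane0(float a, float b, float c, uint8_t k)
    {
        return (k & 1u) ? a * b + c : c;
    }

    /* Dot product that skips entries whose keep[] flag has bit 0 clear. */
    float doc_selective_dot(const float *x, const float *y,
                            const uint8_t *keep, size_t n)
    {
        float acc = 0.0f;
        for (size_t i = 0; i < n; ++i)
            acc = model_mask3_fmadd_lane0(x[i], y[i], acc, keep[i]);
        return acc;
    }

    /* Self-check: a skipped term leaves the accumulator unchanged. */
    void check_selective_dot(void)
    {
        assert(model_mask3_fmadd_lane0(2.0f, 3.0f, 5.0f, 0) == 5.0f);
        assert(model_mask3_fmadd_lane0(2.0f, 3.0f, 5.0f, 1) == 11.0f);
    }

The accumulator never needs a separate blend step, since the masked-off case already returns it untouched.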
        	

_mm_fmadd_round_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m128 _mm_fmadd_round_ss(__m128 a, __m128 b, __m128 c,
                              int rounding);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
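    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.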

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
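
The bit ranges in the pseudo-code map directly onto bytes of a 16-byte register image: ``[31:0]`` is the first four bytes and ``[127:32]`` the remaining twelve, assuming a little-endian layout as on x86 and a 4-byte ``float``. The sketch below applies the unmasked pseudo-code to raw buffers; ``model_fmadd_ss_bytes`` is a hypothetical helper, and rounding control is not modelled.

.. code-block:: C

    #include <assert.h>
    #include <string.h>

    /* Applies the unmasked scalar multiply-add to 16-byte register images. */
    void model_fmadd_ss_bytes(unsigned char dst[16], const unsigned char a[16],
                              const unsigned char b[16], const unsigned char c[16])
    {
        float a0, b0, c0, r0;
        memcpy(&a0, a, sizeof a0);          /* a[31:0] */
        memcpy(&b0, b, sizeof b0);          /* b[31:0] */
        memcpy(&c0, c, sizeof c0);          /* c[31:0] */
        r0 = a0 * b0 + c0;
        memmove(dst + 4, a + 4, 12);        /* dst[127:32] := a[127:32] */
        memcpy(dst, &r0, sizeof r0);        /* dst[31:0] := result */
    }

    /* Self-check: 2 * 3 + 1 lands in the low lane, the rest comes from a. */
    void check_fmadd_ss_bytes(void)
    {
        float fa[4] = { 2.0f, 5.0f, 6.0f, 7.0f };
        float fb[4] = { 3.0f, 0.0f, 0.0f, 0.0f };
        float fc[4] = { 1.0f, 9.0f, 9.0f, 9.0f };
        float fd[4];
        unsigned char a[16], b[16], c[16], d[16];
        memcpy(a, fa, 16); memcpy(b, fb, 16); memcpy(c, fc, 16);
        model_fmadd_ss_bytes(d, a, b, c);
        memcpy(fd, d, 16);
        assert(fd[0] == 7.0f && fd[1] == 5.0f && fd[3] == 7.0f);
    }

The upper lanes of "b" and "c" never influence the result, which is why the check fills them with unrelated values.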
        	

_mm_mask_fmadd_round_ss
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask_fmadd_round_ss(__m128 a, __mmask8 k,
                                   __m128 b, __m128 c,
                                   int rounding);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
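    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.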

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
        ELSE
        	dst[31:0] := a[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_fmadd_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_mask_fmadd_ss(__m128 a, __mmask8 k, __m128 b,
                             __m128 c);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
        ELSE
        	dst[31:0] := a[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fmadd_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m128 _mm_maskz_fmadd_round_ss(__mmask8 k, __m128 a,
                                    __m128 b, __m128 c,
                                    int rounding);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
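    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.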

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fmadd_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_maskz_fmadd_ss(__mmask8 k, __m128 a, __m128 b,
                              __m128 c);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
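
Although ``__mmask8`` carries eight bits, the pseudo-code above consults only ``k[0]``. A short lower-lane sketch makes that explicit; ``model_maskz_fmadd_lane0`` is a hypothetical name, and the model ignores the upper lanes and the fused single rounding.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Lower-lane model of the maskz form: only bit 0 of k is inspected. */
    float model_maskz_fmadd_lane0(uint8_t k, float a, float b, float c)
    {
        return (k & 0x01u) ? a * b + c : 0.0f;
    }

    /* Self-check: bits 1..7 of the mask make no difference. */
    void check_only_bit0_matters(void)
    {
        assert(model_maskz_fmadd_lane0(0xFE, 2.0f, 3.0f, 4.0f) == 0.0f);
        assert(model_maskz_fmadd_lane0(0x01, 2.0f, 3.0f, 4.0f) == 10.0f);
        assert(model_maskz_fmadd_lane0(0xFF, 2.0f, 3.0f, 4.0f) == 10.0f);
    }

A mask such as ``0xFE`` therefore behaves like an all-clear mask for the scalar forms.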
        	

_mm_fmsub_round_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m128d _mm_fmsub_round_sd(__m128d a, __m128d b, __m128d c,
                               int rounding);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
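    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.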

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask3_fmsub_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask3_fmsub_round_sd(__m128d a, __m128d b,
                                     __m128d c, __mmask8 k,
                                     int rounding);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".
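    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.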

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
        ELSE
        	dst[63:0] := c[63:0]
        FI
        dst[127:64] := c[127:64]
        dst[MAX:128] := 0
        	

_mm_mask3_fmsub_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m128d _mm_mask3_fmsub_sd(__m128d a, __m128d b, __m128d c,
                               __mmask8 k);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
        ELSE
        	dst[63:0] := c[63:0]
        FI
        dst[127:64] := c[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_fmsub_round_sd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask_fmsub_round_sd(__m128d a, __mmask8 k,
                                    __m128d b, __m128d c,
                                    int rounding);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
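    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.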

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
        ELSE
        	dst[63:0] := a[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_fmsub_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_mask_fmsub_sd(__m128d a, __mmask8 k, __m128d b,
                              __m128d c);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
        ELSE
        	dst[63:0] := a[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_fmsub_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m128d _mm_maskz_fmsub_round_sd(__mmask8 k, __m128d a,
                                     __m128d b, __m128d c,
                                     int rounding);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". 
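    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.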

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_fmsub_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_maskz_fmsub_sd(__mmask8 k, __m128d a, __m128d b,
                               __m128d c);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
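
In the scalar model, multiply-subtract is multiply-add with the lower element of "c" negated, since ``x - y`` and ``x + (-y)`` give the same IEEE 754 result. The sketch below checks that relationship for the zeromask form; ``vec2d`` and both ``model_`` functions are hypothetical names, and this is a reading of the pseudo-code rather than a statement about how the hardware implements it.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical stand-in for __m128d: two double lanes, lane 0 lowest. */
    typedef struct { double lane[2]; } vec2d;

    vec2d model_maskz_fmadd_sd(uint8_t k, vec2d a, vec2d b, vec2d c)
    {
        vec2d dst = a;                      /* dst[127:64] := a[127:64] */
        dst.lane[0] = (k & 1u) ? a.lane[0] * b.lane[0] + c.lane[0] : 0.0;
        return dst;
    }

    vec2d model_maskz_fmsub_sd(uint8_t k, vec2d a, vec2d b, vec2d c)
    {
        vec2d dst = a;                      /* dst[127:64] := a[127:64] */
        dst.lane[0] = (k & 1u) ? a.lane[0] * b.lane[0] - c.lane[0] : 0.0;
        return dst;
    }

    /* Self-check: fmsub(a, b, c) matches fmadd(a, b, -c) in the low lane. */
    void check_fmsub_matches_negated_fmadd(void)
    {
        vec2d a = {{ 3.0, 7.0 }}, b = {{ 4.0, 0.0 }}, c = {{ 5.0, 0.0 }};
        vec2d neg_c = {{ -5.0, 0.0 }};
        vec2d sub = model_maskz_fmsub_sd(1, a, b, c);
        vec2d add = model_maskz_fmadd_sd(1, a, b, neg_c);
        assert(sub.lane[0] == 7.0 && add.lane[0] == sub.lane[0]);
    }

The upper element is unaffected by the sign change because it is always copied from "a".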
        	

_mm_fmsub_round_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m128 _mm_fmsub_round_ss(__m128 a, __m128 b, __m128 c,
                              int rounding);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
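    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.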

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask3_fmsub_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask3_fmsub_round_ss(__m128 a, __m128 b,
                                    __m128 c, __mmask8 k,
                                    int rounding);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".
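    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.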

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
        ELSE
        	dst[31:0] := c[31:0]
        FI
        dst[127:32] := c[127:32]
        dst[MAX:128] := 0
        	

_mm_mask3_fmsub_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m128 _mm_mask3_fmsub_ss(__m128 a, __m128 b, __m128 c,
                              __mmask8 k);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
        ELSE
        	dst[31:0] := c[31:0]
        FI
        dst[127:32] := c[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_fmsub_round_ss
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask_fmsub_round_ss(__m128 a, __mmask8 k,
                                   __m128 b, __m128 c,
                                   int rounding);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". 
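    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.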

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
        ELSE
        	dst[31:0] := a[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_fmsub_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_mask_fmsub_ss(__m128 a, __mmask8 k, __m128 b,
                             __m128 c);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
        ELSE
        	dst[31:0] := a[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fmsub_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m128 _mm_maskz_fmsub_round_ss(__mmask8 k, __m128 a,
                                    __m128 b, __m128 c,
                                    int rounding);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". 
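    Rounding follows the "rounding" parameter: combine one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` with ``_MM_FROUND_NO_EXC``, or pass ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode currently set in MXCSR.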

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fmsub_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_maskz_fmsub_ss(__mmask8 k, __m128 a, __m128 b,
                              __m128 c);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
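
A call to the intrinsic can be wrapped so that the same source builds with or without AVX-512F enabled. The sketch below is one possible arrangement, not a recommended pattern from the source: ``doc_maskz_fmsub_lane0`` is a hypothetical wrapper, the fast path is selected by the compiler-defined ``__AVX512F__`` macro, and the fallback is a plain scalar model that does not reproduce the fused single rounding.

.. code-block:: C

    #include <assert.h>
    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Returns the lower element that the zeromask multiply-subtract would
       produce for scalar inputs a, b and c. */
    float doc_maskz_fmsub_lane0(unsigned char k, float a, float b, float c)
    {
    #if defined(__AVX512F__)
        /* Real intrinsic path: only lane 0 of each operand matters here. */
        __m128 r = _mm_maskz_fmsub_ss((__mmask8)k, _mm_set_ss(a),
                                      _mm_set_ss(b), _mm_set_ss(c));
        return _mm_cvtss_f32(r);
    #else
        /* Portable fallback following the pseudo-code. */
        return (k & 1u) ? a * b - c : 0.0f;
    #endif
    }

    /* Self-check with exactly representable values, so both paths agree. */
    void check_maskz_fmsub_wrapper(void)
    {
        assert(doc_maskz_fmsub_lane0(1, 3.0f, 4.0f, 5.0f) == 7.0f);
        assert(doc_maskz_fmsub_lane0(0, 3.0f, 4.0f, 5.0f) == 0.0f);
    }

Compile-time selection like this says nothing about the CPU the binary eventually runs on; a build with AVX-512F enabled still needs hardware that supports it.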
        	

_mm_fnmadd_round_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m128d _mm_fnmadd_round_sd(__m128d a, __m128d b, __m128d c,
                                int rounding)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
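
A minimal usage sketch, assuming a compiler that defines ``__AVX512F__`` when AVX-512F code generation is enabled (for example with ``-mavx512f`` on GCC or Clang). The wrapper name ``fnmadd_low`` is hypothetical. When AVX-512F is not enabled, the wrapper falls back to plain scalar arithmetic, which is not fused.

.. code-block:: C

    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: returns -(a*b) + c for the lower lane only. */
    static double fnmadd_low(double a, double b, double c)
    {
    #if defined(__AVX512F__)
        __m128d va = _mm_set_sd(a);
        __m128d vb = _mm_set_sd(b);
        __m128d vc = _mm_set_sd(c);
        /* Round to nearest and suppress exceptions for this one operation. */
        __m128d r = _mm_fnmadd_round_sd(va, vb, vc,
                                        _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
        return _mm_cvtsd_f64(r);
    #else
        /* Fallback without AVX-512F: same formula, ordinary rounding. */
        return -(a * b) + c;
    #endif
    }

The rounding argument must be a compile-time constant, which is why it is written inline rather than passed through the wrapper.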
        	

_mm_mask3_fnmadd_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask3_fnmadd_round_sd(__m128d a, __m128d b,
                                      __m128d c, __mmask8 k,
                                      int rounding)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
        ELSE
        	dst[63:0] := c[63:0]
        FI
        dst[127:64] := c[127:64]
        dst[MAX:128] := 0
        	

_mm_mask3_fnmadd_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m128d _mm_mask3_fnmadd_sd(__m128d a, __m128d b, __m128d c,
                                __mmask8 k)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
        ELSE
        	dst[63:0] := c[63:0]
        FI
        dst[127:64] := c[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_fnmadd_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask_fnmadd_round_sd(__m128d a, __mmask8 k,
                                     __m128d b, __m128d c,
                                     int rounding)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
        ELSE
        	dst[63:0] := a[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_fnmadd_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_mask_fnmadd_sd(__m128d a, __mmask8 k, __m128d b,
                               __m128d c)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
        ELSE
        	dst[63:0] := a[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_fnmadd_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m128d _mm_maskz_fnmadd_round_sd(__mmask8 k, __m128d a,
                                      __m128d b, __m128d c,
                                      int rounding)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_fnmadd_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_maskz_fnmadd_sd(__mmask8 k, __m128d a,
                                __m128d b, __m128d c)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
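
The ``mask``, ``mask3`` and ``maskz`` forms of the scalar double-precision fnmadd differ only in where the lower lane falls back to and where the upper lane comes from. The sketch below models all three side by side in plain C. ``vec2d`` and the ``model_*`` functions are hypothetical names, and the arithmetic is not fused.

.. code-block:: C

    /* Hypothetical stand-in for __m128d: lane[0] is bits 63:0. */
    typedef struct { double lane[2]; } vec2d;

    /* -(a*b) + c on the lower lane, as in the pseudo-code. */
    static double fnmadd_lane(vec2d a, vec2d b, vec2d c)
    {
        return -(a.lane[0] * b.lane[0]) + c.lane[0];
    }

    /* mask form: lower lane falls back to a, upper lane comes from a. */
    static vec2d model_mask_fnmadd_sd(vec2d a, unsigned char k, vec2d b, vec2d c)
    {
        vec2d dst = a;
        if (k & 1u) dst.lane[0] = fnmadd_lane(a, b, c);
        return dst;
    }

    /* mask3 form: lower lane falls back to c, upper lane comes from c. */
    static vec2d model_mask3_fnmadd_sd(vec2d a, vec2d b, vec2d c, unsigned char k)
    {
        vec2d dst = c;
        if (k & 1u) dst.lane[0] = fnmadd_lane(a, b, c);
        return dst;
    }

    /* maskz form: lower lane falls back to zero, upper lane comes from a. */
    static vec2d model_maskz_fnmadd_sd(unsigned char k, vec2d a, vec2d b, vec2d c)
    {
        vec2d dst = a;
        dst.lane[0] = (k & 1u) ? fnmadd_lane(a, b, c) : 0.0;
        return dst;
    }

For ``a = {2, 7}``, ``b = {3, 8}`` and ``c = {10, 9}``, all three models return 4 in the lower lane when bit 0 of ``k`` is set. With the bit clear, the lower lane is 2 (``mask``), 10 (``mask3``) or 0 (``maskz``).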
        	

_mm_fnmadd_round_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m128 _mm_fnmadd_round_ss(__m128 a, __m128 b, __m128 c,
                               int rounding)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask3_fnmadd_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask3_fnmadd_round_ss(__m128 a, __m128 b,
                                     __m128 c, __mmask8 k,
                                     int rounding)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
        ELSE
        	dst[31:0] := c[31:0]
        FI
        dst[127:32] := c[127:32]
        dst[MAX:128] := 0
        	

_mm_mask3_fnmadd_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m128 _mm_mask3_fnmadd_ss(__m128 a, __m128 b, __m128 c,
                               __mmask8 k)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
        ELSE
        	dst[31:0] := c[31:0]
        FI
        dst[127:32] := c[127:32]
        dst[MAX:128] := 0
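
Because the ``mask3`` form falls back to "c", it suits a running value that is updated only on selected steps. The sketch below illustrates that idea with a scalar model in plain C. ``vec4f``, ``model_mask3_fnmadd_ss`` and ``masked_residual`` are hypothetical names, and the arithmetic is not fused.

.. code-block:: C

    #include <stddef.h>

    /* Hypothetical stand-in for __m128: four floats, lane[0] is bits 31:0. */
    typedef struct { float lane[4]; } vec4f;

    /* Scalar model of the pseudo-code above (illustration only). */
    static vec4f model_mask3_fnmadd_ss(vec4f a, vec4f b, vec4f c, unsigned char k)
    {
        vec4f dst = c;                   /* lanes 1..3 and the fallback come from c */
        if (k & 1u)
            dst.lane[0] = -(a.lane[0] * b.lane[0]) + c.lane[0];
        return dst;
    }

    /* Subtracts x[i]*y[i] from the running value where bit 0 of keep[i] is set. */
    static float masked_residual(float start, const float *x, const float *y,
                                 const unsigned char *keep, size_t n)
    {
        vec4f acc = {{start, 0.0f, 0.0f, 0.0f}};
        for (size_t i = 0; i < n; ++i) {
            vec4f a = {{x[i], 0.0f, 0.0f, 0.0f}};
            vec4f b = {{y[i], 0.0f, 0.0f, 0.0f}};
            /* The accumulator is passed as c, so a cleared mask leaves it unchanged. */
            acc = model_mask3_fnmadd_ss(a, b, acc, keep[i]);
        }
        return acc.lane[0];
    }

Starting from 100 with ``x = {1, 2, 3}``, ``y = {4, 5, 6}`` and ``keep = {1, 0, 1}``, the second product is skipped and the result is 100 - 4 - 18 = 78.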
        	

_mm_mask_fnmadd_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask_fnmadd_round_ss(__m128 a, __mmask8 k,
                                    __m128 b, __m128 c,
                                    int rounding)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". 
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
        ELSE
        	dst[31:0] := a[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_fnmadd_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_mask_fnmadd_ss(__m128 a, __mmask8 k, __m128 b,
                              __m128 c)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
        ELSE
        	dst[31:0] := a[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fnmadd_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m128 _mm_maskz_fnmadd_round_ss(__mmask8 k, __m128 a,
                                     __m128 b, __m128 c,
                                     int rounding)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". 
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fnmadd_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_maskz_fnmadd_ss(__mmask8 k, __m128 a, __m128 b,
                               __m128 c)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_fnmsub_round_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m128d _mm_fnmsub_round_sd(__m128d a, __m128d b, __m128d c,
                                int rounding)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
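
The fmsub, fnmadd and fnmsub families differ only in the signs applied to the product and to "c". The sketch below writes the three lower-lane formulas from the pseudo-code as plain C functions so they can be compared directly. The function names are hypothetical and the arithmetic is not fused.

.. code-block:: C

    /* Lower-lane formulas as given in the pseudo-code (illustration only). */

    /* fmsub:   (a*b) - c */
    static double fmsub_formula(double a, double b, double c)
    {
        return (a * b) - c;
    }

    /* fnmadd: -(a*b) + c */
    static double fnmadd_formula(double a, double b, double c)
    {
        return -(a * b) + c;
    }

    /* fnmsub: -(a*b) - c */
    static double fnmsub_formula(double a, double b, double c)
    {
        return -(a * b) - c;
    }

For ``a = 2``, ``b = 3`` and ``c = 10`` these give -4, 4 and -16 respectively.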
        	

_mm_mask3_fnmsub_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask3_fnmsub_round_sd(__m128d a, __m128d b,
                                      __m128d c, __mmask8 k,
                                      int rounding)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
        ELSE
        	dst[63:0] := c[63:0]
        FI
        dst[127:64] := c[127:64]
        dst[MAX:128] := 0
        	

_mm_mask3_fnmsub_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c, 
    __mmask8 k
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c, 
    MASK k

.. code-block:: C

    __m128d _mm_mask3_fnmsub_sd(__m128d a, __m128d b, __m128d c,
                                __mmask8 k)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
        ELSE
        	dst[63:0] := c[63:0]
        FI
        dst[127:64] := c[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_fnmsub_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c, 
    int rounding
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask_fnmsub_round_sd(__m128d a, __mmask8 k,
                                     __m128d b, __m128d c,
                                     int rounding)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
        ELSE
        	dst[63:0] := a[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_fnmsub_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_mask_fnmsub_sd(__m128d a, __mmask8 k, __m128d b,
                               __m128d c)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
        ELSE
        	dst[63:0] := a[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_fnmsub_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c, 
    IMM rounding

.. code-block:: C

    __m128d _mm_maskz_fnmsub_round_sd(__mmask8 k, __m128d a,
                                      __m128d b, __m128d c,
                                      int rounding)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_fnmsub_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_maskz_fnmsub_sd(__mmask8 k, __m128d a,
                                __m128d b, __m128d c)

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_fnmsub_round_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m128 _mm_fnmsub_round_ss(__m128 a, __m128 b, __m128 c,
                               int rounding)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask3_fnmsub_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask3_fnmsub_round_ss(__m128 a, __m128 b,
                                     __m128 c, __mmask8 k,
                                     int rounding)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
        ELSE
        	dst[31:0] := c[31:0]
        FI
        dst[127:32] := c[127:32]
        dst[MAX:128] := 0
        	

_mm_mask3_fnmsub_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c, 
    __mmask8 k
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c, 
    MASK k

.. code-block:: C

    __m128 _mm_mask3_fnmsub_ss(__m128 a, __m128 b, __m128 c,
                               __mmask8 k)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
        ELSE
        	dst[31:0] := c[31:0]
        FI
        dst[127:32] := c[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_fnmsub_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c, 
    int rounding
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask_fnmsub_round_ss(__m128 a, __mmask8 k,
                                    __m128 b, __m128 c,
                                    int rounding)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
        ELSE
        	dst[31:0] := a[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_fnmsub_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_mask_fnmsub_ss(__m128 a, __mmask8 k, __m128 b,
                              __m128 c)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
        ELSE
        	dst[31:0] := a[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fnmsub_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c, 
    IMM rounding

.. code-block:: C

    __m128 _mm_maskz_fnmsub_round_ss(__mmask8 k, __m128 a,
                                     __m128 b, __m128 c,
                                     int rounding)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fnmsub_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_maskz_fnmsub_ss(__mmask8 k, __m128 a, __m128 b,
                               __m128 c)

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	
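The zeromask behaviour in the pseudo-code above can be sketched in portable scalar C. This is an illustrative model only, not the intrinsic itself: ``vec4f`` and ``model_maskz_fnmsub_ss`` are hypothetical names introduced here, and the plain multiply-then-subtract does not reproduce the single rounding of the fused hardware operation.

.. code-block:: C

    #include <assert.h>

    /* Hypothetical 4-lane stand-in for __m128; not the real vector type. */
    typedef struct { float f[4]; } vec4f;

    /* Lane 0 is computed or zeroed depending on bit 0 of k;
       lanes 1..3 are copied from a. */
    vec4f model_maskz_fnmsub_ss(unsigned char k, vec4f a, vec4f b, vec4f c)
    {
        vec4f dst = a;                               /* dst[127:32] := a[127:32] */
        if (k & 1u)
            dst.f[0] = -(a.f[0] * b.f[0]) - c.f[0];  /* not fused in this model */
        else
            dst.f[0] = 0.0f;                         /* zeromask: lane is cleared */
        return dst;
    }

    /* Usage: the same inputs with mask bit 0 set and cleared. */
    void model_maskz_fnmsub_ss_demo(void)
    {
        vec4f a = { { 2.0f, 5.0f, 6.0f, 7.0f } };
        vec4f b = { { 3.0f, 0.0f, 0.0f, 0.0f } };
        vec4f c = { { 1.0f, 0.0f, 0.0f, 0.0f } };
        vec4f on  = model_maskz_fnmsub_ss(1, a, b, c);
        vec4f off = model_maskz_fnmsub_ss(0, a, b, c);
        assert(on.f[0] == -7.0f);                    /* -(2*3) - 1 */
        assert(off.f[0] == 0.0f);                    /* zeroed */
        assert(on.f[3] == 7.0f && off.f[1] == 5.0f); /* upper lanes from a */
    }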

_mm_mask_mul_round_sd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask_mul_round_sd(__m128d src, __mmask8 k,
                                  __m128d a, __m128d b,
                                  int rounding);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
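    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` (each combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.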

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] * b[63:0]
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_mul_sd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_mul_sd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] * b[63:0]
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_mul_round_sd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_maskz_mul_round_sd(__mmask8 k, __m128d a,
                                   __m128d b, int rounding);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
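    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` (each combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.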

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] * b[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_mul_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_mul_sd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] * b[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	
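The difference between the writemask and zeromask forms of the scalar double-precision multiply can be sketched in plain C. The type ``vec2d`` and the ``model_*`` functions are hypothetical stand-ins introduced for illustration; only bit 0 of ``k`` is consulted, as in the pseudo-code.

.. code-block:: C

    #include <assert.h>

    /* Hypothetical 2-lane stand-in for __m128d; not the real vector type. */
    typedef struct { double d[2]; } vec2d;

    /* Writemask form: the low lane falls back to src when bit 0 of k is clear. */
    vec2d model_mask_mul_sd(vec2d src, unsigned char k, vec2d a, vec2d b)
    {
        vec2d dst;
        dst.d[0] = (k & 1u) ? a.d[0] * b.d[0] : src.d[0];
        dst.d[1] = a.d[1];                 /* upper lane always comes from a */
        return dst;
    }

    /* Zeromask form: identical, except the fallback value is zero. */
    vec2d model_maskz_mul_sd(unsigned char k, vec2d a, vec2d b)
    {
        vec2d zero = { { 0.0, 0.0 } };
        return model_mask_mul_sd(zero, k, a, b);
    }

    /* Usage: compare both forms with the mask bit set and cleared. */
    void model_mul_sd_demo(void)
    {
        vec2d src = { { 9.0, 9.0 } }, a = { { 3.0, 7.0 } }, b = { { 4.0, 5.0 } };
        assert(model_mask_mul_sd(src, 1, a, b).d[0] == 12.0);
        assert(model_mask_mul_sd(src, 0, a, b).d[0] == 9.0);
        assert(model_mask_mul_sd(src, 0, a, b).d[1] == 7.0);
        assert(model_maskz_mul_sd(0, a, b).d[0] == 0.0);
    }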

_mm_mul_round_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mul_round_sd(__m128d a, __m128d b,
                             int rounding);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
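    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` (each combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.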

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0] * b[63:0]
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_mul_round_ss
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask_mul_round_ss(__m128 src, __mmask8 k,
                                 __m128 a, __m128 b,
                                 int rounding);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
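    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` (each combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.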

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] * b[31:0]
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_mul_ss
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_mul_ss(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] * b[31:0]
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_mul_round_ss
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_maskz_mul_round_ss(__mmask8 k, __m128 a,
                                  __m128 b, int rounding);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
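    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` (each combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.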

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] * b[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_mul_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_mul_ss(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] * b[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mul_round_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mul_round_ss(__m128 a, __m128 b, int rounding);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
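    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` (each combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.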

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] * b[31:0]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	
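To illustrate what a directed "rounding" argument changes, the sketch below models the low-lane single-precision multiply in portable C. It is a rough model under stated assumptions, not the instruction's implementation: the enum ``model_round`` and all function names are hypothetical (the real intrinsic takes ``_MM_FROUND_*`` constants), the C environment is assumed to use round-to-nearest, and exception behaviour is ignored. The product of two floats is exact in ``double``, so the model rounds that exact value to ``float`` and then steps one representable value if the requested direction requires it.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical mode names; the real intrinsic takes _MM_FROUND_* constants. */
    typedef enum { MODEL_NEAREST, MODEL_DOWN, MODEL_UP, MODEL_ZERO } model_round;

    /* Move a float one representable step away from (+1) or toward (-1) zero. */
    static float step_magnitude(float x, int direction)
    {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);
        bits += (uint32_t)direction;   /* +1 grows the magnitude, -1 shrinks it */
        memcpy(&x, &bits, sizeof x);
        return x;
    }

    float model_mul_round_lane(float a, float b, model_round mode)
    {
        double exact = (double)a * (double)b;  /* 24x24-bit product is exact here */
        float r = (float)exact;                /* assumed: round to nearest */
        int positive = exact > 0.0;
        int too_high = (double)r > exact;
        int too_low  = (double)r < exact;

        if (too_high && (mode == MODEL_DOWN || (mode == MODEL_ZERO && positive)))
            r = step_magnitude(r, positive ? -1 : +1);  /* next float below */
        else if (too_low && (mode == MODEL_UP || (mode == MODEL_ZERO && !positive)))
            r = step_magnitude(r, positive ? +1 : -1);  /* next float above */
        return r;
    }

    /* Usage: x*x = 1 + 2^-22 + 2^-46 is not representable as a float. */
    void model_mul_round_demo(void)
    {
        float x = 0x1.000002p0f;       /* 1 + 2^-23 */
        assert(model_mul_round_lane(x, x, MODEL_NEAREST) == 0x1.000004p0f);
        assert(model_mul_round_lane(x, x, MODEL_ZERO)    == 0x1.000004p0f);
        assert(model_mul_round_lane(x, x, MODEL_UP)      == 0x1.000006p0f);
        assert(model_mul_round_lane(-x, x, MODEL_DOWN)   == -0x1.000006p0f);
    }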

_mm_mask_sub_round_sd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask_sub_round_sd(__m128d src, __mmask8 k,
                                  __m128d a, __m128d b,
                                  int rounding);

.. admonition:: Intel Description

    Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
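    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` (each combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.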

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] - b[63:0]
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_sub_sd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_sub_sd(__m128d src, __mmask8 k, __m128d a,
                            __m128d b);

.. admonition:: Intel Description

    Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] - b[63:0]
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_sub_round_sd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_maskz_sub_round_sd(__mmask8 k, __m128d a,
                                   __m128d b, int rounding);

.. admonition:: Intel Description

    Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
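    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` (each combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.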

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] - b[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_sub_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_sub_sd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := a[63:0] - b[63:0]
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_sub_round_sd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_sub_round_sd(__m128d a, __m128d b,
                             int rounding);

.. admonition:: Intel Description

    Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
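    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` (each combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.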

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0] - b[63:0]
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_sub_round_ss
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask_sub_round_ss(__m128 src, __mmask8 k,
                                 __m128 a, __m128 b,
                                 int rounding);

.. admonition:: Intel Description

    Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
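    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` (each combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.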

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] - b[31:0]
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_sub_ss
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_sub_ss(__m128 src, __mmask8 k, __m128 a,
                           __m128 b);

.. admonition:: Intel Description

    Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] - b[31:0]
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_sub_round_ss
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_maskz_sub_round_ss(__mmask8 k, __m128 a,
                                  __m128 b, int rounding);

.. admonition:: Intel Description

    Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
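    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` (each combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.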

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] - b[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_sub_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_sub_ss(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := a[31:0] - b[31:0]
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_sub_round_ss
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_sub_round_ss(__m128 a, __m128 b, int rounding);

.. admonition:: Intel Description

    Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
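    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` (each combined with ``_MM_FROUND_NO_EXC`` to suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.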

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] - b[31:0]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_madd52lo_epu64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m128i _mm_madd52lo_epu64(__m128i a, __m128i b,
                               __m128i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        	dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
        ENDFOR
        dst[MAX:128] := 0
        	
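The per-lane arithmetic above can be sketched in portable C without a 128-bit type. The function names and the ``MASK52`` macro are hypothetical helpers introduced for illustration. The sketch relies on the fact that the low 52 bits of the full 104-bit product are already present in the low 64 bits of a wrapping ``uint64_t`` multiplication.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    #define MASK52 ((UINT64_C(1) << 52) - 1)

    /* One 64-bit lane: a + low 52 bits of (b[51:0] * c[51:0]), modulo 2^64. */
    uint64_t model_madd52lo_lane(uint64_t a, uint64_t b, uint64_t c)
    {
        /* Unsigned multiplication wraps modulo 2^64, which keeps bits 51:0
           of the true product intact. */
        uint64_t prod_low = (b & MASK52) * (c & MASK52);
        return a + (prod_low & MASK52);
    }

    /* Two lanes, matching the loop "FOR j := 0 to 1" in the pseudo-code. */
    void model_madd52lo_epu64(uint64_t dst[2], const uint64_t a[2],
                              const uint64_t b[2], const uint64_t c[2])
    {
        for (int j = 0; j < 2; ++j)
            dst[j] = model_madd52lo_lane(a[j], b[j], c[j]);
    }

    /* Usage: small values, the largest 52-bit operands, and ignored high bits. */
    void model_madd52lo_demo(void)
    {
        assert(model_madd52lo_lane(5, 3, 4) == 17);
        /* (2^52 - 1)^2 = 2^104 - 2^53 + 1, whose low 52 bits equal 1. */
        assert(model_madd52lo_lane(0, MASK52, MASK52) == 1);
        /* Bits 63:52 of b and c do not take part in the product. */
        assert(model_madd52lo_lane(0, (UINT64_C(1) << 52) | 3, 4) == 12);
    }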

_mm_mask_madd52lo_epu64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m128i _mm_mask_madd52lo_epu64(__m128i a, __mmask8 k,
                                    __m128i b, __m128i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        		dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_madd52lo_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m128i _mm_maskz_madd52lo_epu64(__mmask8 k, __m128i a,
                                     __m128i b, __m128i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        		dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_madd52hi_epu64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m128i _mm_madd52hi_epu64(__m128i a, __m128i b,
                               __m128i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        	dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
        ENDFOR
        dst[MAX:128] := 0
        	
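Extracting bits 103:52 of the product needs more than 64 bits of intermediate precision. One portable way to sketch it, assuming no 128-bit integer type is available, is to split each 52-bit operand into 26-bit halves so that every partial product fits in a ``uint64_t``. The helper names below are hypothetical and the code is an illustrative model, not the instruction's implementation.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    #define MASK52 ((UINT64_C(1) << 52) - 1)
    #define MASK26 ((UINT64_C(1) << 26) - 1)

    /* Bits 103:52 of the 104-bit product of b[51:0] and c[51:0]. */
    uint64_t mul52_high(uint64_t b, uint64_t c)
    {
        uint64_t bl = b & MASK26, bh = (b >> 26) & MASK26;
        uint64_t cl = c & MASK26, ch = (c >> 26) & MASK26;
        uint64_t ll  = bl * cl;             /* weight 2^0,  below 2^52 */
        uint64_t mid = bh * cl + bl * ch;   /* weight 2^26, below 2^53 */
        uint64_t hh  = bh * ch;             /* weight 2^52, below 2^52 */
        /* Carry the low parts upward in two 26-bit shifts. */
        return hh + ((mid + (ll >> 26)) >> 26);
    }

    /* One 64-bit lane of the unmasked form, wrapping modulo 2^64. */
    uint64_t model_madd52hi_lane(uint64_t a, uint64_t b, uint64_t c)
    {
        return a + mul52_high(b, c);
    }

    /* Usage: 2^51 * 2 = 2^52, so the high part is exactly 1. */
    void model_madd52hi_demo(void)
    {
        assert(mul52_high(UINT64_C(1) << 51, 2) == 1);
        /* (2^52 - 1)^2 = 2^104 - 2^53 + 1, so bits 103:52 equal 2^52 - 2. */
        assert(mul52_high(MASK52, MASK52) == MASK52 - 1);
        assert(model_madd52hi_lane(10, UINT64_C(1) << 51, 2) == 11);
    }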

_mm_mask_madd52hi_epu64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i b, 
    __m128i c
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m128i _mm_mask_madd52hi_epu64(__m128i a, __mmask8 k,
                                    __m128i b, __m128i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        		dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_madd52hi_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    __m128i c
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    UI64 c

.. code-block:: C

    __m128i _mm_maskz_madd52hi_epu64(__mmask8 k, __m128i a,
                                     __m128i b, __m128i c);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
        		dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_dpbf16_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __m128bh a, 
    __m128bh b
:Param ETypes:
    FP32 src, 
    BF16 a, 
    BF16 b

.. code-block:: C

    __m128 _mm_dpbf16_ps(__m128 src, __m128bh a, __m128bh b);

.. admonition:: Intel Description

    Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE make_fp32(x[15:0]) {
        	y.fp32  := 0.0
        	y[31:16] := x[15:0]
        	RETURN y
        }
        dst := src
        FOR j := 0 to 3
        	dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
        	dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
        ENDFOR
        dst[MAX:128] := 0
        	
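The ``make_fp32`` helper and the pairwise accumulation above translate almost directly into scalar C. The sketch below is illustrative only: ``bf16_to_fp32`` and ``model_dpbf16_ps`` are hypothetical names, BF16 values are passed as raw ``uint16_t`` bit patterns, and the ordinary ``float`` arithmetic used here may round intermediate results differently from the hardware instruction.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* make_fp32: a BF16 value occupies the upper 16 bits of an FP32 value. */
    float bf16_to_fp32(uint16_t x)
    {
        uint32_t bits = (uint32_t)x << 16;
        float y;
        memcpy(&y, &bits, sizeof y);
        return y;
    }

    /* a and b hold 8 BF16 bit patterns each; src and dst hold 4 floats. */
    void model_dpbf16_ps(float dst[4], const float src[4],
                         const uint16_t a[8], const uint16_t b[8])
    {
        for (int j = 0; j < 4; ++j) {
            float acc = src[j];
            /* Same order as the pseudo-code: odd element first, then even. */
            acc += bf16_to_fp32(a[2 * j + 1]) * bf16_to_fp32(b[2 * j + 1]);
            acc += bf16_to_fp32(a[2 * j + 0]) * bf16_to_fp32(b[2 * j + 0]);
            dst[j] = acc;
        }
    }

    /* Usage: lane 0 computes 10 + 2.0*0.5 + 1.0*3.0. */
    void model_dpbf16_demo(void)
    {
        const float src[4] = { 10.0f, 0.0f, 0.0f, 0.0f };
        const uint16_t a[8] = { 0x3F80 /* 1.0 */, 0x4000 /* 2.0 */, 0, 0, 0, 0, 0, 0 };
        const uint16_t b[8] = { 0x4040 /* 3.0 */, 0x3F00 /* 0.5 */, 0, 0, 0, 0, 0, 0 };
        float dst[4];
        model_dpbf16_ps(dst, src, a, b);
        assert(dst[0] == 14.0f);
        assert(dst[1] == 0.0f);
    }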

_mm_mask_dpbf16_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128bh a, 
    __m128bh b
:Param ETypes:
    FP32 src, 
    MASK k, 
    BF16 a, 
    BF16 b

.. code-block:: C

    __m128 _mm_mask_dpbf16_ps(__m128 src, __mmask8 k,
                              __m128bh a, __m128bh b);

.. admonition:: Intel Description

    Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE make_fp32(x[15:0]) {
        	y.fp32  := 0.0
        	y[31:16] := x[15:0]
        	RETURN y
        }
        dst := src
        FOR j := 0 to 3
        	IF k[j]
        		dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
        		dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_dpbf16_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 src, 
    __m128bh a, 
    __m128bh b
:Param ETypes:
    MASK k, 
    FP32 src, 
    BF16 a, 
    BF16 b

.. code-block:: C

    __m128 _mm_maskz_dpbf16_ps(__mmask8 k, __m128 src,
                               __m128bh a, __m128bh b);

.. admonition:: Intel Description

    Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE make_fp32(x[15:0]) {
        	y.fp32  := 0.0
        	y[31:16] := x[15:0]
        	RETURN y
        }
        dst := src
        FOR j := 0 to 3
        	IF k[j]
        		dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
        		dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_add_ph
^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_add_ph(__m128h a, __m128h b);

.. admonition:: Intel Description

    Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.fp16[j] := a.fp16[j] + b.fp16[j]
        ENDFOR
        dst[MAX:128] := 0
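
The final pseudo-code line, ``dst[MAX:128] := 0``, clears every bit of the destination register above the 128-bit result. A rough scalar sketch of that effect follows. It assumes a 512-bit register modelled as 32 lanes, uses ``float`` as a stand-in for FP16, and the type ``vreg_model`` and function ``add_ph_128_model`` are hypothetical names, not part of any header.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>

    enum { ZMM_LANES = 32, XMM_LANES = 8 };   /* 512 / 16 and 128 / 16 */

    /* Hypothetical model of a full vector register as 32 half-precision
       lanes, with float standing in for FP16. */
    typedef struct { float lane[ZMM_LANES]; } vreg_model;

    static void add_ph_128_model(vreg_model *dst, const vreg_model *a,
                                 const vreg_model *b)
    {
        assert(dst != NULL && a != NULL && b != NULL);
        for (int j = 0; j < XMM_LANES; j++)
            dst->lane[j] = a->lane[j] + b->lane[j];
        for (int j = XMM_LANES; j < ZMM_LANES; j++)
            dst->lane[j] = 0.0f;              /* dst[MAX:128] := 0 */
    }

Whatever the upper lanes of ``dst`` held before the call, they read as zero afterwards; only lanes 0 to 7 carry sums.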
        	

_mm_mask_add_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_add_ph(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] + b.fp16[j]
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_add_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_add_ph(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] + b.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
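
The writemask and zeromask forms differ only in what an unselected lane receives. The sketch below models both for the add operation in plain C. It is an illustration under stated assumptions: ``float`` stands in for FP16 (so rounding differs from the real instruction), and ``mask_bit``, ``mask_add_model`` and ``maskz_add_model`` are hypothetical helper names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* k[j] in the pseudo-code: bit j of the 8-bit mask. */
    static int mask_bit(uint8_t k, int j)
    {
        assert(j >= 0 && j < 8);
        return (k >> j) & 1;
    }

    /* Writemask: unselected lanes are copied from src. */
    static void mask_add_model(float dst[8], const float src[8], uint8_t k,
                               const float a[8], const float b[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = mask_bit(k, j) ? a[j] + b[j] : src[j];
    }

    /* Zeromask: unselected lanes are set to zero. */
    static void maskz_add_model(float dst[8], uint8_t k,
                                const float a[8], const float b[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = mask_bit(k, j) ? a[j] + b[j] : 0.0f;
    }

With ``k = 0x0F``, lanes 0 to 3 hold sums in both models, while lanes 4 to 7 hold ``src`` values in the first and zeros in the second.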
        	

_mm_div_ph
^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_div_ph(__m128h a, __m128h b);

.. admonition:: Intel Description

    Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.fp16[j] := a.fp16[j] / b.fp16[j]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_div_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_div_ph(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] / b.fp16[j]
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_div_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_div_ph(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] / b.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
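
One possible use of the zero-masked divide is to skip lanes whose divisor is zero. The sketch below builds such a mask and applies it in a scalar model. It only illustrates result selection: ``float`` stands in for FP16, and ``nonzero_mask`` and ``maskz_div_model`` are hypothetical helpers rather than intrinsics.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Build a mask with bit j set where b[j] is nonzero. */
    static uint8_t nonzero_mask(const float b[8])
    {
        uint8_t k = 0;
        assert(b != 0);
        for (int j = 0; j < 8; j++)
            if (b[j] != 0.0f)
                k |= (uint8_t)(1u << j);
        return k;
    }

    /* Zero-masked divide: the quotient is only evaluated for selected lanes. */
    static void maskz_div_model(float dst[8], uint8_t k,
                                const float a[8], const float b[8])
    {
        assert(dst != 0 && a != 0);
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? a[j] / b[j] : 0.0f;
    }

Calling ``maskz_div_model(dst, nonzero_mask(b), a, b)`` leaves zero in every lane where ``b`` is zero instead of producing an infinity or NaN.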
        	

_mm_fmadd_ph
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fmadd_ph(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fmadd_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fmadd_ph(__m128h a, __mmask8 k, __m128h b,
                              __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fmadd_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fmadd_ph(__m128h a, __m128h b, __m128h c,
                               __mmask8 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fmadd_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fmadd_ph(__mmask8 k, __m128h a, __m128h b,
                               __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
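
The ``mask`` and ``mask3`` forms of the fused multiply-add compute the same product and sum. They differ in which operand an unselected lane falls back to: ``a`` for ``mask`` and ``c`` for ``mask3``. The scalar sketch below shows only that selection. It assumes ``float`` in place of FP16, does not model the single rounding of a true fused operation, and its function names are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* mask form: unselected lanes keep the value of a. */
    static void mask_fmadd_model(float dst[8], const float a[8], uint8_t k,
                                 const float b[8], const float c[8])
    {
        assert(dst != 0 && a != 0 && b != 0 && c != 0);
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? a[j] * b[j] + c[j] : a[j];
    }

    /* mask3 form: unselected lanes keep the value of c. */
    static void mask3_fmadd_model(float dst[8], const float a[8],
                                  const float b[8], const float c[8], uint8_t k)
    {
        assert(dst != 0 && a != 0 && b != 0 && c != 0);
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? a[j] * b[j] + c[j] : c[j];
    }

The ``mask3`` form suits accumulation loops in which ``c`` is the running total and unselected lanes should keep it unchanged.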
        	

_mm_fnmadd_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fnmadd_ph(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fnmadd_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fnmadd_ph(__m128h a, __mmask8 k, __m128h b,
                               __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fnmadd_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fnmadd_ph(__m128h a, __m128h b, __m128h c,
                                __mmask8 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fnmadd_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fnmadd_ph(__mmask8 k, __m128h a,
                                __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fmsub_ph
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fmsub_ph(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fmsub_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fmsub_ph(__m128h a, __mmask8 k, __m128h b,
                              __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fmsub_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fmsub_ph(__m128h a, __m128h b, __m128h c,
                               __mmask8 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fmsub_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fmsub_ph(__mmask8 k, __m128h a, __m128h b,
                               __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fnmsub_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fnmsub_ph(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fnmsub_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fnmsub_ph(__m128h a, __mmask8 k, __m128h b,
                               __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fnmsub_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fnmsub_ph(__m128h a, __m128h b, __m128h c,
                                __mmask8 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fnmsub_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fnmsub_ph(__mmask8 k, __m128h a,
                                __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
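
The four multiply-add families (``fmadd``, ``fnmadd``, ``fmsub``, ``fnmsub``) differ only in the sign applied to the product and to ``c``. A compact per-lane sketch of the four formulas from the pseudo-code follows. The enum ``fma_kind`` and function ``fma_variant_model`` are hypothetical names, ``float`` stands in for FP16, and fused single rounding is not modelled.

.. code-block:: C

    #include <assert.h>

    typedef enum { FMADD, FNMADD, FMSUB, FNMSUB } fma_kind;

    /* One lane of each variant, following the pseudo-code formulas. */
    static float fma_variant_model(fma_kind kind, float a, float b, float c)
    {
        float p = a * b;
        switch (kind) {
        case FMADD:  return  p + c;    /*  (a * b) + c */
        case FNMADD: return -p + c;    /* -(a * b) + c */
        case FMSUB:  return  p - c;    /*  (a * b) - c */
        case FNMSUB: return -p - c;    /* -(a * b) - c */
        }
        assert(!"unknown fma_kind");
        return 0.0f;
    }

For ``a = 2``, ``b = 3``, ``c = 1`` the four variants give 7, -5, 5 and -7 respectively.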
        	

_mm_fmaddsub_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fmaddsub_ph(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF ((j & 1) == 0)
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	ELSE
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
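
Note the lane order in the pseudo-code above: even-indexed lanes subtract ``c`` and odd-indexed lanes add it. A short scalar sketch of that alternation is given below, with ``float`` standing in for FP16 and ``fmaddsub_model`` being a hypothetical name used only for illustration.

.. code-block:: C

    #include <assert.h>

    /* Even lanes: (a * b) - c.  Odd lanes: (a * b) + c. */
    static void fmaddsub_model(float dst[8], const float a[8],
                               const float b[8], const float c[8])
    {
        assert(dst != 0 && a != 0 && b != 0 && c != 0);
        for (int j = 0; j < 8; j++) {
            float p = a[j] * b[j];
            dst[j] = ((j & 1) == 0) ? p - c[j] : p + c[j];
        }
    }

With every ``a`` lane equal to 1, every ``b`` lane equal to 10 and every ``c`` lane equal to 1, the result alternates 9, 11, 9, 11 and so on, starting at lane 0.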
        	

_mm_mask_fmaddsub_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fmaddsub_ph(__m128h a, __mmask8 k,
                                 __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fmaddsub_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fmaddsub_ph(__m128h a, __m128h b,
                                  __m128h c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fmaddsub_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fmaddsub_ph(__mmask8 k, __m128h a,
                                  __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fmsubadd_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fmsubadd_ph(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" to/from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF ((j & 1) == 0)
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        	ELSE
        		dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
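
Comparing this pseudo-code with the ``fmaddsub`` form suggests that ``fmsubadd`` behaves like ``fmaddsub`` applied to a negated ``c``. The sketch below checks that relationship in a scalar model. It is an observation about the two formulas for ordinary finite values, not a documented guarantee: NaN sign handling is not examined, ``float`` stands in for FP16, and all function names are hypothetical.

.. code-block:: C

    #include <assert.h>

    /* fmaddsub: even lanes subtract c, odd lanes add c. */
    static void fmaddsub_model(float dst[8], const float a[8],
                               const float b[8], const float c[8])
    {
        for (int j = 0; j < 8; j++) {
            float p = a[j] * b[j];
            dst[j] = ((j & 1) == 0) ? p - c[j] : p + c[j];
        }
    }

    /* fmsubadd: even lanes add c, odd lanes subtract c. */
    static void fmsubadd_model(float dst[8], const float a[8],
                               const float b[8], const float c[8])
    {
        for (int j = 0; j < 8; j++) {
            float p = a[j] * b[j];
            dst[j] = ((j & 1) == 0) ? p + c[j] : p - c[j];
        }
    }

    /* Same result obtained by negating c and reusing fmaddsub. */
    static void fmsubadd_via_fmaddsub(float dst[8], const float a[8],
                                      const float b[8], const float c[8])
    {
        float neg[8];
        assert(dst != 0 && a != 0 && b != 0 && c != 0);
        for (int j = 0; j < 8; j++)
            neg[j] = -c[j];
        fmaddsub_model(dst, a, b, neg);
    }

Both routes give the same lanes for finite inputs, which can be handy when only one of the two operations is available in a code path.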
        	

_mm_mask_fmsubadd_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fmsubadd_ph(__m128h a, __mmask8 k,
                                 __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fmsubadd_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fmsubadd_ph(__m128h a, __m128h b,
                                  __m128h c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := c.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fmsubadd_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fmsubadd_ph(__mmask8 k, __m128h a,
                                  __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		IF ((j & 1) == 0)
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
        		ELSE
        			dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
        		FI
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_sub_ph
^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_sub_ph(__m128h a, __m128h b);

.. admonition:: Intel Description

    Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.fp16[j] := a.fp16[j] - b.fp16[j]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_sub_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_sub_ph(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] - b.fp16[j]
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_sub_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_sub_ph(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := a.fp16[j] - b.fp16[j]
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mul_ph
^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mul_ph(__m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 TO 7
        	dst.fp16[i] := a.fp16[i] * b.fp16[i]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_mul_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_mul_ph(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 TO 7
        	IF k[i]
        		dst.fp16[i] := a.fp16[i] * b.fp16[i]
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mul_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_mul_ph(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 TO 7
        	IF k[i]
        		dst.fp16[i] := a.fp16[i] * b.fp16[i]
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
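
For comparison with the writemask form, the zeromask behaviour above can be modelled in scalar C as follows. This is a sketch under stated assumptions: ``float`` replaces FP16, and ``maskz_mul_model`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zeromask pattern.
       Assumption: float stands in for the FP16 lanes. */
    static void maskz_mul_model(float dst[8], uint8_t k,
                                const float a[8], const float b[8])
    {
        for (int i = 0; i < 8; ++i) {
            /* Lanes whose mask bit is clear are written as zero. */
            dst[i] = ((k >> i) & 1u) ? a[i] * b[i] : 0.0f;
        }
    }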
        	

_mm_fmul_pch
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_fmul_pch(__m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:128] := 0
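
The interleaved complex layout used above may be easier to follow as a scalar loop. The sketch below assumes ``float`` in place of FP16 and uses the hypothetical name ``fmul_pch_model``; it does not model the rounding of the real instruction.

.. code-block:: C

    /* Hypothetical scalar model: four complex numbers stored as
       interleaved (real, imaginary) pairs. float stands in for FP16. */
    static void fmul_pch_model(float dst[8], const float a[8], const float b[8])
    {
        for (int i = 0; i < 4; ++i) {
            float ar = a[2 * i], ai = a[2 * i + 1];
            float br = b[2 * i], bi = b[2 * i + 1];
            dst[2 * i]     = ar * br - ai * bi; /* real part */
            dst[2 * i + 1] = ai * br + ar * bi; /* imaginary part */
        }
    }

With every pair of ``a`` set to 1 + 2i and every pair of ``b`` set to 3 + 4i, each output pair is -5 + 10i.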
        	

_mm_mul_pch
^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mul_pch(__m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fmul_pch
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_fmul_pch(__m128h src, __mmask8 k,
                              __m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:128] := 0
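
Note that in the pseudo-code above one mask bit governs a whole complex number (two FP16 elements), so only bits 0 to 3 of ``k`` are consulted. A scalar sketch of that granularity, assuming ``float`` in place of FP16 and the hypothetical name ``mask_fmul_pch_model``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: mask bit i covers the pair (2*i, 2*i+1).
       Assumption: float stands in for FP16. */
    static void mask_fmul_pch_model(float dst[8], const float src[8], uint8_t k,
                                    const float a[8], const float b[8])
    {
        for (int i = 0; i < 4; ++i) {
            int re = 2 * i, im = 2 * i + 1;
            if ((k >> i) & 1u) {
                dst[re] = a[re] * b[re] - a[im] * b[im];
                dst[im] = a[im] * b[re] + a[re] * b[im];
            } else {
                /* Both halves of the complex number come from src. */
                dst[re] = src[re];
                dst[im] = src[im];
            }
        }
    }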
        	

_mm_mask_mul_pch
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_mul_pch(__m128h src, __mmask8 k, __m128h a,
                             __m128h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fmul_pch
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_fmul_pch(__mmask8 k, __m128h a,
                               __m128h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_mul_pch
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_mul_pch(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fcmul_pch
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_fcmul_pch(__m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:128] := 0
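
The sign changes relative to the plain complex multiply follow from substituting the conjugate of ``b``. A minimal scalar sketch, assuming ``float`` in place of FP16 and the hypothetical name ``fcmul_pch_model``:

.. code-block:: C

    /* Hypothetical scalar model of a * conj(b) on four interleaved
       complex pairs. Assumption: float stands in for FP16. */
    static void fcmul_pch_model(float dst[8], const float a[8], const float b[8])
    {
        for (int i = 0; i < 4; ++i) {
            float ar = a[2 * i], ai = a[2 * i + 1];
            float br = b[2 * i], bi = b[2 * i + 1];
            /* (ar + i*ai) * (br - i*bi) */
            dst[2 * i]     = ar * br + ai * bi;
            dst[2 * i + 1] = ai * br - ar * bi;
        }
    }

For example, (1 + 2i) multiplied by the conjugate of (3 + 4i) gives 11 + 2i.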
        	

_mm_cmul_pch
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_cmul_pch(__m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fcmul_pch
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_fcmul_pch(__m128h src, __mmask8 k,
                               __m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cmul_pch
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_cmul_pch(__m128h src, __mmask8 k,
                              __m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := src.fp16[2*i+0]
        		dst.fp16[2*i+1] := src.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fcmul_pch
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_fcmul_pch(__mmask8 k, __m128h a,
                                __m128h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cmul_pch
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_cmul_pch(__mmask8 k, __m128h a,
                               __m128h b);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fmadd_pch
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fmadd_pch(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        ENDFOR
        dst[MAX:128] := 0
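
The multiply-accumulate above can be sketched in scalar C as a complex product followed by a complex add. The model below is an assumption-laden illustration: ``float`` replaces FP16, the name ``fmadd_pch_model`` is hypothetical, and any fused rounding performed by the real instruction is not modelled.

.. code-block:: C

    /* Hypothetical scalar model of a * b + c on four interleaved
       complex pairs. Assumption: float stands in for FP16. */
    static void fmadd_pch_model(float dst[8], const float a[8],
                                const float b[8], const float c[8])
    {
        for (int i = 0; i < 4; ++i) {
            int re = 2 * i, im = 2 * i + 1;
            float prod_re = a[re] * b[re] - a[im] * b[im];
            float prod_im = a[im] * b[re] + a[re] * b[im];
            dst[re] = prod_re + c[re];
            dst[im] = prod_im + c[im];
        }
    }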
        	

_mm_mask_fmadd_pch
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fmadd_pch(__m128h a, __mmask8 k, __m128h b,
                               __m128h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := a.fp16[2*i+0]
        		dst.fp16[2*i+1] := a.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fmadd_pch
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fmadd_pch(__m128h a, __m128h b, __m128h c,
                                __mmask8 k);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := c.fp16[2*i+0]
        		dst.fp16[2*i+1] := c.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:128] := 0
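
The ``mask`` and ``mask3`` forms of the complex multiply-accumulate differ only in which operand supplies the unselected elements: ``a`` for ``mask``, ``c`` for ``mask3``. The scalar sketch below makes that explicit through a ``passthrough`` parameter. The function name, the extra parameter, and the use of ``float`` instead of FP16 are all assumptions made for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of a masked complex multiply-accumulate.
       Pass a as passthrough to mimic the mask form, or c for mask3.
       dst must not alias any input. float stands in for FP16. */
    static void masked_fmadd_pch_model(float dst[8], const float a[8],
                                       const float b[8], const float c[8],
                                       uint8_t k, const float passthrough[8])
    {
        for (int i = 0; i < 4; ++i) {
            int re = 2 * i, im = 2 * i + 1;
            if ((k >> i) & 1u) {
                dst[re] = a[re] * b[re] - a[im] * b[im] + c[re];
                dst[im] = a[im] * b[re] + a[re] * b[im] + c[im];
            } else {
                dst[re] = passthrough[re];
                dst[im] = passthrough[im];
            }
        }
    }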
        	

_mm_maskz_fmadd_pch
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fmadd_pch(__mmask8 k, __m128h a,
                                __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fcmadd_pch
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fcmadd_pch(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        	dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_fcmadd_pch
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fcmadd_pch(__m128h a, __mmask8 k,
                                __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := a.fp16[2*i+0]
        		dst.fp16[2*i+1] := a.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask3_fcmadd_pch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fcmadd_pch(__m128h a, __m128h b,
                                 __m128h c, __mmask8 k);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := c.fp16[2*i+0]
        		dst.fp16[2*i+1] := c.fp16[2*i+1]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fcmadd_pch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fcmadd_pch(__mmask8 k, __m128h a,
                                 __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	IF k[i]
        		dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
        		dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
        	ELSE
        		dst.fp16[2*i+0] := 0
        		dst.fp16[2*i+1] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_reduce_add_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: _Float16
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm_reduce_add_ph(__m128h a);

.. admonition:: Intel Description

    Reduce the packed half-precision (16-bit) floating-point elements in "a" by addition. Returns the sum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        FOR i := 0 to 3
        	tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+4]
        ENDFOR
        FOR i := 0 to 1
        	tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+2]
        ENDFOR
        dst.fp16[0] := tmp.fp16[0] + tmp.fp16[1]
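
The pseudo-code above sums the elements as a pairwise tree rather than left to right, which can matter for floating-point rounding. A scalar sketch of the same pairing order, assuming ``float`` in place of FP16 and the hypothetical name ``reduce_add_model``:

.. code-block:: C

    /* Hypothetical scalar model of the tree-shaped reduction.
       Assumption: float stands in for FP16, so rounding differs
       from the real FP16 result. */
    static float reduce_add_model(const float a[8])
    {
        float tmp[8];
        for (int i = 0; i < 8; ++i) tmp[i] = a[i];
        /* Fold the upper half onto the lower half, then repeat. */
        for (int i = 0; i < 4; ++i) tmp[i] = tmp[i] + tmp[i + 4];
        for (int i = 0; i < 2; ++i) tmp[i] = tmp[i] + tmp[i + 2];
        return tmp[0] + tmp[1];
    }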
        	

_mm_reduce_mul_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: _Float16
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm_reduce_mul_ph(__m128h a);

.. admonition:: Intel Description

    Reduce the packed half-precision (16-bit) floating-point elements in "a" by multiplication. Returns the product of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        FOR i := 0 to 3
        	tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+4]
        ENDFOR
        FOR i := 0 to 1
        	tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+2]
        ENDFOR
        dst.fp16[0] := tmp.fp16[0] * tmp.fp16[1]
        	

_mm_reduce_max_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: _Float16
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm_reduce_max_ph(__m128h a);

.. admonition:: Intel Description

    Reduce the packed half-precision (16-bit) floating-point elements in "a" by maximum. Returns the maximum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        FOR i := 0 to 3
        	tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+4] ? tmp.fp16[i] : tmp.fp16[i+4])
        ENDFOR
        FOR i := 0 to 1
        	tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+2] ? tmp.fp16[i] : tmp.fp16[i+2])
        ENDFOR
        dst.fp16[0] := (tmp.fp16[0] > tmp.fp16[1] ? tmp.fp16[0] : tmp.fp16[1])
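
The three comparison rounds above can be written as a scalar loop nest. The sketch follows the ternary expressions exactly as given in the pseudo-code; ``float`` stands in for FP16, the name ``reduce_max_model`` is hypothetical, and no claim is made here about how the real implementation treats NaN inputs.

.. code-block:: C

    /* Hypothetical scalar model of the tree-shaped maximum.
       Assumption: float stands in for FP16. */
    static float reduce_max_model(const float a[8])
    {
        float tmp[8];
        for (int i = 0; i < 8; ++i) tmp[i] = a[i];
        for (int i = 0; i < 4; ++i)
            tmp[i] = (tmp[i] > tmp[i + 4]) ? tmp[i] : tmp[i + 4];
        for (int i = 0; i < 2; ++i)
            tmp[i] = (tmp[i] > tmp[i + 2]) ? tmp[i] : tmp[i + 2];
        return (tmp[0] > tmp[1]) ? tmp[0] : tmp[1];
    }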
        	

_mm_reduce_min_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: _Float16
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm_reduce_min_ph(__m128h a);

.. admonition:: Intel Description

    Reduce the packed half-precision (16-bit) floating-point elements in "a" by minimum. Returns the minimum of all elements in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := a
        FOR i := 0 to 3
        	tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+4] ? tmp.fp16[i] : tmp.fp16[i+4])
        ENDFOR
        FOR i := 0 to 1
        	tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+2] ? tmp.fp16[i] : tmp.fp16[i+2])
        ENDFOR
        dst.fp16[0] := (tmp.fp16[0] < tmp.fp16[1] ? tmp.fp16[0] : tmp.fp16[1])
        	

_mm_abs_ph
^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h v2
:Param ETypes:
    FP16 v2

.. code-block:: C

    __m128h _mm_abs_ph(__m128h v2);

.. admonition:: Intel Description

    Finds the absolute value of each packed half-precision (16-bit) floating-point element in "v2", storing the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.fp16[j] := ABS(v2.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
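
One way to read ``ABS`` for IEEE 754 binary16 values is as clearing the sign bit (bit 15) of each lane. The sketch below assumes that reading and works on raw 16-bit patterns; ``abs_ph_model`` is a hypothetical helper, not a real intrinsic.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model: each lane is a raw binary16 bit pattern. */
    static void abs_ph_model(const uint16_t v2[8], uint16_t dst[8])
    {
        for (size_t j = 0; j < 8; j++)
            dst[j] = (uint16_t)(v2[j] & 0x7FFFu);   /* clear sign bit 15 */
    }

For example, the pattern ``0xBC00`` (-1.0 in binary16) becomes ``0x3C00`` (1.0), while patterns that are already non-negative are unchanged.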
        	

_mm_conj_pch
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_conj_pch(__m128h a);

.. admonition:: Intel Description

    Compute the complex conjugates of complex numbers in "a", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
        ENDFOR
        dst[MAX:128] := 0
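
The XOR with ``FP32(-0.0)`` works because the bit pattern of single-precision -0.0 is ``0x80000000``: only bit 31 is set, which is the sign bit of the upper half-precision element (the imaginary part) of each 32-bit pair. A minimal sketch on raw bit patterns, with the hypothetical helper name ``conj_pch_model``:

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model: each uint32_t holds one complex number,
       real part in bits 15:0 and imaginary part in bits 31:16. */
    static void conj_pch_model(const uint32_t a[4], uint32_t dst[4])
    {
        for (size_t j = 0; j < 4; j++)
            dst[j] = a[j] ^ UINT32_C(0x80000000);   /* flip sign of imaginary part */
    }

With binary16 encodings ``0x3C00`` for 1.0 and ``0x4000`` for 2.0, the input ``0x40003C00`` (1.0 + 2.0i) maps to ``0xC0003C00`` (1.0 - 2.0i).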
        	

_mm_mask_conj_pch
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128h _mm_mask_conj_pch(__m128h src, __mmask8 k,
                              __m128h a);

.. admonition:: Intel Description

    Compute the complex conjugates of complex numbers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
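
In the pseudo-code above each mask bit covers one complex number (one 32-bit pair), so only bits 0 to 3 of "k" take part. The sketch below models that on raw bit patterns; ``mask_conj_pch_model`` is a hypothetical helper introduced only for illustration.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model: one mask bit per 32-bit complex element. */
    static void mask_conj_pch_model(const uint32_t src[4], uint8_t k,
                                    const uint32_t a[4], uint32_t dst[4])
    {
        for (size_t j = 0; j < 4; j++) {
            if ((k >> j) & 1u)
                dst[j] = a[j] ^ UINT32_C(0x80000000);   /* conjugate */
            else
                dst[j] = src[j];                        /* pass through from src */
        }
    }

With ``k = 0x5``, elements 0 and 2 are conjugated and elements 1 and 3 are taken unchanged from ``src``.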
        	

_mm_maskz_conj_pch
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128h _mm_maskz_conj_pch(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Compute the complex conjugates of complex numbers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_add_sh
^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_add_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Add the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := a.fp16[0] + b.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
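
The scalar ("sh") form touches only lane 0 and forwards the other seven lanes from "a". A minimal portable sketch of that lane layout, assuming ``float`` as a stand-in for ``_Float16`` and using the hypothetical helper name ``add_sh_model``:

.. code-block:: C

    #include <stddef.h>

    /* Hypothetical model; float is an assumed stand-in for _Float16. */
    static void add_sh_model(const float a[8], const float b[8], float dst[8])
    {
        dst[0] = a[0] + b[0];            /* only the lowest lane is computed */
        for (size_t j = 1; j < 8; j++)
            dst[j] = a[j];               /* upper 7 lanes are copied from a */
    }

Note that the upper lanes of "b" never influence the result.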
        	

_mm_add_round_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_add_round_sh(__m128h a, __m128h b,
                             int rounding);

.. admonition:: Intel Description

    Add the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
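    Rounding is done according to the "rounding" parameter, which is either a rounding mode (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.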

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := a.fp16[0] + b.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_add_sh
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_add_sh(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Add the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] + b.fp16[0]
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
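
The writemask behavior can be sketched in portable C. As an assumption for illustration, ``float`` stands in for ``_Float16`` and ``mask_add_sh_model`` is a hypothetical helper; only bit 0 of "k" is consulted, as in the pseudo-code above.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model of the writemask form. */
    static void mask_add_sh_model(const float src[8], uint8_t k,
                                  const float a[8], const float b[8],
                                  float dst[8])
    {
        if (k & 1u)
            dst[0] = a[0] + b[0];        /* mask bit 0 set: compute the sum */
        else
            dst[0] = src[0];             /* mask bit 0 clear: keep src lane 0 */
        for (size_t j = 1; j < 8; j++)
            dst[j] = a[j];               /* upper lanes come from a, not src */
    }

The pass-through source for lane 0 is ``src``, while lanes 1 to 7 always come from ``a`` regardless of the mask.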
        	

_mm_mask_add_round_sh
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_add_round_sh(__m128h src, __mmask8 k,
                                  __m128h a, __m128h b,
                                  int rounding);

.. admonition:: Intel Description

    Add the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
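    Rounding is done according to the "rounding" parameter, which is either a rounding mode (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.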

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] + b.fp16[0]
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_add_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_add_sh(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Add the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] + b.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_add_round_sh
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_add_round_sh(__mmask8 k, __m128h a,
                                   __m128h b, int rounding);

.. admonition:: Intel Description

    Add the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
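    Rounding is done according to the "rounding" parameter, which is either a rounding mode (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.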

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] + b.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_div_sh
^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_div_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Divide the lower half-precision (16-bit) floating-point element in "a" by the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := a.fp16[0] / b.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_div_sh
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_div_sh(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Divide the lower half-precision (16-bit) floating-point element in "a" by the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] / b.fp16[0]
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_div_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_div_sh(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Divide the lower half-precision (16-bit) floating-point element in "a" by the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] / b.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
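
The zeromask form differs from the writemask form only in what lane 0 receives when mask bit 0 is clear. A minimal sketch, assuming ``float`` as a stand-in for ``_Float16``; ``maskz_div_sh_model`` is a hypothetical helper name:

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model of the zeromask form. */
    static void maskz_div_sh_model(uint8_t k, const float a[8],
                                   const float b[8], float dst[8])
    {
        if (k & 1u)
            dst[0] = a[0] / b[0];        /* mask bit 0 set: compute the quotient */
        else
            dst[0] = 0.0f;               /* mask bit 0 clear: zero lane 0 */
        for (size_t j = 1; j < 8; j++)
            dst[j] = a[j];               /* upper lanes are copied from a */
    }

Higher bits of "k" are ignored in this model: ``k = 2`` behaves like ``k = 0`` because bit 0 is clear.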
        	

_mm_div_round_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_div_round_sh(__m128h a, __m128h b,
                             int rounding);

.. admonition:: Intel Description

    Divide the lower half-precision (16-bit) floating-point element in "a" by the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
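    Rounding is done according to the "rounding" parameter, which is either a rounding mode (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.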

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := a.fp16[0] / b.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_div_round_sh
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_div_round_sh(__m128h src, __mmask8 k,
                                  __m128h a, __m128h b,
                                  int rounding);

.. admonition:: Intel Description

    Divide the lower half-precision (16-bit) floating-point element in "a" by the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
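    Rounding is done according to the "rounding" parameter, which is either a rounding mode (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.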

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] / b.fp16[0]
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_div_round_sh
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_div_round_sh(__mmask8 k, __m128h a,
                                   __m128h b, int rounding);

.. admonition:: Intel Description

    Divide the lower half-precision (16-bit) floating-point element in "a" by the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
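    Rounding is done according to the "rounding" parameter, which is either a rounding mode (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.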

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] / b.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_fmadd_sh
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fmadd_sh(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
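
The data flow of the scalar multiply-add can be sketched in portable C. This is an assumption-laden illustration: ``float`` stands in for ``_Float16``, ``fmadd_sh_model`` is a hypothetical helper, and the plain ``a * b + c`` expression does not model the rounding behavior of the hardware operation.

.. code-block:: C

    #include <stddef.h>

    /* Hypothetical model; shows which lanes feed which result only. */
    static void fmadd_sh_model(const float a[8], const float b[8],
                               const float c[8], float dst[8])
    {
        dst[0] = (a[0] * b[0]) + c[0];   /* lane 0: multiply, then add c */
        for (size_t j = 1; j < 8; j++)
            dst[j] = a[j];               /* upper 7 lanes are copied from a */
    }

The upper lanes of "b" and "c" are never read in this model.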
        	

_mm_mask_fmadd_sh
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fmadd_sh(__m128h a, __mmask8 k, __m128h b,
                              __m128h c);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
        ELSE
        	dst.fp16[0] := a.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask3_fmadd_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fmadd_sh(__m128h a, __m128h b, __m128h c,
                               __mmask8 k);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
        ELSE
        	dst.fp16[0] := c.fp16[0]
        FI
        dst[127:16] := c[127:16]
        dst[MAX:128] := 0
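
In the "mask3" form, "c" plays both pass-through roles: it supplies lane 0 when mask bit 0 is clear, and it supplies the upper seven lanes in every case. A minimal sketch of that difference, assuming ``float`` as a stand-in for ``_Float16`` and the hypothetical helper name ``mask3_fmadd_sh_model``:

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model of the mask3 form. */
    static void mask3_fmadd_sh_model(const float a[8], const float b[8],
                                     const float c[8], uint8_t k,
                                     float dst[8])
    {
        if (k & 1u)
            dst[0] = (a[0] * b[0]) + c[0];
        else
            dst[0] = c[0];               /* pass-through comes from c */
        for (size_t j = 1; j < 8; j++)
            dst[j] = c[j];               /* upper lanes also come from c */
    }

This ordering suits accumulation patterns in which "c" is the running value that the result overwrites.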
        	

_mm_maskz_fmadd_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fmadd_sh(__mmask8 k, __m128h a, __m128h b,
                               __m128h c);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_fmadd_round_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_fmadd_round_sh(__m128h a, __m128h b, __m128h c,
                               const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
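    Rounding is done according to the "rounding" parameter, which is either a rounding mode (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.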

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_fmadd_round_sh
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_fmadd_round_sh(__m128h a, __mmask8 k,
                                    __m128h b, __m128h c,
                                    const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
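    Rounding is done according to the "rounding" parameter, which is either a rounding mode (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.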

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
        ELSE
        	dst.fp16[0] := a.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask3_fmadd_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask3_fmadd_round_sh(__m128h a, __m128h b,
                                     __m128h c, __mmask8 k,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".
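    Rounding is done according to the "rounding" parameter, which is either a rounding mode (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.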

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
        ELSE
        	dst.fp16[0] := c.fp16[0]
        FI
        dst[127:16] := c[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_fmadd_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_fmadd_round_sh(__mmask8 k, __m128h a,
                                     __m128h b, __m128h c,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
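    Rounding is done according to the "rounding" parameter, which is either a rounding mode (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.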

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_fnmadd_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fnmadd_sh(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
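
The negated form computes ``c - a*b`` in lane 0. A short portable sketch of the sign placement, assuming ``float`` as a stand-in for ``_Float16``; ``fnmadd_sh_model`` is a hypothetical helper and does not model hardware rounding:

.. code-block:: C

    #include <stddef.h>

    /* Hypothetical model: only the product is negated, not c. */
    static void fnmadd_sh_model(const float a[8], const float b[8],
                                const float c[8], float dst[8])
    {
        dst[0] = -(a[0] * b[0]) + c[0];
        for (size_t j = 1; j < 8; j++)
            dst[j] = a[j];               /* upper 7 lanes are copied from a */
    }

For ``a[0] = 2``, ``b[0] = 3`` and ``c[0] = 10``, lane 0 of the result is ``4``.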
        	

_mm_mask_fnmadd_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fnmadd_sh(__m128h a, __mmask8 k, __m128h b,
                               __m128h c);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
        ELSE
        	dst.fp16[0] := a.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask3_fnmadd_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fnmadd_sh(__m128h a, __m128h b, __m128h c,
                                __mmask8 k);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
        ELSE
        	dst.fp16[0] := c.fp16[0]
        FI
        dst[127:16] := c[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_fnmadd_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fnmadd_sh(__mmask8 k, __m128h a,
                                __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_fnmadd_round_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_fnmadd_round_sh(__m128h a, __m128h b, __m128h c,
                                const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
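    Rounding is done according to the "rounding" parameter, which is either a rounding mode (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.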

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_fnmadd_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_fnmadd_round_sh(__m128h a, __mmask8 k,
                                     __m128h b, __m128h c,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
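    Rounding is done according to the "rounding" parameter, which is either a rounding mode (_MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO) combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the current MXCSR.RC rounding mode.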

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
        ELSE
        	dst.fp16[0] := a.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask3_fnmadd_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask3_fnmadd_round_sh(__m128h a, __m128h b,
                                      __m128h c, __mmask8 k,
                                      const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".
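
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code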

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
        ELSE
        	dst.fp16[0] := c.fp16[0]
        FI
        dst[127:16] := c[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_fnmadd_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_fnmadd_round_sh(__mmask8 k, __m128h a,
                                      __m128h b, __m128h c,
                                      const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
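
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code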

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
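
The masked forms of the fnmadd operation above differ only in what the lower element falls back to and in which operand supplies the upper seven elements. The following is a minimal scalar sketch of the pseudo-code, not the intrinsics themselves: ``vec8h`` and the ``model_*`` functions are hypothetical names, each lane is held as a ``float`` rather than a 16-bit value, and the "rounding" argument is not modelled.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128h: eight lanes held as float, not fp16. */
    typedef struct { float lane[8]; } vec8h;

    /* Scalar core shared by all variants: -(a*b) + c on lane 0. */
    static float fnmadd_lane0(vec8h a, vec8h b, vec8h c) {
        return -(a.lane[0] * b.lane[0]) + c.lane[0];
    }

    /* mask form: lane 0 falls back to a.lane[0]; upper lanes come from a. */
    vec8h model_mask_fnmadd_sh(vec8h a, uint8_t k, vec8h b, vec8h c) {
        vec8h dst = a;
        if (k & 1) dst.lane[0] = fnmadd_lane0(a, b, c);
        return dst;
    }

    /* mask3 form: lane 0 falls back to c.lane[0]; upper lanes come from c. */
    vec8h model_mask3_fnmadd_sh(vec8h a, vec8h b, vec8h c, uint8_t k) {
        vec8h dst = c;
        if (k & 1) dst.lane[0] = fnmadd_lane0(a, b, c);
        return dst;
    }

    /* maskz form: lane 0 is zeroed when bit 0 is clear; upper lanes come from a. */
    vec8h model_maskz_fnmadd_sh(uint8_t k, vec8h a, vec8h b, vec8h c) {
        vec8h dst = a;
        dst.lane[0] = (k & 1) ? fnmadd_lane0(a, b, c) : 0.0f;
        return dst;
    }

Only bit 0 of "k" is consulted. When it is clear, the mask model returns "a" unchanged, the mask3 model returns "c" unchanged, and the maskz model returns "a" with lane 0 cleared.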
        	

_mm_fmsub_sh
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fmsub_sh(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_fmsub_sh
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fmsub_sh(__m128h a, __mmask8 k, __m128h b,
                              __m128h c);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
        ELSE
        	dst.fp16[0] := a.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask3_fmsub_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fmsub_sh(__m128h a, __m128h b, __m128h c,
                               __mmask8 k);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
        ELSE
        	dst.fp16[0] := c.fp16[0]
        FI
        dst[127:16] := c[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_fmsub_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fmsub_sh(__mmask8 k, __m128h a, __m128h b,
                               __m128h c);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_fmsub_round_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_fmsub_round_sh(__m128h a, __m128h b, __m128h c,
                               const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
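
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code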

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_fmsub_round_sh
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_fmsub_round_sh(__m128h a, __mmask8 k,
                                    __m128h b, __m128h c,
                                    const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
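
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code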

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
        ELSE
        	dst.fp16[0] := a.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask3_fmsub_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask3_fmsub_round_sh(__m128h a, __m128h b,
                                     __m128h c, __mmask8 k,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".
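
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code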

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
        ELSE
        	dst.fp16[0] := c.fp16[0]
        FI
        dst[127:16] := c[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_fmsub_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_fmsub_round_sh(__mmask8 k, __m128h a,
                                     __m128h b, __m128h c,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
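
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code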

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_fnmsub_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fnmsub_sh(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
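
The fnmadd, fmsub, and fnmsub operations in this section differ only in the signs applied to the product and to "c". A scalar sketch of lane 0 makes the convention explicit. The ``model_*`` names are hypothetical, and ``float`` is used in place of half precision, so the rounding of the real instructions is not reproduced.

.. code-block:: C

    /* Scalar models of lane 0 only; float stands in for fp16 (an assumption). */

    /* fnmadd: negate the product, then add c. */
    float model_fnmadd(float a, float b, float c) { return -(a * b) + c; }

    /* fmsub: keep the product, then subtract c. */
    float model_fmsub(float a, float b, float c) { return (a * b) - c; }

    /* fnmsub: negate the product, then subtract c. */
    float model_fnmsub(float a, float b, float c) { return -(a * b) - c; }

For a = 2, b = 3, c = 10 the three models give 4, -4, and -16 respectively.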
        	

_mm_mask_fnmsub_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fnmsub_sh(__m128h a, __mmask8 k, __m128h b,
                               __m128h c);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
        ELSE
        	dst.fp16[0] := a.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask3_fnmsub_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fnmsub_sh(__m128h a, __m128h b, __m128h c,
                                __mmask8 k);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
        ELSE
        	dst.fp16[0] := c.fp16[0]
        FI
        dst[127:16] := c[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_fnmsub_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fnmsub_sh(__mmask8 k, __m128h a,
                                __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_fnmsub_round_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_fnmsub_round_sh(__m128h a, __m128h b, __m128h c,
                                const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
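
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code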

    .. code-block:: text

        
        dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_fnmsub_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_fnmsub_round_sh(__m128h a, __mmask8 k,
                                     __m128h b, __m128h c,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
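
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code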

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
        ELSE
        	dst.fp16[0] := a.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask3_fnmsub_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask3_fnmsub_round_sh(__m128h a, __m128h b,
                                      __m128h c, __mmask8 k,
                                      const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".
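
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code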

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
        ELSE
        	dst.fp16[0] := c.fp16[0]
        FI
        dst[127:16] := c[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_fnmsub_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_fnmsub_round_sh(__mmask8 k, __m128h a,
                                      __m128h b, __m128h c,
                                      const int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
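
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code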

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_sub_sh
^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_sub_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Subtract the lower half-precision (16-bit) floating-point element in "b" from the lower half-precision (16-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := a.fp16[0] - b.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_sub_round_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_sub_round_sh(__m128h a, __m128h b,
                             int rounding);

.. admonition:: Intel Description

    Subtract the lower half-precision (16-bit) floating-point element in "b" from the lower half-precision (16-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
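
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code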

    .. code-block:: text

        
        dst.fp16[0] := a.fp16[0] - b.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_sub_sh
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_sub_sh(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Subtract the lower half-precision (16-bit) floating-point element in "b" from the lower half-precision (16-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] - b.fp16[0]
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
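
Unlike the masked FMA forms, the writemask form of the subtraction takes a separate "src" operand, and only its lowest element is ever read: the upper seven elements always come from "a". A minimal scalar sketch of the pseudo-code above, assuming a hypothetical ``vec8h`` struct with ``float`` lanes in place of ``__m128h`` and a hypothetical ``model_mask_sub_sh`` function:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128h: eight lanes held as float, not fp16. */
    typedef struct { float lane[8]; } vec8h;

    /* Model of the writemask form: lane 0 is a - b or src; lanes 1..7 come from a. */
    vec8h model_mask_sub_sh(vec8h src, uint8_t k, vec8h a, vec8h b) {
        vec8h dst = a;                          /* dst[127:16] := a[127:16] */
        if (k & 1) {
            dst.lane[0] = a.lane[0] - b.lane[0];
        } else {
            dst.lane[0] = src.lane[0];          /* upper lanes of src are ignored */
        }
        return dst;
    }

The zeromask form follows the same shape, with the ``else`` branch storing zero instead of reading "src".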
        	

_mm_mask_sub_round_sh
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_sub_round_sh(__m128h src, __mmask8 k,
                                  __m128h a, __m128h b,
                                  int rounding);

.. admonition:: Intel Description

    Subtract the lower half-precision (16-bit) floating-point element in "b" from the lower half-precision (16-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
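
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code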

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] - b.fp16[0]
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_sub_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_sub_sh(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Subtract the lower half-precision (16-bit) floating-point element in "b" from the lower half-precision (16-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] - b.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_sub_round_sh
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_sub_round_sh(__mmask8 k, __m128h a,
                                   __m128h b, int rounding);

.. admonition:: Intel Description

    Subtract the lower half-precision (16-bit) floating-point element in "b" from the lower half-precision (16-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
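
    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code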

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] - b.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mul_sh
^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mul_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := a.fp16[0] * b.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mul_round_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mul_round_sh(__m128h a, __m128h b,
                             int rounding);

.. admonition:: Intel Description

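    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code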

    .. code-block:: text

        
        dst.fp16[0] := a.fp16[0] * b.fp16[0]
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
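
The effect of the "rounding" argument can be illustrated without FP16 hardware. The sketch below is a software model under stated assumptions, not the instruction's implementation: ``model_round``, ``round_to_fp16_precision``, and ``model_mul_round_lane0`` are hypothetical names, values are carried as ``float``, and only finite results inside the half-precision normal range are handled (no overflow, subnormals, or exception flags). It relies on the fact that half precision keeps an 11-bit significand, so the 13 lowest fraction bits of a ``float`` must be rounded away.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical mode names; the real argument is built from _MM_FROUND_* constants. */
    typedef enum { RN_NEAREST, RN_DOWN, RN_UP, RN_ZERO } model_round;

    /* Round a float to an 11-bit significand (fp16 precision).
       Assumes x is finite and within the fp16 normal range. */
    float round_to_fp16_precision(float x, model_round mode) {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);
        const uint32_t dropped = bits & 0x1FFFu;   /* 13 fraction bits fp16 cannot hold */
        const int negative = (int)(bits >> 31);
        uint32_t bump = 0;
        switch (mode) {
        case RN_NEAREST: bump = 0x0FFFu + ((bits >> 13) & 1u); break;     /* ties to even */
        case RN_DOWN:    bump = (negative && dropped) ? 0x1FFFu : 0u; break;
        case RN_UP:      bump = (!negative && dropped) ? 0x1FFFu : 0u; break;
        case RN_ZERO:    bump = 0u; break;                                /* truncate */
        }
        bits = (bits + bump) & ~0x1FFFu;           /* carry may bump the exponent */
        memcpy(&x, &bits, sizeof x);
        return x;
    }

    /* Lane-0 model of a rounded multiply. The product of two fp16 values is
       exact in float, so a single rounding step suffices here. */
    float model_mul_round_lane0(float a, float b, model_round mode) {
        return round_to_fp16_precision(a * b, mode);
    }

For example, 683 * 3 = 2049 lies halfway between the half-precision neighbours 2048 and 2050: the model returns 2048 for nearest, down, and truncate, and 2050 for round up.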
        	

_mm_mask_mul_sh
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_mul_sh(__m128h src, __mmask8 k, __m128h a,
                            __m128h b);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] * b.fp16[0]
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_mul_round_sh
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_mul_round_sh(__m128h src, __mmask8 k,
                                  __m128h a, __m128h b,
                                  int rounding);

.. admonition:: Intel Description

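    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

    Rounding is done according to the "rounding" parameter: one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION to use the rounding mode in MXCSR.RC.

.. admonition:: Intel Implementation Pseudo-Code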

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] * b.fp16[0]
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_mul_sh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_mul_sh(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] * b.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_mul_round_sh
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_mul_round_sh(__mmask8 k, __m128h a,
                                   __m128h b, int rounding);

.. admonition:: Intel Description

    Multiply the lower half-precision (16-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
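    Rounding is done according to the "rounding" parameter, for example "_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC" or "_MM_FROUND_CUR_DIRECTION".

.. admonition:: Intel Implementation Pseudo-Code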

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := a.fp16[0] * b.fp16[0]
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_fmul_sch
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_fmul_sch(__m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
        dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
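
As a worked instance of the pseudo-code above, the following scalar sketch multiplies the lower complex pair. It is an illustration only: ``vec8_model`` and ``fmul_sch_model`` are hypothetical names, and lanes are stored as ``float`` rather than half precision (an assumption made for portability).

.. code-block:: C

    /* Hypothetical stand-in for __m128h: lane[0] is the real part and
       lane[1] the imaginary part of the lower complex number. */
    typedef struct { float lane[8]; } vec8_model;

    /* Scalar model of the lower complex multiply a * b. */
    static vec8_model fmul_sch_model(vec8_model a, vec8_model b)
    {
        vec8_model dst = a;   /* lanes 2..7 are copied from a */
        dst.lane[0] = a.lane[0] * b.lane[0] - a.lane[1] * b.lane[1];  /* real */
        dst.lane[1] = a.lane[1] * b.lane[0] + a.lane[0] * b.lane[1];  /* imag */
        return dst;
    }

For a = 1 + 2i and b = 3 + 4i, this model produces -5 + 10i in lanes 0 and 1.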
        	

_mm_mul_sch
^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mul_sch(__m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
        dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_fmul_sch
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_fmul_sch(__m128h src, __mmask8 k,
                              __m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        	dst.fp16[1] := src.fp16[1]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
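
The writemask behaviour described above, where a single mask bit governs both halves of the complex result, can be sketched with a scalar model. The names ``vec8_model`` and ``mask_fmul_sch_model`` are hypothetical, and ``float`` lanes stand in for half-precision lanes as a simplifying assumption.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128h with float lanes (assumption). */
    typedef struct { float lane[8]; } vec8_model;

    /* Scalar model of the write-masked lower complex multiply. */
    static vec8_model mask_fmul_sch_model(vec8_model src, uint8_t k,
                                          vec8_model a, vec8_model b)
    {
        vec8_model dst = a;   /* lanes 2..7 are copied from a, not from src */
        if (k & 1u) {
            dst.lane[0] = a.lane[0] * b.lane[0] - a.lane[1] * b.lane[1];
            dst.lane[1] = a.lane[1] * b.lane[0] + a.lane[0] * b.lane[1];
        } else {
            /* Mask bit 0 clear: both real and imaginary parts come from src. */
            dst.lane[0] = src.lane[0];
            dst.lane[1] = src.lane[1];
        }
        return dst;
    }

Note that in this model ``src`` only ever supplies lanes 0 and 1; the upper six lanes always come from ``a``.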
        	

_mm_mask_mul_sch
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_mul_sch(__m128h src, __mmask8 k, __m128h a,
                             __m128h b);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        	dst.fp16[1] := src.fp16[1]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fmul_sch
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_fmul_sch(__mmask8 k, __m128h a,
                               __m128h b);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := 0
        	dst.fp16[1] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_mul_sch
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_mul_sch(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := 0
        	dst.fp16[1] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_fmul_round_sch
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_fmul_round_sch(__m128h a, __m128h b,
                               const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
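    Rounding is done according to the "rounding" parameter, for example "_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC" or "_MM_FROUND_CUR_DIRECTION".

.. admonition:: Intel Implementation Pseudo-Code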

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
        dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mul_round_sch
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mul_round_sch(__m128h a, __m128h b,
                              const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
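    Rounding is done according to the "rounding" parameter, for example "_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC" or "_MM_FROUND_CUR_DIRECTION".

.. admonition:: Intel Implementation Pseudo-Code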

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
        dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_fmul_round_sch
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_fmul_round_sch(__m128h src, __mmask8 k,
                                    __m128h a, __m128h b,
                                    const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
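    Rounding is done according to the "rounding" parameter, for example "_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC" or "_MM_FROUND_CUR_DIRECTION".

.. admonition:: Intel Implementation Pseudo-Code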

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        	dst.fp16[1] := src.fp16[1]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_mul_round_sch
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_mul_round_sch(__m128h src, __mmask8 k,
                                   __m128h a, __m128h b,
                                   const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
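    Rounding is done according to the "rounding" parameter, for example "_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC" or "_MM_FROUND_CUR_DIRECTION".

.. admonition:: Intel Implementation Pseudo-Code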

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        	dst.fp16[1] := src.fp16[1]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fmul_round_sch
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_fmul_round_sch(__mmask8 k, __m128h a,
                                     __m128h b,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
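    Rounding is done according to the "rounding" parameter, for example "_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC" or "_MM_FROUND_CUR_DIRECTION".

.. admonition:: Intel Implementation Pseudo-Code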

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := 0
        	dst.fp16[1] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_mul_round_sch
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_mul_round_sch(__mmask8 k, __m128h a,
                                    __m128h b,
                                    const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
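    Rounding is done according to the "rounding" parameter, for example "_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC" or "_MM_FROUND_CUR_DIRECTION".

.. admonition:: Intel Implementation Pseudo-Code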

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := 0
        	dst.fp16[1] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_fcmul_sch
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_fcmul_sch(__m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
        dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
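
The conjugate form differs from the plain complex multiply only in two signs, which a scalar sketch makes easy to see. This is an illustrative model, not the actual implementation: ``vec8_model`` and ``fcmul_sch_model`` are hypothetical names, and ``float`` lanes replace half-precision lanes as an assumption.

.. code-block:: C

    /* Hypothetical stand-in for __m128h with float lanes (assumption). */
    typedef struct { float lane[8]; } vec8_model;

    /* Scalar model of a * conj(b) on the lower complex pair. */
    static vec8_model fcmul_sch_model(vec8_model a, vec8_model b)
    {
        vec8_model dst = a;   /* lanes 2..7 are copied from a */
        dst.lane[0] = a.lane[0] * b.lane[0] + a.lane[1] * b.lane[1];  /* real */
        dst.lane[1] = a.lane[1] * b.lane[0] - a.lane[0] * b.lane[1];  /* imag */
        return dst;
    }

For a = 1 + 2i and b = 3 + 4i, the model computes (1 + 2i)(3 - 4i) = 11 + 2i.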
        	

_mm_cmul_sch
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_cmul_sch(__m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
        dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_fcmul_sch
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_fcmul_sch(__m128h src, __mmask8 k,
                               __m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        	dst.fp16[1] := src.fp16[1]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_cmul_sch
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_cmul_sch(__m128h src, __mmask8 k,
                              __m128h a, __m128h b);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        	dst.fp16[1] := src.fp16[1]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fcmul_sch
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_fcmul_sch(__mmask8 k, __m128h a,
                                __m128h b);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := 0
        	dst.fp16[1] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
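
One common use of a conjugate multiply is the squared magnitude: multiplying a value by its own conjugate gives a real result. The scalar sketch below combines that with the zeromask rule from the pseudo-code above. ``vec8_model`` and ``maskz_fcmul_sch_model`` are hypothetical names, and lanes are ``float`` rather than half precision (an assumption for portability).

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128h with float lanes (assumption). */
    typedef struct { float lane[8]; } vec8_model;

    /* Scalar model of the zero-masked a * conj(b) on the lower complex pair. */
    static vec8_model maskz_fcmul_sch_model(uint8_t k, vec8_model a, vec8_model b)
    {
        vec8_model dst = a;   /* lanes 2..7 are copied from a */
        if (k & 1u) {
            dst.lane[0] = a.lane[0] * b.lane[0] + a.lane[1] * b.lane[1];
            dst.lane[1] = a.lane[1] * b.lane[0] - a.lane[0] * b.lane[1];
        } else {
            /* Mask bit 0 clear: both halves of the complex result are zeroed. */
            dst.lane[0] = 0.0f;
            dst.lane[1] = 0.0f;
        }
        return dst;
    }

Passing the same vector as ``a`` and ``b`` with bit 0 of ``k`` set yields the squared magnitude in lane 0 and zero in lane 1; for 3 + 4i that is 25 + 0i.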
        	

_mm_maskz_cmul_sch
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_cmul_sch(__mmask8 k, __m128h a,
                               __m128h b);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := 0
        	dst.fp16[1] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_fcmul_round_sch
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_fcmul_round_sch(__m128h a, __m128h b,
                                const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
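    Rounding is done according to the "rounding" parameter, for example "_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC" or "_MM_FROUND_CUR_DIRECTION".

.. admonition:: Intel Implementation Pseudo-Code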

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
        dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cmul_round_sch
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_cmul_round_sch(__m128h a, __m128h b,
                               const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
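    Rounding is done according to the "rounding" parameter, for example "_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC" or "_MM_FROUND_CUR_DIRECTION".

.. admonition:: Intel Implementation Pseudo-Code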

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
        dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_fcmul_round_sch
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_fcmul_round_sch(__m128h src, __mmask8 k,
                                     __m128h a, __m128h b,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
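    Rounding is done according to the "rounding" parameter, for example "_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC" or "_MM_FROUND_CUR_DIRECTION".

.. admonition:: Intel Implementation Pseudo-Code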

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        	dst.fp16[1] := src.fp16[1]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_cmul_round_sch
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_cmul_round_sch(__m128h src, __mmask8 k,
                                    __m128h a, __m128h b,
                                    const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
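    Rounding is done according to the "rounding" parameter, for example "_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC" or "_MM_FROUND_CUR_DIRECTION".

.. admonition:: Intel Implementation Pseudo-Code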

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        	dst.fp16[1] := src.fp16[1]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fcmul_round_sch
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_fcmul_round_sch(__mmask8 k, __m128h a,
                                      __m128h b,
                                      const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
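    Rounding is done according to the "rounding" parameter, for example "_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC" or "_MM_FROUND_CUR_DIRECTION".

.. admonition:: Intel Implementation Pseudo-Code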

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := 0
        	dst.fp16[1] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_cmul_round_sch
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_cmul_round_sch(__mmask8 k, __m128h a,
                                     __m128h b,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
        ELSE
        	dst.fp16[0] := 0
        	dst.fp16[1] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_fmadd_sch
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fmadd_sch(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
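
As a worked instance of the two arithmetic lines above, the following scalar sketch computes a*b + c for one complex number. The "cplx_model" type and the "fmadd_sch_low" helper are illustrative names that do not exist in any header, and float stands in for the half-precision elements.

.. code-block:: C

    /* Hypothetical helper type: one complex number as (re, im). */
    typedef struct { float re, im; } cplx_model;

    /* a * b + c, written term by term as in the pseudo-code. */
    static cplx_model fmadd_sch_low(cplx_model a, cplx_model b, cplx_model c)
    {
        cplx_model dst;
        dst.re = a.re * b.re - a.im * b.im + c.re;
        dst.im = a.im * b.re + a.re * b.im + c.im;
        return dst;
    }

With a = 1 + 2i, b = 3 + 4i and c = 0.5 - i, the product a*b is -5 + 10i and the accumulated result is -4.5 + 9i.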
        	

_mm_mask_fmadd_sch
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fmadd_sch(__m128h a, __mmask8 k, __m128h b,
                               __m128h c);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "a" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        ELSE
        	dst.fp16[0] := a.fp16[0]
        	dst.fp16[1] := a.fp16[1]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask3_fmadd_sch
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fmadd_sch(__m128h a, __m128h b, __m128h c,
                                __mmask8 k);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "c" when mask bit 0 is not set), and copy the upper 6 packed elements from "c" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        ELSE
        	dst.fp16[0] := c.fp16[0]
        	dst.fp16[1] := c.fp16[1]
        FI
        dst[127:32] := c[127:32]
        dst[MAX:128] := 0
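
The "mask" and "mask3" forms differ only in which operand supplies the pass-through value and the upper six elements. The scalar sketch below puts the two side by side. All type and function names in it are hypothetical, and float is used in place of half precision.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128h: eight elements stored as float. */
    typedef struct { float fp16[8]; } vec8h_model;

    /* Overwrite the low complex number of *dst with a*b + c. */
    static void low_fmadd(vec8h_model *dst, const vec8h_model *a,
                          const vec8h_model *b, const vec8h_model *c)
    {
        float re = a->fp16[0] * b->fp16[0] - a->fp16[1] * b->fp16[1] + c->fp16[0];
        float im = a->fp16[1] * b->fp16[0] + a->fp16[0] * b->fp16[1] + c->fp16[1];
        dst->fp16[0] = re;
        dst->fp16[1] = im;
    }

    /* "mask" form: pass-through and upper elements come from a. */
    static vec8h_model mask_fmadd_sch_model(vec8h_model a, uint8_t k,
                                            vec8h_model b, vec8h_model c)
    {
        vec8h_model dst = a;
        if (k & 1u)
            low_fmadd(&dst, &a, &b, &c);
        return dst;
    }

    /* "mask3" form: pass-through and upper elements come from c. */
    static vec8h_model mask3_fmadd_sch_model(vec8h_model a, vec8h_model b,
                                             vec8h_model c, uint8_t k)
    {
        vec8h_model dst = c;
        if (k & 1u)
            low_fmadd(&dst, &a, &b, &c);
        return dst;
    }

When bit 0 of "k" is set, both models produce the same low complex number and differ only in elements 2 to 7.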
        	

_mm_maskz_fmadd_sch
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fmadd_sch(__mmask8 k, __m128h a,
                                __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        ELSE
        	dst.fp16[0] := 0
        	dst.fp16[1] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_fmadd_round_sch
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_fmadd_round_sch(__m128h a, __m128h b, __m128h c,
                                const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_fmadd_round_sch
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_fmadd_round_sch(__m128h a, __mmask8 k,
                                     __m128h b, __m128h c,
                                     const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "a" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        ELSE
        	dst.fp16[0] := a.fp16[0]
        	dst.fp16[1] := a.fp16[1]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask3_fmadd_round_sch
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask3_fmadd_round_sch(__m128h a, __m128h b,
                                      __m128h c, __mmask8 k,
                                      const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "c" when mask bit 0 is not set), and copy the upper 6 packed elements from "c" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        ELSE
        	dst.fp16[0] := c.fp16[0]
        	dst.fp16[1] := c.fp16[1]
        FI
        dst[127:32] := c[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fmadd_round_sch
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_fmadd_round_sch(__mmask8 k, __m128h a,
                                      __m128h b, __m128h c,
                                      const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        ELSE
        	dst.fp16[0] := 0
        	dst.fp16[1] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_fcmadd_sch
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_fcmadd_sch(__m128h a, __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
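
The conjugate form differs from the plain multiply-accumulate only in the sign of the terms that involve the imaginary part of "b". The sketch below checks that relationship in scalar C. "cplx_model", "fcmadd_sch_low", "fmadd_sch_low" and "conj_model" are hypothetical helpers, and float stands in for half precision.

.. code-block:: C

    /* Hypothetical helper type: one complex number as (re, im). */
    typedef struct { float re, im; } cplx_model;

    /* a * conj(b) + c, written term by term as in the pseudo-code. */
    static cplx_model fcmadd_sch_low(cplx_model a, cplx_model b, cplx_model c)
    {
        cplx_model dst;
        dst.re = a.re * b.re + a.im * b.im + c.re;
        dst.im = a.im * b.re - a.re * b.im + c.im;
        return dst;
    }

    /* Plain a * b + c, for comparison. */
    static cplx_model fmadd_sch_low(cplx_model a, cplx_model b, cplx_model c)
    {
        cplx_model dst;
        dst.re = a.re * b.re - a.im * b.im + c.re;
        dst.im = a.im * b.re + a.re * b.im + c.im;
        return dst;
    }

    /* Complex conjugate: flip the sign of the imaginary part. */
    static cplx_model conj_model(cplx_model z)
    {
        z.im = -z.im;
        return z;
    }

For a = 1 + 2i, b = 3 + 4i and c = 0.5 - i, "fcmadd_sch_low(a, b, c)" and "fmadd_sch_low(a, conj_model(b), c)" both give 11.5 + 1i.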
        	

_mm_mask_fcmadd_sch
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_mask_fcmadd_sch(__m128h a, __mmask8 k,
                                __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "a" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        ELSE
        	dst.fp16[0] := a.fp16[0]
        	dst.fp16[1] := a.fp16[1]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask3_fcmadd_sch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k

.. code-block:: C

    __m128h _mm_mask3_fcmadd_sch(__m128h a, __m128h b,
                                 __m128h c, __mmask8 k);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "c" when mask bit 0 is not set), and copy the upper 6 packed elements from "c" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        ELSE
        	dst.fp16[0] := c.fp16[0]
        	dst.fp16[1] := c.fp16[1]
        FI
        dst[127:32] := c[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fcmadd_sch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c

.. code-block:: C

    __m128h _mm_maskz_fcmadd_sch(__mmask8 k, __m128h a,
                                 __m128h b, __m128h c);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        ELSE
        	dst.fp16[0] := 0
        	dst.fp16[1] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_fcmadd_round_sch
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_fcmadd_round_sch(__m128h a, __m128h b,
                                 __m128h c, const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_fcmadd_round_sch
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __mmask8 k, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    FP16 a, 
    MASK k, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_fcmadd_round_sch(__m128h a, __mmask8 k,
                                      __m128h b, __m128h c,
                                      const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "a" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        ELSE
        	dst.fp16[0] := a.fp16[0]
        	dst.fp16[1] := a.fp16[1]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask3_fcmadd_round_sch
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    __m128h c, 
    __mmask8 k, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    FP16 c, 
    MASK k, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask3_fcmadd_round_sch(__m128h a, __m128h b,
                                       __m128h c, __mmask8 k,
                                       const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "c" when mask bit 0 is not set), and copy the upper 6 packed elements from "c" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        ELSE
        	dst.fp16[0] := c.fp16[0]
        	dst.fp16[1] := c.fp16[1]
        FI
        dst[127:32] := c[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fcmadd_round_sch
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    __m128h c, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    FP16 c, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_fcmadd_round_sch(__mmask8 k, __m128h a,
                                       __m128h b, __m128h c,
                                       const int rounding);

.. admonition:: Intel Description

    Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
    Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
        	dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
        ELSE
        	dst.fp16[0] := 0
        	dst.fp16[1] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_dpwssds_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_maskz_dpwssds_epi32(__mmask8 k, __m128i src,
                                    __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        		tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        		dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
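
The loop above maps directly onto scalar C. The sketch below is a reference model only: "saturate32" and "maskz_dpwssds_model" are hypothetical names, and it assumes the sum is formed at full precision (64-bit here) before it is clamped.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a wide value to the signed 32-bit range (the Saturate32 step). */
    static int32_t saturate32(int64_t v)
    {
        if (v > INT32_MAX) return INT32_MAX;
        if (v < INT32_MIN) return INT32_MIN;
        return (int32_t)v;
    }

    /* Four dword lanes, two signed word products per lane, zero-masked by k. */
    static void maskz_dpwssds_model(int32_t dst[4], uint8_t k,
                                    const int32_t src[4],
                                    const int16_t a[8], const int16_t b[8])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                int64_t tmp1 = (int64_t)a[2 * j] * b[2 * j];
                int64_t tmp2 = (int64_t)a[2 * j + 1] * b[2 * j + 1];
                dst[j] = saturate32((int64_t)src[j] + tmp1 + tmp2);
            } else {
                dst[j] = 0;  /* zeromask: unselected lanes are cleared */
            }
        }
    }

Only bits 0 to 3 of "k" are consulted, because a 128-bit vector holds four 32-bit lanes.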
        	

_mm_mask_dpwssds_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mask_dpwssds_epi32(__m128i src, __mmask8 k,
                                   __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        		tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        		dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_dpwssds_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_dpwssds_epi32(__m128i src, __m128i a,
                              __m128i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:128] := 0
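
Because "src" acts as an accumulator, one plausible use of this operation is a running dot product over 16-bit data. The scalar sketch below illustrates that pattern without using the intrinsic itself. "saturate32", "dpwssds_step" and "dot_i16_model" are hypothetical names, and the element count is assumed to be a multiple of 8.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Clamp a wide value to the signed 32-bit range (the Saturate32 step). */
    static int32_t saturate32(int64_t v)
    {
        if (v > INT32_MAX) return INT32_MAX;
        if (v < INT32_MIN) return INT32_MIN;
        return (int32_t)v;
    }

    /* One unmasked step: each lane adds two word products to its accumulator. */
    static void dpwssds_step(int32_t acc[4], const int16_t a[8],
                             const int16_t b[8])
    {
        for (int j = 0; j < 4; j++) {
            int64_t tmp1 = (int64_t)a[2 * j] * b[2 * j];
            int64_t tmp2 = (int64_t)a[2 * j + 1] * b[2 * j + 1];
            acc[j] = saturate32((int64_t)acc[j] + tmp1 + tmp2);
        }
    }

    /* Dot product of n 16-bit values, eight at a time, then a lane sum. */
    static int64_t dot_i16_model(const int16_t *x, const int16_t *y, size_t n)
    {
        int32_t acc[4] = {0, 0, 0, 0};
        for (size_t i = 0; i + 8 <= n; i += 8)
            dpwssds_step(acc, x + i, y + i);
        return (int64_t)acc[0] + acc[1] + acc[2] + acc[3];
    }

Each lane saturates independently, so for large inputs the returned value can be smaller than the exact dot product.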
        	

_mm_maskz_dpwssd_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_maskz_dpwssd_epi32(__mmask8 k, __m128i src,
                                   __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        		tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        		dst.dword[j] := src.dword[j] + tmp1 + tmp2
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_dpwssd_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mask_dpwssd_epi32(__m128i src, __mmask8 k,
                                  __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        		tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        		dst.dword[j] := src.dword[j] + tmp1 + tmp2
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
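
A scalar sketch of the write-masked, non-saturating form is shown below. "mask_dpwssd_model" is a hypothetical name. The sum is formed in unsigned arithmetic so that overflow wraps modulo 2^32, which is an assumption about how the plain addition in the pseudo-code behaves on overflow.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Four dword lanes; lanes whose mask bit is clear keep the value from src. */
    static void mask_dpwssd_model(int32_t dst[4], const int32_t src[4],
                                  uint8_t k,
                                  const int16_t a[8], const int16_t b[8])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u) {
                int32_t tmp1 = (int32_t)a[2 * j] * b[2 * j];
                int32_t tmp2 = (int32_t)a[2 * j + 1] * b[2 * j + 1];
                /* Unsigned addition wraps instead of being undefined. */
                uint32_t sum = (uint32_t)src[j] + (uint32_t)tmp1 + (uint32_t)tmp2;
                /* Reinterpret the bits as a signed 32-bit value. */
                memcpy(&dst[j], &sum, sizeof sum);
            } else {
                dst[j] = src[j];  /* writemask: pass src through */
            }
        }
    }

In the test values used for this sketch, a lane that starts at INT32_MAX and adds 1 wraps to INT32_MIN rather than clamping.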
        	

_mm_dpwssd_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_dpwssd_epi32(__m128i src, __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:128] := 0
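
The only difference between this form and the "dpwssds" form is the final step: a plain 32-bit addition versus Saturate32. The single-lane sketch below contrasts the two. Every function name in it is hypothetical, and the plain addition is assumed to wrap modulo 2^32.

.. code-block:: C

    #include <stdint.h>

    /* Clamp to the signed 32-bit range. */
    static int32_t saturate32(int64_t v)
    {
        if (v > INT32_MAX) return INT32_MAX;
        if (v < INT32_MIN) return INT32_MIN;
        return (int32_t)v;
    }

    /* Keep the low 32 bits and read them as a two's-complement value. */
    static int32_t wrap32(int64_t v)
    {
        uint32_t u = (uint32_t)(uint64_t)v;
        return (u & 0x80000000u) ? (int32_t)(u - 0x80000000u) + INT32_MIN
                                 : (int32_t)u;
    }

    /* Exact value of src + a0*b0 + a1*b1 for one lane. */
    static int64_t lane_sum(int32_t src, int16_t a0, int16_t b0,
                            int16_t a1, int16_t b1)
    {
        return (int64_t)src + (int64_t)a0 * b0 + (int64_t)a1 * b1;
    }

    /* Non-saturating lane (dpwssd-style). */
    static int32_t dpwssd_lane(int32_t src, int16_t a0, int16_t b0,
                               int16_t a1, int16_t b1)
    {
        return wrap32(lane_sum(src, a0, b0, a1, b1));
    }

    /* Saturating lane (dpwssds-style). */
    static int32_t dpwssds_lane(int32_t src, int16_t a0, int16_t b0,
                                int16_t a1, int16_t b1)
    {
        return saturate32(lane_sum(src, a0, b0, a1, b1));
    }

The two lanes agree whenever the exact sum fits in 32 bits and differ only when it overflows.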
        	

_mm_maskz_dpbusds_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_maskz_dpbusds_epi32(__mmask8 k, __m128i src,
                                    __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        		tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        		tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        		tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        		dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
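The saturating sum and the zeromask in the pseudo-code above can be modelled per lane in plain C. This is only a sketch: the helper names ``saturate32``, ``dpbusds_lane`` and ``maskz_dpbusds_lane`` are hypothetical, and packing the four bytes of each lane into a ``uint32_t`` (byte 0 in the low bits) is an assumption made to keep the example short.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a wide intermediate to the signed 32-bit range (Saturate32). */
    static inline int32_t saturate32(int64_t v)
    {
        if (v > INT32_MAX) return INT32_MAX;
        if (v < INT32_MIN) return INT32_MIN;
        return (int32_t)v;
    }

    /* Hypothetical scalar model of one dword lane of the saturating dot
     * product. "a" packs four unsigned bytes and "b" four signed bytes,
     * with byte 0 in the least significant bits. */
    static inline int32_t dpbusds_lane(int32_t src, uint32_t a, uint32_t b)
    {
        int64_t acc = src;
        for (int n = 0; n < 4; n++) {
            int ua = (int)((a >> (8 * n)) & 0xFF);   /* ZeroExtend16 */
            int sb = (int)((b >> (8 * n)) & 0xFF);
            if (sb >= 128) sb -= 256;                /* SignExtend16 */
            acc += ua * sb;
        }
        return saturate32(acc);
    }

    /* Zero-masking: lane j is computed only when bit j of k is set. */
    static inline int32_t maskz_dpbusds_lane(uint8_t k, int j, int32_t src,
                                             uint32_t a, uint32_t b)
    {
        return ((k >> j) & 1) ? dpbusds_lane(src, a, b) : 0;
    }

With ``src`` already at ``INT32_MAX``, any positive dot product leaves the lane clamped at ``INT32_MAX`` rather than wrapping to a negative value.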

_mm_mask_dpbusds_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_mask_dpbusds_epi32(__m128i src, __mmask8 k,
                                   __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        		tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        		tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        		tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        		dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_dpbusds_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_dpbusds_epi32(__m128i src, __m128i a,
                              __m128i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_dpbusd_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_maskz_dpbusd_epi32(__mmask8 k, __m128i src,
                                   __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        		tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        		tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        		tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        		dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_dpbusd_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    MASK k, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_mask_dpbusd_epi32(__m128i src, __mmask8 k,
                                  __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        		tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        		tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        		tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        		dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
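The pseudo-code above applies no ``Saturate32``, so the sum wraps modulo 2^32, and a clear mask bit leaves the ``src`` lane untouched. A minimal scalar sketch of one lane follows. The name ``mask_dpbusd_lane`` and the packing of four bytes into a ``uint32_t`` (byte 0 lowest) are assumptions for illustration, and the final cast assumes a two's-complement compiler.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of one dword lane of _mm_mask_dpbusd_epi32.
     * "a" packs four unsigned bytes, "b" four signed bytes (byte 0 lowest). */
    static inline int32_t mask_dpbusd_lane(int32_t src, uint8_t k, int j,
                                           uint32_t a, uint32_t b)
    {
        if (!((k >> j) & 1))
            return src;                 /* writemask: keep the src lane */

        uint32_t acc = (uint32_t)src;   /* unsigned so the sum wraps */
        for (int n = 0; n < 4; n++) {
            int ua = (int)((a >> (8 * n)) & 0xFF);   /* unsigned byte */
            int sb = (int)((b >> (8 * n)) & 0xFF);
            if (sb >= 128) sb -= 256;                /* signed byte */
            acc += (uint32_t)(ua * sb);
        }
        return (int32_t)acc;            /* assumes two's complement */
    }

In this model, ``mask_dpbusd_lane(7, 0x1, 0, 0x02, 0xFD)`` multiplies 2 by -3 and returns ``7 - 6 = 1``, while the same call with ``k = 0`` returns 7 unchanged.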

_mm_dpbusd_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX-512-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_dpbusd_epi32(__m128i src, __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:128] := 0
        	

Compare
-------
ZMM
~~~
_mm512_cmp_epi8_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    SI8 a, 
    SI8 b, 
    IMM imm8

.. code-block:: C

    __mmask64 _mm512_cmp_epi8_mask(__m512i a, __m512i b,
                                   const int imm8);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	
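The ``CASE`` table above maps the low three bits of ``imm8`` to a comparison. A scalar sketch of that decoding, and of how the 64 per-byte results are packed into the mask, might look as follows. The function names ``cmpint_i8`` and ``cmp_epi8_mask_model`` are hypothetical and exist only for this illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: evaluate the imm8[2:0] predicate for one
     * pair of signed bytes, following the CASE table above. */
    static inline int cmpint_i8(int8_t a, int8_t b, int imm8)
    {
        switch (imm8 & 7) {
        case 0: return a == b;   /* _MM_CMPINT_EQ    */
        case 1: return a <  b;   /* _MM_CMPINT_LT    */
        case 2: return a <= b;   /* _MM_CMPINT_LE    */
        case 3: return 0;        /* _MM_CMPINT_FALSE */
        case 4: return a != b;   /* _MM_CMPINT_NE    */
        case 5: return a >= b;   /* _MM_CMPINT_NLT   */
        case 6: return a >  b;   /* _MM_CMPINT_NLE   */
        default: return 1;       /* _MM_CMPINT_TRUE  */
        }
    }

    /* Build the 64-bit result mask: bit j holds the outcome for byte j. */
    static inline uint64_t cmp_epi8_mask_model(const int8_t a[64],
                                               const int8_t b[64], int imm8)
    {
        uint64_t k = 0;
        for (int j = 0; j < 64; j++)
            k |= (uint64_t)cmpint_i8(a[j], b[j], imm8) << j;
        return k;
    }

In this model, greater-than-or-equal is predicate 5 (``_MM_CMPINT_NLT``, "not less than") and greater-than is predicate 6 (``_MM_CMPINT_NLE``, "not less than or equal").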

_mm512_cmpeq_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask64 _mm512_cmpeq_epi8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_cmpge_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask64 _mm512_cmpge_epi8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_cmpgt_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask64 _mm512_cmpgt_epi8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_cmple_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask64 _mm512_cmple_epi8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_cmplt_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask64 _mm512_cmplt_epi8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_cmpneq_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask64 _mm512_cmpneq_epi8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmp_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b, 
    IMM imm8

.. code-block:: C

    __mmask64 _mm512_mask_cmp_epi8_mask(__mmask64 k1, __m512i a,
                                        __m512i b,
                                        const int imm8);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmpeq_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask64 _mm512_mask_cmpeq_epi8_mask(__mmask64 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	
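The zeromask branch in the pseudo-code above can be followed line by line in a scalar loop. The sketch below uses a hypothetical helper, ``mask_cmpeq_epi8_model``, that takes the 64 bytes of each operand as plain arrays. It is a reading aid, not a replacement for the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_cmpeq_epi8_mask: bit j of
     * the result is set only if bit j of k1 is set and byte j is equal. */
    static inline uint64_t mask_cmpeq_epi8_model(uint64_t k1,
                                                 const int8_t a[64],
                                                 const int8_t b[64])
    {
        uint64_t k = 0;
        for (int j = 0; j < 64; j++) {
            if ((k1 >> j) & 1)                       /* IF k1[j] */
                k |= (uint64_t)(a[j] == b[j]) << j;
            /* ELSE: bit j stays 0 (zero-masking) */
        }
        return k;
    }

The result is the bitwise AND of ``k1`` with the mask an unmasked equality compare would produce.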

_mm512_mask_cmpge_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask64 _mm512_mask_cmpge_epi8_mask(__mmask64 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmpgt_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask64 _mm512_mask_cmpgt_epi8_mask(__mmask64 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmple_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask64 _mm512_mask_cmple_epi8_mask(__mmask64 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmplt_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask64 _mm512_mask_cmplt_epi8_mask(__mmask64 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmpneq_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask64 _mm512_mask_cmpneq_epi8_mask(__mmask64 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_cmp_epu8_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __mmask64 _mm512_cmp_epu8_mask(__m512i a, __m512i b,
                                   const int imm8);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	
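The pseudo-code for the unsigned compare is textually identical to the signed one; the difference lies only in how each 8-bit pattern is interpreted. The following sketch makes that visible with three hypothetical helpers (``as_signed8``, ``lt_epi8``, ``lt_epu8``) that compare the same bit patterns under both interpretations.

.. code-block:: C

    #include <stdint.h>

    /* Interpret an 8-bit pattern as signed (epi8) without relying on
     * implementation-defined narrowing conversions. */
    static inline int as_signed8(uint8_t bits)
    {
        return (int)(bits ^ 0x80) - 128;
    }

    /* Less-than under the signed (epi8) interpretation. */
    static inline int lt_epi8(uint8_t a, uint8_t b)
    {
        return as_signed8(a) < as_signed8(b);
    }

    /* Less-than under the unsigned (epu8) interpretation. */
    static inline int lt_epu8(uint8_t a, uint8_t b)
    {
        return a < b;
    }

For the patterns ``0x80`` and ``0x01``, the signed reading gives ``-128 < 1`` (true) while the unsigned reading gives ``128 < 1`` (false), so the two intrinsic families can set different mask bits for the same input vectors.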

_mm512_cmpeq_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_cmpeq_epu8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_cmpge_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_cmpge_epu8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_cmpgt_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_cmpgt_epu8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_cmple_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_cmple_epu8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_cmplt_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_cmplt_epu8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_cmpneq_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_cmpneq_epu8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmp_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __mmask64 _mm512_mask_cmp_epu8_mask(__mmask64 k1, __m512i a,
                                        __m512i b,
                                        const int imm8);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmpeq_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_mask_cmpeq_epu8_mask(__mmask64 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmpge_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_mask_cmpge_epu8_mask(__mmask64 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmpgt_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_mask_cmpgt_epu8_mask(__mmask64 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmple_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_mask_cmple_epu8_mask(__mmask64 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmplt_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_mask_cmplt_epu8_mask(__mmask64 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_mask_cmpneq_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_mask_cmpneq_epu8_mask(__mmask64 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
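
One plausible use of the masked not-equal result is counting mismatching bytes in a partial block, with "k1" limiting the comparison to the valid lanes. The sketch below models that in portable C; ``model_mask_cmpneq_epu8``, ``count_bits64`` and ``demo_count_mismatches`` are hypothetical names, and the bit-counting loop merely stands in for whatever popcount facility real code would use.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_cmpneq_epu8_mask. */
    uint64_t model_mask_cmpneq_epu8(uint64_t k1,
                                    const uint8_t a[64],
                                    const uint8_t b[64])
    {
        uint64_t k = 0;
        for (int j = 0; j < 64; j++) {
            if (((k1 >> j) & 1) && a[j] != b[j])
                k |= (uint64_t)1 << j;
        }
        return k;
    }

    /* Count set bits; a portable stand-in for a popcount instruction. */
    int count_bits64(uint64_t k)
    {
        int n = 0;
        for (; k != 0; k &= k - 1) /* each step clears the lowest set bit */
            n++;
        return n;
    }

    /* Usage sketch: count differing bytes among the first len lanes
     * (len is expected to be in 0..64). */
    int demo_count_mismatches(int len)
    {
        uint8_t a[64], b[64];
        for (int j = 0; j < 64; j++) {
            a[j] = (uint8_t)j;
            b[j] = (uint8_t)j;
        }
        b[3] = 0xFF;  /* mismatch at lane 3 */
        b[40] = 0xFF; /* mismatch at lane 40 */
        uint64_t k1 = (len >= 64) ? ~(uint64_t)0 : (((uint64_t)1 << len) - 1);
        uint64_t k = model_mask_cmpneq_epu8(k1, a, b);
        assert((k & ~k1) == 0); /* lanes outside k1 stay zero */
        return count_bits64(k);
    }

The explicit ``len >= 64`` branch avoids shifting a 64-bit value by 64, which is undefined in C.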
        	

_mm512_cmp_epu16_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __mmask32 _mm512_cmp_epu16_mask(__m512i a, __m512i b,
                                    const int imm8);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison predicate specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
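
The CASE table above can be read as a small dispatch on the low three bits of "imm8". The following scalar sketch spells that out for unsigned 16-bit lanes; ``model_cmpint_u16``, ``model_cmp_epu16`` and ``demo_cmp_epu16`` are hypothetical names, and the code is a model of the pseudo-code rather than the intrinsic itself.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Apply the predicate selected by imm8[2:0] to two unsigned
     * 16-bit values, following the CASE table above. */
    int model_cmpint_u16(uint16_t x, uint16_t y, int imm8)
    {
        switch (imm8 & 7) {
        case 0:  return x == y; /* _MM_CMPINT_EQ */
        case 1:  return x <  y; /* _MM_CMPINT_LT */
        case 2:  return x <= y; /* _MM_CMPINT_LE */
        case 3:  return 0;      /* _MM_CMPINT_FALSE */
        case 4:  return x != y; /* _MM_CMPINT_NE */
        case 5:  return x >= y; /* _MM_CMPINT_NLT */
        case 6:  return x >  y; /* _MM_CMPINT_NLE */
        default: return 1;      /* _MM_CMPINT_TRUE */
        }
    }

    /* Hypothetical scalar model of _mm512_cmp_epu16_mask. */
    uint32_t model_cmp_epu16(const uint16_t a[32], const uint16_t b[32],
                             int imm8)
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; j++) {
            if (model_cmpint_u16(a[j], b[j], imm8))
                k |= (uint32_t)1 << j;
        }
        return k;
    }

    /* Usage sketch: three interesting lanes, the rest are 0 vs 0. */
    uint32_t demo_cmp_epu16(int imm8)
    {
        uint16_t a[32] = {0}, b[32] = {0};
        a[0] = 1;      b[0] = 2; /* lane 0: a < b */
        a[1] = 7;      b[1] = 7; /* lane 1: a == b */
        a[2] = 0xFFFF; b[2] = 1; /* lane 2: a > b as unsigned */
        assert(model_cmp_epu16(a, b, 3) == 0); /* FALSE never sets a lane */
        return model_cmp_epu16(a, b, imm8);
    }

For this data, predicate 1 (LT) gives ``0x1`` and predicate 6 (NLE, that is greater-than) gives ``0x4``; the untouched lanes compare equal and therefore show up under EQ, LE and NLT.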
        	

_mm512_cmpeq_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_cmpeq_epu16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_cmpge_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_cmpge_epu16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_cmpgt_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_cmpgt_epu16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_cmple_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_cmple_epu16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_cmplt_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_cmplt_epu16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_cmpneq_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_cmpneq_epu16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
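
Because the result is a bit mask with one bit per element, the position of the first differing element is simply the index of the lowest set bit. The sketch below illustrates that idea in portable C; ``model_cmpneq_epu16``, ``lowest_set_bit32`` and ``demo_first_mismatch_epu16`` are hypothetical names, and the bit scan is written as a plain loop rather than any particular compiler builtin.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_cmpneq_epu16_mask. */
    uint32_t model_cmpneq_epu16(const uint16_t a[32], const uint16_t b[32])
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; j++) {
            if (a[j] != b[j])
                k |= (uint32_t)1 << j;
        }
        return k;
    }

    /* Index of the lowest set bit, or -1 when the mask is empty. */
    int lowest_set_bit32(uint32_t k)
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1)
                return j;
        }
        return -1;
    }

    /* Usage sketch: corrupt one element (if pos is in 0..31) and locate it. */
    int demo_first_mismatch_epu16(int pos)
    {
        uint16_t a[32], b[32];
        for (int j = 0; j < 32; j++) {
            a[j] = (uint16_t)(j * 1000);
            b[j] = a[j];
        }
        int valid = (pos >= 0 && pos < 32);
        if (valid)
            b[pos] ^= 0x8000; /* flip the top bit of one element */
        uint32_t k = model_cmpneq_epu16(a, b);
        assert(k == (valid ? (uint32_t)1 << pos : 0u));
        return lowest_set_bit32(k);
    }

An empty mask means the two vectors are identical, which the sketch reports as ``-1``.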
        	

_mm512_mask_cmp_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __mmask32 _mm512_mask_cmp_epu16_mask(__mmask32 k1,
                                         __m512i a, __m512i b,
                                         const int imm8);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison predicate specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
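
Read literally, the pseudo-code above says the zeromasked result is the unmasked comparison result ANDed with "k1". The scalar sketch below checks that identity for all eight predicates; ``pred_epu16``, ``model_mask_cmp_epu16`` and ``demo_zeromask_identity`` are hypothetical names, and the test data is arbitrary.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Predicate selected by imm8[2:0], per the CASE table above. */
    int pred_epu16(uint16_t x, uint16_t y, int imm8)
    {
        switch (imm8 & 7) {
        case 0:  return x == y; /* _MM_CMPINT_EQ */
        case 1:  return x <  y; /* _MM_CMPINT_LT */
        case 2:  return x <= y; /* _MM_CMPINT_LE */
        case 3:  return 0;      /* _MM_CMPINT_FALSE */
        case 4:  return x != y; /* _MM_CMPINT_NE */
        case 5:  return x >= y; /* _MM_CMPINT_NLT */
        case 6:  return x >  y; /* _MM_CMPINT_NLE */
        default: return 1;      /* _MM_CMPINT_TRUE */
        }
    }

    /* Hypothetical scalar model of _mm512_mask_cmp_epu16_mask. */
    uint32_t model_mask_cmp_epu16(uint32_t k1, const uint16_t a[32],
                                  const uint16_t b[32], int imm8)
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; j++) {
            if (((k1 >> j) & 1) && pred_epu16(a[j], b[j], imm8))
                k |= (uint32_t)1 << j;
        }
        return k;
    }

    /* Returns 1 if, for every predicate, the zeromasked result equals
     * k1 ANDed with the result computed under an all-ones mask. */
    int demo_zeromask_identity(uint32_t k1)
    {
        uint16_t a[32], b[32];
        for (int j = 0; j < 32; j++) {
            a[j] = (uint16_t)(j * 2100);
            b[j] = (uint16_t)(65535 - j * 2100);
        }
        for (int imm8 = 0; imm8 < 8; imm8++) {
            uint32_t full = model_mask_cmp_epu16(0xFFFFFFFFu, a, b, imm8);
            uint32_t part = model_mask_cmp_epu16(k1, a, b, imm8);
            assert((part & ~k1) == 0); /* disabled lanes are always zero */
            if (part != (k1 & full))
                return 0;
        }
        return 1;
    }

A consequence worth noting is that the mask output of one comparison can be fed as "k1" into the next to AND several conditions together.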
        	

_mm512_mask_cmpeq_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_mask_cmpeq_epu16_mask(__mmask32 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_cmpge_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_mask_cmpge_epu16_mask(__mmask32 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_cmpgt_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_mask_cmpgt_epu16_mask(__mmask32 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_cmple_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_mask_cmple_epu16_mask(__mmask32 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_cmplt_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_mask_cmplt_epu16_mask(__mmask32 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_cmpneq_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_mask_cmpneq_epu16_mask(__mmask32 k1,
                                            __m512i a,
                                            __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_cmp_epi16_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    SI16 a, 
    SI16 b, 
    IMM imm8

.. code-block:: C

    __mmask32 _mm512_cmp_epi16_mask(__m512i a, __m512i b,
                                    const int imm8);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" based on the comparison predicate specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
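
The pseudo-code for the signed and unsigned 16-bit compares looks identical; the difference lies in how each 16-bit pattern is interpreted before the predicate is applied. The scalar sketch below makes that explicit by running the same bit patterns through both views. All function names are hypothetical, and ``as_signed16`` assumes the usual two's-complement reading of a 16-bit pattern.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Reinterpret a 16-bit pattern as a two's-complement signed value. */
    int32_t as_signed16(uint16_t bits)
    {
        return bits >= 0x8000 ? (int32_t)bits - 0x10000 : (int32_t)bits;
    }

    /* Predicate selected by imm8[2:0], per the CASE table above. */
    int pred_i32(int32_t x, int32_t y, int imm8)
    {
        switch (imm8 & 7) {
        case 0:  return x == y; /* _MM_CMPINT_EQ */
        case 1:  return x <  y; /* _MM_CMPINT_LT */
        case 2:  return x <= y; /* _MM_CMPINT_LE */
        case 3:  return 0;      /* _MM_CMPINT_FALSE */
        case 4:  return x != y; /* _MM_CMPINT_NE */
        case 5:  return x >= y; /* _MM_CMPINT_NLT */
        case 6:  return x >  y; /* _MM_CMPINT_NLE */
        default: return 1;      /* _MM_CMPINT_TRUE */
        }
    }

    /* Hypothetical scalar model: is_signed != 0 follows the epi16 form,
     * is_signed == 0 follows the epu16 form. */
    uint32_t model_cmp_16(const uint16_t a[32], const uint16_t b[32],
                          int imm8, int is_signed)
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; j++) {
            int32_t x = is_signed ? as_signed16(a[j]) : (int32_t)a[j];
            int32_t y = is_signed ? as_signed16(b[j]) : (int32_t)b[j];
            if (pred_i32(x, y, imm8))
                k |= (uint32_t)1 << j;
        }
        return k;
    }

    /* Usage sketch: identical bit patterns, two interpretations. */
    uint32_t demo_signed_vs_unsigned(int imm8, int is_signed)
    {
        uint16_t a[32] = {0}, b[32] = {0};
        a[0] = 0xFFFF; b[0] = 0x0001; /* signed: -1 vs 1; unsigned: 65535 vs 1 */
        a[1] = 0x0001; b[1] = 0x0002; /* 1 vs 2 under both views */
        a[2] = 0x8000; b[2] = 0x7FFF; /* signed: -32768 vs 32767; unsigned: 32768 vs 32767 */
        assert(as_signed16(a[0]) == -1);
        return model_cmp_16(a, b, imm8, is_signed);
    }

With predicate 1 (LT) the signed view reports lanes 0, 1 and 2 (``0x7``), while the unsigned view reports only lane 1 (``0x2``).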
        	

_mm512_cmpeq_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask32 _mm512_cmpeq_epi16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_cmpge_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask32 _mm512_cmpge_epi16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_cmpgt_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask32 _mm512_cmpgt_epi16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_cmple_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask32 _mm512_cmple_epi16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_cmplt_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask32 _mm512_cmplt_epi16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_cmpneq_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask32 _mm512_cmpneq_epi16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_cmp_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b, 
    IMM imm8

.. code-block:: C

    __mmask32 _mm512_mask_cmp_epi16_mask(__mmask32 k1,
                                         __m512i a, __m512i b,
                                         const int imm8);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" based on the comparison predicate specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_cmpeq_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask32 _mm512_mask_cmpeq_epi16_mask(__mmask32 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_cmpge_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask32 _mm512_mask_cmpge_epi16_mask(__mmask32 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_cmpgt_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask32 _mm512_mask_cmpgt_epi16_mask(__mmask32 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_cmple_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask32 _mm512_mask_cmple_epi16_mask(__mmask32 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
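
A common way to use a zeromasked compare is to chain it after another compare, for example to test ``lo <= x <= hi`` per lane: the mask from the lower-bound test becomes "k1" for the upper-bound test. The scalar sketch below models that pattern; ``model_cmpge_epi16``, ``model_mask_cmple_epi16`` and ``demo_in_range_epi16`` are hypothetical names, and the chaining shown is one possible idiom rather than a prescribed one.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_cmpge_epi16_mask. */
    uint32_t model_cmpge_epi16(const int16_t a[32], const int16_t b[32])
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; j++) {
            if (a[j] >= b[j])
                k |= (uint32_t)1 << j;
        }
        return k;
    }

    /* Hypothetical scalar model of _mm512_mask_cmple_epi16_mask. */
    uint32_t model_mask_cmple_epi16(uint32_t k1, const int16_t a[32],
                                    const int16_t b[32])
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; j++) {
            if (((k1 >> j) & 1) && a[j] <= b[j])
                k |= (uint32_t)1 << j;
        }
        return k;
    }

    /* Usage sketch: lanes hold -16..15; report which fall in [lo, hi]. */
    uint32_t demo_in_range_epi16(int16_t lo_value, int16_t hi_value)
    {
        int16_t x[32], lo[32], hi[32];
        for (int j = 0; j < 32; j++) {
            x[j] = (int16_t)(j - 16);
            lo[j] = lo_value; /* stands in for a broadcast of the bound */
            hi[j] = hi_value;
        }
        uint32_t ge = model_cmpge_epi16(x, lo);         /* x >= lo */
        uint32_t k = model_mask_cmple_epi16(ge, x, hi); /* and x <= hi */
        assert((k & ~ge) == 0); /* the second test can only clear bits */
        return k;
    }

For the bounds -2 and 1 the sketch returns ``0x3C000``, that is lanes 14 through 17, which hold -2, -1, 0 and 1.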
        	

_mm512_mask_cmplt_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask32 _mm512_mask_cmplt_epi16_mask(__mmask32 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_cmpneq_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask32 _mm512_mask_cmpneq_epi16_mask(__mmask32 k1,
                                            __m512i a,
                                            __m512i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_test_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_mask_test_epi8_mask(__mmask64 k1,
                                         __m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
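
A minimal scalar sketch of the pseudo-code above, assuming nothing beyond standard C. ``model_mask_test_epi8`` and ``demo_mask_test_epi8`` are hypothetical names; the loop mirrors the per-lane AND followed by a non-zero check, gated by "k1".

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_test_epi8_mask. */
    uint64_t model_mask_test_epi8(uint64_t k1,
                                  const uint8_t a[64],
                                  const uint8_t b[64])
    {
        uint64_t k = 0;
        for (int j = 0; j < 64; j++) {
            if (((k1 >> j) & 1) && (a[j] & b[j]) != 0)
                k |= (uint64_t)1 << j;
        }
        return k;
    }

    /* Usage sketch: four hand-picked lanes, the rest are zero. */
    uint64_t demo_mask_test_epi8(uint64_t k1)
    {
        uint8_t a[64] = {0}, b[64] = {0};
        a[0] = 0x0F;  b[0] = 0xF0;  /* no common bits */
        a[1] = 0x81;  b[1] = 0x01;  /* bit 0 in common */
        a[2] = 0xFF;  b[2] = 0x00;  /* AND with zero */
        a[63] = 0x80; b[63] = 0x80; /* bit 7 in common */
        uint64_t k = model_mask_test_epi8(k1, a, b);
        assert((k & ~k1) == 0); /* disabled lanes are always zero */
        return k;
    }

With every lane enabled, only lanes 1 and 63 have a non-zero AND, so the result is ``0x8000000000000002``.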
        	

_mm512_test_epi8_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_test_epi8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
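
One illustrative use of a per-byte bit test is flagging bytes whose top bit is set, for instance to spot non-ASCII bytes in a 64-byte chunk of text. The sketch below models this in portable C; ``model_test_epi8`` and ``demo_non_ascii_mask`` are hypothetical names, and the ``memset`` of ``0x80`` merely stands in for a broadcast constant in real vector code.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm512_test_epi8_mask. */
    uint64_t model_test_epi8(const uint8_t a[64], const uint8_t b[64])
    {
        uint64_t k = 0;
        for (int j = 0; j < 64; j++) {
            if ((a[j] & b[j]) != 0)
                k |= (uint64_t)1 << j;
        }
        return k;
    }

    /* Usage sketch: mask of bytes with the top bit set in the first
     * 64 bytes of a string; shorter input is zero-padded. */
    uint64_t demo_non_ascii_mask(const char *text)
    {
        assert(text != NULL);
        uint8_t chunk[64] = {0}, top_bit[64];
        size_t n = strlen(text);
        if (n > 64)
            n = 64;
        memcpy(chunk, text, n);
        memset(top_bit, 0x80, sizeof top_bit); /* every lane tests bit 7 */
        return model_test_epi8(chunk, top_bit);
    }

A zero result means the chunk is pure 7-bit ASCII; otherwise each set bit marks the position of a byte at or above ``0x80``.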
        	

_mm512_mask_test_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_mask_test_epi16_mask(__mmask32 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_test_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_test_epi16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_testn_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_mask_testn_epi8_mask(__mmask64 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k1[j]
        		k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	

_mm512_testn_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask64 _mm512_testn_epi8_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0
        ENDFOR
        k[MAX:64] := 0
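
The pseudo-code above sets a bit when the AND of two lanes is zero, which is not the same as a lane-wise NAND being zero. A small scalar sketch makes the distinction concrete. The function name ``model_testn_epi8`` and the lane arrays are hypothetical and exist only for this illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code; not an Intel API.
       a and b hold the 64 packed 8-bit lanes, lane 0 first. */
    static uint64_t model_testn_epi8(const uint8_t a[64],
                                     const uint8_t b[64])
    {
        uint64_t k = 0;
        for (int j = 0; j < 64; j++) {
            /* Bit j is set when the two lanes share no set bits. */
            if ((uint8_t)(a[j] & b[j]) == 0)
                k |= (uint64_t)1 << j;
        }
        return k;
    }

With both lanes equal to ``0xFF`` the AND is ``0xFF``, so the mask bit stays clear, even though a bitwise NAND of those lanes would be zero.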
        	

_mm512_mask_testn_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_mask_testn_epi16_mask(__mmask32 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k1[j]
        		k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_testn_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask32 _mm512_testn_epi16_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_conflict_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m512i _mm512_conflict_epi32(__m512i a);

.. admonition:: Intel Description

    Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	FOR k := 0 to j-1
        		m := k*32
        		dst[i+k] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
        	ENDFOR
        	dst[i+31:i+j] := 0
        ENDFOR
        dst[MAX:512] := 0
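
Conflict detection is easier to follow with concrete values. The sketch below is a scalar rendering of the pseudo-code above, assuming the 16 lanes are passed as plain arrays. ``model_conflict_epi32`` is a hypothetical helper, not part of ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code; not an Intel API.
       For each lane j, bit l of dst[j] is set when a[j] equals an
       earlier lane a[l] with l < j. */
    static void model_conflict_epi32(const uint32_t a[16],
                                     uint32_t dst[16])
    {
        for (int j = 0; j < 16; j++) {
            uint32_t bits = 0;
            for (int l = 0; l < j; l++) {
                if (a[j] == a[l])
                    bits |= (uint32_t)1 << l;
            }
            dst[j] = bits; /* bits j..31 stay zero */
        }
    }

For an input starting ``7, 3, 7, 7, 3``, lane 3 matches lanes 0 and 2, so its result is binary ``101``, while lane 4 matches only lane 1 and yields binary ``10``.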
        	

_mm512_mask_conflict_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_conflict_epi32(__m512i src, __mmask16 k,
                                       __m512i a);

.. admonition:: Intel Description

    Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		FOR l := 0 to j-1
        			m := l*32
        			dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
        		ENDFOR
        		dst[i+31:i+j] := 0
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_conflict_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_conflict_epi32(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		FOR l := 0 to j-1
        			m := l*32
        			dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
        		ENDFOR
        		dst[i+31:i+j] := 0
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
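
The writemask and zeromask forms differ only in what an unselected lane receives. The scalar sketch below covers both under one assumption made for brevity: passing ``NULL`` for ``src`` selects the zeroing behaviour. ``model_masked_conflict_epi32`` is a hypothetical name and this is not how the intrinsics are implemented.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model; not an Intel API.
       src != NULL models the writemask form (copy from src),
       src == NULL models the zeromask form (write zero). */
    static void model_masked_conflict_epi32(const uint32_t *src,
                                            uint16_t k,
                                            const uint32_t a[16],
                                            uint32_t dst[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                uint32_t bits = 0;
                for (int l = 0; l < j; l++) {
                    if (a[j] == a[l])
                        bits |= (uint32_t)1 << l;
                }
                dst[j] = bits;
            } else {
                dst[j] = src ? src[j] : 0;
            }
        }
    }

Note that a masked-off lane is skipped as a destination only. It still takes part in the comparisons made for later lanes.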
        	

_mm512_conflict_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m512i _mm512_conflict_epi64(__m512i a);

.. admonition:: Intel Description

    Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	FOR k := 0 to j-1
        		m := k*64
        		dst[i+k] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
        	ENDFOR
        	dst[i+63:i+j] := 0
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_conflict_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_mask_conflict_epi64(__m512i src, __mmask8 k,
                                       __m512i a);

.. admonition:: Intel Description

    Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		FOR l := 0 to j-1
        			m := l*64
        			dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
        		ENDFOR
        		dst[i+63:i+j] := 0
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_conflict_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_maskz_conflict_epi64(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		FOR l := 0 to j-1
        			m := l*64
        			dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
        		ENDFOR
        		dst[i+63:i+j] := 0
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cmplt_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask16 _mm512_cmplt_epi32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmplt_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmplt_epi32_mask(__mmask16 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
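
The comparison here is a strict, signed less-than, gated by ``k1``. A scalar sketch of the pseudo-code, with the hypothetical name ``model_mask_cmplt_epi32`` and lanes passed as arrays purely for illustration:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code; not an Intel API. */
    static uint16_t model_mask_cmplt_epi32(uint16_t k1,
                                           const int32_t a[16],
                                           const int32_t b[16])
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; j++) {
            /* Signed, strict comparison; equal lanes give 0. */
            if (((k1 >> j) & 1u) && a[j] < b[j])
                k |= (uint16_t)(1u << j);
        }
        return k;
    }

With ``a = {-1, 5, 5, ...}`` and ``b = {0, 5, 6, ...}``, lanes 0 and 2 compare true while lane 1 (equal values) does not.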
        	

_mm512_cmp_epi64_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    SI64 a, 
    SI64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm512_cmp_epi64_mask(__m512i a, __m512i b,
                                   _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
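
The ``CASE`` table above maps the low three bits of ``imm8`` to a predicate. The scalar sketch below spells that mapping out, reading ``NLT`` as "greater-than-or-equal" and ``NLE`` as "greater-than". The helper names ``model_cmpint_i64`` and ``model_cmp_epi64_mask`` are hypothetical and the code is an illustration rather than Intel's implementation.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: evaluate one predicate on a signed pair. */
    static int model_cmpint_i64(int64_t x, int64_t y, int imm8)
    {
        switch (imm8 & 7) {
        case 0:  return x == y; /* _MM_CMPINT_EQ    */
        case 1:  return x <  y; /* _MM_CMPINT_LT    */
        case 2:  return x <= y; /* _MM_CMPINT_LE    */
        case 3:  return 0;      /* _MM_CMPINT_FALSE */
        case 4:  return x != y; /* _MM_CMPINT_NE    */
        case 5:  return x >= y; /* _MM_CMPINT_NLT   */
        case 6:  return x >  y; /* _MM_CMPINT_NLE   */
        default: return 1;      /* _MM_CMPINT_TRUE  */
        }
    }

    /* Hypothetical scalar model of the pseudo-code; not an Intel API. */
    static uint8_t model_cmp_epi64_mask(const int64_t a[8],
                                        const int64_t b[8],
                                        int imm8)
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if (model_cmpint_i64(a[j], b[j], imm8))
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

Predicates 4 to 7 are the bitwise complements of predicates 0 to 3, which gives a quick sanity check on any implementation.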
        	

_mm512_cmpeq_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_cmpeq_epi64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmpge_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm512_cmpge_epi64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmpgt_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm512_cmpgt_epi64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmple_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm512_cmple_epi64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmplt_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm512_cmplt_epi64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmpneq_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm512_cmpneq_epi64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmp_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm512_mask_cmp_epi64_mask(__mmask8 k1, __m512i a,
                                        __m512i b,
                                        _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpeq_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpeq_epi64_mask(__mmask8 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
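
Because unselected lanes are simply zeroed, the zeromask form behaves like the unmasked comparison ANDed with ``k1``. The scalar sketch below follows the pseudo-code lane by lane. ``model_mask_cmpeq_epi64`` is a hypothetical name used only for this illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code; not an Intel API. */
    static uint8_t model_mask_cmpeq_epi64(uint8_t k1,
                                          const uint64_t a[8],
                                          const uint64_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if ((k1 >> j) & 1u) {
                if (a[j] == b[j])
                    k |= (uint8_t)(1u << j);
            }
            /* else: k[j] stays 0 */
        }
        return k;
    }

If the even lanes of ``a`` and ``b`` are equal, the full-mask result is ``0x55``, and calling the model with ``k1 = 0x0F`` keeps only the low half of it, ``0x05``.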
        	

_mm512_mask_cmpge_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpge_epi64_mask(__mmask8 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpgt_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpgt_epi64_mask(__mmask8 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmple_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmple_epi64_mask(__mmask8 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmplt_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmplt_epi64_mask(__mmask8 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpneq_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpneq_epi64_mask(__mmask8 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmp_epu64_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm512_cmp_epu64_mask(__m512i a, __m512i b,
                                   _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmpeq_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_cmpeq_epu64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmpge_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_cmpge_epu64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmpgt_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_cmpgt_epu64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmple_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_cmple_epu64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmplt_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_cmplt_epu64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
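
The ``epu64`` and ``epi64`` pseudo-code looks identical. The difference lies in how each 64-bit lane is interpreted. The scalar sketch below illustrates this with a hypothetical ``model_cmplt_64`` helper that takes raw lane bits and an ``is_signed`` flag. The signed ordering is obtained by flipping the sign bit of both operands before an unsigned compare, a choice made here to keep the C code free of implementation-defined casts.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model; not an Intel API.
       is_signed == 0 models the epu64 ordering,
       is_signed != 0 models the epi64 ordering on the same bits. */
    static uint8_t model_cmplt_64(const uint64_t a[8],
                                  const uint64_t b[8],
                                  int is_signed)
    {
        /* XOR with the sign bit maps signed order onto unsigned order. */
        const uint64_t bias =
            is_signed ? UINT64_C(0x8000000000000000) : 0;
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if ((a[j] ^ bias) < (b[j] ^ bias))
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

A lane holding all ones is the largest unsigned value but equals -1 when read as signed, so comparing it against 1 gives opposite answers under the two interpretations.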
        	

_mm512_cmpneq_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_cmpneq_epu64_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmp_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm512_mask_cmp_epu64_mask(__mmask8 k1, __m512i a,
                                        __m512i b,
                                        _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpeq_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpeq_epu64_mask(__mmask8 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpge_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpge_epu64_mask(__mmask8 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpgt_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpgt_epu64_mask(__mmask8 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmple_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmple_epu64_mask(__mmask8 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmplt_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmplt_epu64_mask(__mmask8 k1,
                                          __m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpneq_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpneq_epu64_mask(__mmask8 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
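
For the masked unsigned 64-bit comparisons above, the mask value produced by the zeromask form is the unmasked result ANDed with ``k1``, and within ``k1`` the not-equal result is the complement of the equality result. A minimal scalar sketch of that relationship follows; the ``model_*`` functions are hypothetical names introduced for illustration, and vectors are assumed to be arrays of eight ``uint64_t`` lanes.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical model of the unmasked equality compare. */
    static uint8_t model_cmpeq_epu64(const uint64_t a[8], const uint64_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            if (a[j] == b[j])
                k |= (uint8_t)(1u << j);
        return k;
    }

    /* Hypothetical model of the zeromasked equality compare,
       written lane by lane as in the pseudo-code. */
    static uint8_t model_mask_cmpeq_epu64(uint8_t k1, const uint64_t a[8],
                                          const uint64_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            if (((k1 >> j) & 1) && a[j] == b[j])
                k |= (uint8_t)(1u << j);
        return k;
    }

    /* Hypothetical model of the zeromasked not-equal compare. */
    static uint8_t model_mask_cmpneq_epu64(uint8_t k1, const uint64_t a[8],
                                           const uint64_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            if (((k1 >> j) & 1) && a[j] != b[j])
                k |= (uint8_t)(1u << j);
        return k;
    }

In this model the masked equal and masked not-equal results never overlap, and their union is exactly ``k1``.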
        	

_mm512_cmp_pd_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512d a, 
    __m512d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm512_cmp_pd_mask(__m512d a, __m512d b,
                                const int imm8);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	k[j] := (a[i+63:i] OP b[i+63:i]) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
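
The 32 predicates in the table above reduce to 16 relations: entries 16 to 31 repeat entries 0 to 15 with the quiet (``Q``) and signaling (``S``) suffix swapped, and the ``O``/``U`` letter states whether a lane with a NaN operand yields 0 or 1. The scalar sketch below models only the resulting mask bits, not the floating-point exception behaviour; ``cmp_pd_lane`` and ``model_cmp_pd_mask`` are hypothetical names, and vectors are assumed to be arrays of eight ``double`` lanes.

.. code-block:: C

    #include <assert.h>
    #include <math.h>
    #include <stdint.h>

    /* Hypothetical helper: mask bit for one lane. imm8[4] only selects
       quiet versus signaling behaviour, so it is ignored here. */
    static int cmp_pd_lane(double a, double b, int imm8)
    {
        int unord = isnan(a) || isnan(b);

        switch (imm8 & 0x0F) {
        case 0:  return !unord && a == b;     /* EQ,  ordered   */
        case 1:  return !unord && a <  b;     /* LT,  ordered   */
        case 2:  return !unord && a <= b;     /* LE,  ordered   */
        case 3:  return unord;                /* UNORD          */
        case 4:  return unord || a != b;      /* NEQ, unordered */
        case 5:  return unord || !(a <  b);   /* NLT, unordered */
        case 6:  return unord || !(a <= b);   /* NLE, unordered */
        case 7:  return !unord;               /* ORD            */
        case 8:  return unord || a == b;      /* EQ,  unordered */
        case 9:  return unord || !(a >= b);   /* NGE, unordered */
        case 10: return unord || !(a >  b);   /* NGT, unordered */
        case 11: return 0;                    /* FALSE          */
        case 12: return !unord && a != b;     /* NEQ, ordered   */
        case 13: return !unord && a >= b;     /* GE,  ordered   */
        case 14: return !unord && a >  b;     /* GT,  ordered   */
        default: return 1;                    /* TRUE           */
        }
    }

    /* Hypothetical scalar model of the eight-lane compare. */
    static uint8_t model_cmp_pd_mask(const double a[8], const double b[8],
                                     int imm8)
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            if (cmp_pd_lane(a[j], b[j], imm8))
                k |= (uint8_t)(1u << j);
        return k;
    }

In this model an ordered predicate and its unordered negation (for example entries 0 and 4, or 1 and 5) always produce complementary masks.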
        	

_mm512_cmp_round_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512d a, 
    __m512d b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __mmask8 _mm512_cmp_round_pd_mask(__m512d a, __m512d b,
                                      const int imm8,
                                      const int sae);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	k[j] := (a[i+63:i] OP b[i+63:i]) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmpeq_pd_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_cmpeq_pd_mask(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := (a[i+63:i] == b[i+63:i]) ? 1 : 0
        ENDFOR	
        k[MAX:8] := 0
        	

_mm512_cmple_pd_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_cmple_pd_mask(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := (a[i+63:i] <= b[i+63:i]) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmplt_pd_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_cmplt_pd_mask(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := (a[i+63:i] < b[i+63:i]) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmpneq_pd_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_cmpneq_pd_mask(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := (a[i+63:i] != b[i+63:i]) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmpnle_pd_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_cmpnle_pd_mask(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := (!(a[i+63:i] <= b[i+63:i])) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmpnlt_pd_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_cmpnlt_pd_mask(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k[j] := (!(a[i+63:i] < b[i+63:i])) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
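
The not-less-than and not-less-than-or-equal forms negate an ordered comparison, so a lane in which either operand is NaN is reported as set. That makes them different from ordered greater-than-or-equal and greater-than. The scalar sketch below illustrates the difference; the ``model_*`` names are hypothetical, vectors are assumed to be arrays of eight ``double`` lanes, and exception behaviour is not modelled.

.. code-block:: C

    #include <assert.h>
    #include <math.h>
    #include <stdint.h>

    /* Hypothetical model of not-less-than: !(a < b) per lane. */
    static uint8_t model_cmpnlt_pd(const double a[8], const double b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            if (!(a[j] < b[j]))
                k |= (uint8_t)(1u << j);
        return k;
    }

    /* Hypothetical model of not-less-than-or-equal: !(a <= b) per lane. */
    static uint8_t model_cmpnle_pd(const double a[8], const double b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            if (!(a[j] <= b[j]))
                k |= (uint8_t)(1u << j);
        return k;
    }

    /* For contrast: ordered greater-than, which is 0 for NaN lanes. */
    static uint8_t model_gt_ordered_pd(const double a[8], const double b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            if (a[j] > b[j])
                k |= (uint8_t)(1u << j);
        return k;
    }

With NaN in lanes 2 and 3, the not-less-than-or-equal model sets those lanes while the ordered greater-than model leaves them clear.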
        	

_mm512_cmpord_pd_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_cmpord_pd_mask(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	k[j] := (a[i+63:i] != NaN AND b[i+63:i] != NaN) ? 1 : 0 
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmpunord_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_cmpunord_pd_mask(__m512d a, __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	k[j] := (a[i+63:i] == NaN OR b[i+63:i] == NaN) ? 1 : 0 
        ENDFOR
        k[MAX:8] := 0
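
The ``== NaN`` and ``!= NaN`` tests in the ordered and unordered pseudo-code are classification tests, not value comparisons: in C, ``x == NAN`` is always false, so a scalar model has to use ``isnan``. The sketch below shows this under the assumption that vectors are arrays of eight ``double`` lanes; ``model_cmpord_pd`` and ``model_cmpunord_pd`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <math.h>
    #include <stdint.h>

    /* Hypothetical model of the unordered test: set when either operand is NaN. */
    static uint8_t model_cmpunord_pd(const double a[8], const double b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            if (isnan(a[j]) || isnan(b[j]))
                k |= (uint8_t)(1u << j);
        return k;
    }

    /* Hypothetical model of the ordered test: set when neither operand is NaN. */
    static uint8_t model_cmpord_pd(const double a[8], const double b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            if (!isnan(a[j]) && !isnan(b[j]))
                k |= (uint8_t)(1u << j);
        return k;
    }

The two masks partition the eight lanes: every lane is set in exactly one of them. Infinities and signed zeros count as ordered.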
        	

_mm512_mask_cmp_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512d a, 
    __m512d b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm512_mask_cmp_pd_mask(__mmask8 k1, __m512d a,
                                     __m512d b, const int imm8);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmp_round_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512d a, 
    __m512d b, 
    const int imm8, 
    const int sae
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __mmask8 _mm512_mask_cmp_round_pd_mask(__mmask8 k1,
                                           __m512d a, __m512d b,
                                           const int imm8,
                                           const int sae);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpeq_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpeq_pd_mask(__mmask8 k1, __m512d a,
                                       __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := (a[i+63:i] == b[i+63:i]) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR	
        k[MAX:8] := 0
        	

_mm512_mask_cmple_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmple_pd_mask(__mmask8 k1, __m512d a,
                                       __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := (a[i+63:i] <= b[i+63:i]) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
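
A possible way to call the masked less-than-or-equal compare from C is sketched below. ``lanes_le_masked`` is a hypothetical wrapper name. The intrinsic path is compiled only when the compiler defines ``__AVX512F__`` (for example with ``-mavx512f``), in which case the program also needs an AVX-512F capable CPU at run time; otherwise the function falls back to a scalar loop that follows the pseudo-code above.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Hypothetical wrapper: bit j of the result is set when k1[j] is set
       and a[j] <= b[j]; lanes masked off by k1 are reported as 0. */
    static uint8_t lanes_le_masked(uint8_t k1, const double a[8],
                                   const double b[8])
    {
    #if defined(__AVX512F__)
        /* Unaligned loads, so the arrays need no special alignment. */
        __m512d va = _mm512_loadu_pd(a);
        __m512d vb = _mm512_loadu_pd(b);
        return (uint8_t)_mm512_mask_cmple_pd_mask((__mmask8)k1, va, vb);
    #else
        /* Portable fallback mirroring the pseudo-code. */
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            if (((k1 >> j) & 1) && a[j] <= b[j])
                k |= (uint8_t)(1u << j);
        return k;
    #endif
    }

Passing ``k1 = 0xFF`` gives the same mask value as the unmasked compare, and ``k1 = 0`` always yields 0.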
        	

_mm512_mask_cmplt_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmplt_pd_mask(__mmask8 k1, __m512d a,
                                       __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := (a[i+63:i] < b[i+63:i]) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpneq_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpneq_pd_mask(__mmask8 k1, __m512d a,
                                        __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := (a[i+63:i] != b[i+63:i]) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpnle_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpnle_pd_mask(__mmask8 k1, __m512d a,
                                        __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := (!(a[i+63:i] <= b[i+63:i])) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpnlt_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpnlt_pd_mask(__mmask8 k1, __m512d a,
                                        __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := (!(a[i+63:i] < b[i+63:i])) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpord_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpord_pd_mask(__mmask8 k1, __m512d a,
                                        __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := (a[i+63:i] != NaN AND b[i+63:i] != NaN) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_cmpunord_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __mmask8 _mm512_mask_cmpunord_pd_mask(__mmask8 k1,
                                          __m512d a, __m512d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := (a[i+63:i] == NaN OR b[i+63:i] == NaN) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_cmp_ps_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512 a, 
    __m512 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm512_cmp_ps_mask(__m512 a, __m512 b,
                                 const int imm8);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	k[j] := (a[i+31:i] OP b[i+31:i]) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
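
The single-precision compares return a 16-bit mask in which bit ``j`` describes lane ``j`` and all higher bits are zero. The helpers below sketch two common ways to consume such a mask once it has been converted to a ``uint16_t``. ``mask16_count`` and ``mask16_first`` are hypothetical names, and treating ``__mmask16`` as convertible to ``uint16_t`` is an assumption about the compiler's headers rather than something this reference specifies.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper: number of lanes for which the predicate held. */
    static int mask16_count(uint16_t k)
    {
        int n = 0;
        while (k) {
            k = (uint16_t)(k & (k - 1));   /* clear the lowest set bit */
            n++;
        }
        return n;
    }

    /* Hypothetical helper: index of the lowest set lane, or -1 if none. */
    static int mask16_first(uint16_t k)
    {
        for (int j = 0; j < 16; j++)
            if ((k >> j) & 1)
                return j;
        return -1;
    }

A result of ``-1`` from ``mask16_first`` means the predicate held in no lane.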
        	

_mm512_cmp_round_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512 a, 
    __m512 b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __mmask16 _mm512_cmp_round_ps_mask(__m512 a, __m512 b,
                                       const int imm8,
                                       const int sae);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	k[j] := (a[i+31:i] OP b[i+31:i]) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmpeq_ps_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_cmpeq_ps_mask(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := (a[i+31:i] == b[i+31:i]) ? 1 : 0
        ENDFOR	
        k[MAX:16] := 0
        	

_mm512_cmple_ps_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_cmple_ps_mask(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := (a[i+31:i] <= b[i+31:i]) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmplt_ps_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_cmplt_ps_mask(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := (a[i+31:i] < b[i+31:i]) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	
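As a usage sketch, the mask returned by a compare can be consumed as an ordinary 16-bit integer, for instance to count matching lanes. The example below assumes a compiler that defines "__AVX512F__" when AVX-512F code generation is enabled (GCC and Clang do so with -mavx512f) and falls back to a scalar loop otherwise. The helper name "count_below" is hypothetical.

```c
#include <stdint.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical helper: count how many of the 16 floats in v are below limit. */
static int count_below(const float v[16], float limit) {
    uint16_t k = 0;
#ifdef __AVX512F__
    __m512 a = _mm512_loadu_ps(v);        /* unaligned load of 16 floats */
    __m512 b = _mm512_set1_ps(limit);     /* broadcast limit to all lanes */
    k = (uint16_t)_mm512_cmplt_ps_mask(a, b);
#else
    /* Scalar fallback that follows the pseudo-code lane by lane. */
    for (int j = 0; j < 16; j++)
        if (v[j] < limit)
            k |= (uint16_t)(1u << j);
#endif
    /* Count the set bits by clearing the lowest one each iteration. */
    int n = 0;
    while (k) {
        k = (uint16_t)(k & (k - 1));
        n++;
    }
    return n;
}
```

Both paths give a NaN lane a 0 bit, since less-than is false whenever either operand is NaN.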

_mm512_cmpneq_ps_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_cmpneq_ps_mask(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := (a[i+31:i] != b[i+31:i]) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmpnle_ps_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_cmpnle_ps_mask(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := (!(a[i+31:i] <= b[i+31:i])) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmpnlt_ps_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_cmpnlt_ps_mask(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := (!(a[i+31:i] < b[i+31:i])) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmpord_ps_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_cmpord_ps_mask(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	k[j] := ((a[i+31:i] != NaN) AND (b[i+31:i] != NaN)) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmpunord_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_cmpunord_ps_mask(__m512 a, __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	k[j] := ((a[i+31:i] == NaN) OR (b[i+31:i] == NaN)) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	
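The "== NaN" and "!= NaN" tests in the ordered and unordered pseudo-code are notation only. Under IEEE 754 a NaN compares unequal to every value, including itself, so a literal comparison such as "x == NAN" in C is always false. A scalar sketch of the intended test uses "isnan" from math.h. The function names "ref_cmpunord_ps_mask" and "ref_cmpord_ps_mask" are hypothetical stand-ins for the intrinsics, not real API.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar model: bit j is set when either lane j operand is NaN. */
static uint16_t ref_cmpunord_ps_mask(const float a[16], const float b[16]) {
    uint16_t k = 0;
    for (int j = 0; j < 16; j++)
        if (isnan(a[j]) || isnan(b[j]))
            k |= (uint16_t)(1u << j);
    return k;
}

/* Hypothetical scalar model: bit j is set when neither lane j operand is NaN.
   Over 16 lanes this is the bitwise complement of the unordered mask. */
static uint16_t ref_cmpord_ps_mask(const float a[16], const float b[16]) {
    return (uint16_t)~ref_cmpunord_ps_mask(a, b);
}
```

Passing the same vector as both operands gives a per-lane NaN test on that vector.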

_mm512_mask_cmp_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512 a, 
    __m512 b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm512_mask_cmp_ps_mask(__mmask16 k1, __m512 a,
                                      __m512 b, const int imm8);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmp_round_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512 a, 
    __m512 b, 
    const int imm8, 
    const int sae
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __mmask16 _mm512_mask_cmp_round_ps_mask(__mmask16 k1,
                                            __m512 a, __m512 b,
                                            const int imm8,
                                            const int sae);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmpeq_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpeq_ps_mask(__mmask16 k1, __m512 a,
                                        __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := (a[i+31:i] == b[i+31:i]) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR		
        k[MAX:16] := 0
        	

_mm512_mask_cmple_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmple_ps_mask(__mmask16 k1, __m512 a,
                                        __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := (a[i+31:i] <= b[i+31:i]) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmplt_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmplt_ps_mask(__mmask16 k1, __m512 a,
                                        __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := (a[i+31:i] < b[i+31:i]) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	
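Because a cleared bit in "k1" forces the corresponding result bit to 0, the zeromasked compare produces the same mask as ANDing "k1" with the result of the unmasked compare. The scalar sketch below checks that equivalence for the less-than case. Both function names are hypothetical models written for this illustration, and floating-point exception behavior for masked-off lanes is not modeled.

```c
#include <stdint.h>

/* Hypothetical scalar model of the unmasked less-than compare. */
static uint16_t ref_cmplt_ps_mask(const float a[16], const float b[16]) {
    uint16_t k = 0;
    for (int j = 0; j < 16; j++)
        if (a[j] < b[j])
            k |= (uint16_t)(1u << j);
    return k;
}

/* Hypothetical scalar model of the zeromasked form, following the pseudo-code:
   lanes whose k1 bit is clear are skipped and their result bit stays 0. */
static uint16_t ref_mask_cmplt_ps_mask(uint16_t k1, const float a[16],
                                       const float b[16]) {
    uint16_t k = 0;
    for (int j = 0; j < 16; j++) {
        if ((k1 >> j) & 1u) {
            if (a[j] < b[j])
                k |= (uint16_t)(1u << j);
        }
    }
    return k;
}
```

The same identity holds for the other zeromasked compares in this section, since they differ only in the per-lane predicate.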

_mm512_mask_cmpneq_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpneq_ps_mask(__mmask16 k1, __m512 a,
                                         __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := (a[i+31:i] != b[i+31:i]) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmpnle_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpnle_ps_mask(__mmask16 k1, __m512 a,
                                         __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := (!(a[i+31:i] <= b[i+31:i])) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmpnlt_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpnlt_ps_mask(__mmask16 k1, __m512 a,
                                         __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := (!(a[i+31:i] < b[i+31:i])) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmpord_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpord_ps_mask(__mmask16 k1, __m512 a,
                                         __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ((a[i+31:i] != NaN) AND (b[i+31:i] != NaN)) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmpunord_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpunord_ps_mask(__mmask16 k1,
                                           __m512 a, __m512 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ((a[i+31:i] == NaN) OR (b[i+31:i] == NaN)) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmp_epi32_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    SI32 a, 
    SI32 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm512_cmp_epi32_mask(__m512i a, __m512i b,
                                    _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	
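Integers have no unordered case, so the eight predicates selected by imm8[2:0] reduce to ordinary relational operators: NLT is the same as greater-than-or-equal and NLE is the same as greater-than. The scalar sketch below spells out that mapping for signed 32-bit lanes. The names "ref_cmpint_epi32" and "ref_cmp_epi32_mask" are hypothetical helpers and not part of immintrin.h.

```c
#include <stdint.h>

/* Hypothetical scalar model of one signed lane for predicate imm8[2:0]. */
static int ref_cmpint_epi32(int32_t a, int32_t b, int imm8) {
    switch (imm8 & 7) {
    case 0:  return a == b;  /* _MM_CMPINT_EQ    */
    case 1:  return a <  b;  /* _MM_CMPINT_LT    */
    case 2:  return a <= b;  /* _MM_CMPINT_LE    */
    case 3:  return 0;       /* _MM_CMPINT_FALSE */
    case 4:  return a != b;  /* _MM_CMPINT_NE    */
    case 5:  return a >= b;  /* _MM_CMPINT_NLT   */
    case 6:  return a >  b;  /* _MM_CMPINT_NLE   */
    default: return 1;       /* _MM_CMPINT_TRUE  */
    }
}

/* Hypothetical scalar model of all 16 lanes: bit j of the result is lane j. */
static uint16_t ref_cmp_epi32_mask(const int32_t a[16], const int32_t b[16],
                                   int imm8) {
    uint16_t k = 0;
    for (int j = 0; j < 16; j++)
        if (ref_cmpint_epi32(a[j], b[j], imm8))
            k |= (uint16_t)(1u << j);
    return k;
}
```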

_mm512_cmpeq_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_cmpeq_epi32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmpge_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask16 _mm512_cmpge_epi32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmpgt_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask16 _mm512_cmpgt_epi32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmple_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask16 _mm512_cmple_epi32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmpneq_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_cmpneq_epi32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmp_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm512_mask_cmp_epi32_mask(__mmask16 k1,
                                         __m512i a, __m512i b,
                                         _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmpeq_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpeq_epi32_mask(__mmask16 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmpge_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpge_epi32_mask(__mmask16 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmpgt_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpgt_epi32_mask(__mmask16 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmple_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmple_epi32_mask(__mmask16 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	
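One common use of the zeromasked forms is chaining: the mask from a first compare is passed as "k1" to a second, so a lane's bit survives only if both conditions hold. The scalar sketch below models an inclusive range test lo <= x <= hi this way. All three function names are hypothetical models written for this illustration. With the real intrinsics the bounds would be broadcast vectors rather than scalars.

```c
#include <stdint.h>

/* Hypothetical scalar model of a signed greater-than-or-equal compare
   against a single bound (standing in for a broadcast vector). */
static uint16_t ref_cmpge_epi32_mask(const int32_t a[16], int32_t bound) {
    uint16_t k = 0;
    for (int j = 0; j < 16; j++)
        if (a[j] >= bound)
            k |= (uint16_t)(1u << j);
    return k;
}

/* Hypothetical scalar model of the zeromasked less-than-or-equal compare:
   lanes whose k1 bit is clear keep a 0 result bit. */
static uint16_t ref_mask_cmple_epi32_mask(uint16_t k1, const int32_t a[16],
                                          int32_t bound) {
    uint16_t k = 0;
    for (int j = 0; j < 16; j++)
        if (((k1 >> j) & 1u) && a[j] <= bound)
            k |= (uint16_t)(1u << j);
    return k;
}

/* Range test: bit j is set when lo <= x[j] <= hi. */
static uint16_t ref_in_range_mask(const int32_t x[16], int32_t lo, int32_t hi) {
    uint16_t k = ref_cmpge_epi32_mask(x, lo);    /* first condition       */
    return ref_mask_cmple_epi32_mask(k, x, hi);  /* second, gated by first */
}
```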

_mm512_mask_cmpneq_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpneq_epi32_mask(__mmask16 k1,
                                            __m512i a,
                                            __m512i b);

.. admonition:: Intel Description

    Compare packed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmp_epu32_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm512_cmp_epu32_mask(__m512i a, __m512i b,
                                    _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	
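The epi32 and epu32 compares share the same pseudo-code and the same __m512i operand type. They differ only in whether each 32-bit lane is interpreted as signed or unsigned, which matters for the ordering predicates when the top bit is set. The scalar sketch below contrasts the two interpretations on identical bit patterns. The function names are hypothetical models, and "as_signed" is an illustrative helper that reinterprets the bits via memcpy.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a signed two's-complement value. */
static int32_t as_signed(uint32_t bits) {
    int32_t s;
    memcpy(&s, &bits, sizeof s);
    return s;
}

/* Hypothetical scalar model: less-than with lanes treated as unsigned. */
static uint16_t ref_cmplt_epu32_bits(const uint32_t a[16], const uint32_t b[16]) {
    uint16_t k = 0;
    for (int j = 0; j < 16; j++)
        if (a[j] < b[j])
            k |= (uint16_t)(1u << j);
    return k;
}

/* Hypothetical scalar model: less-than with the same bits treated as signed. */
static uint16_t ref_cmplt_epi32_bits(const uint32_t a[16], const uint32_t b[16]) {
    uint16_t k = 0;
    for (int j = 0; j < 16; j++)
        if (as_signed(a[j]) < as_signed(b[j]))
            k |= (uint16_t)(1u << j);
    return k;
}
```

For a lane holding 0xFFFFFFFF compared against 1, the signed model reports less-than (the pattern reads as -1) while the unsigned model does not (it reads as 4294967295). Equality and inequality are unaffected by the interpretation.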

_mm512_cmpeq_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_cmpeq_epu32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmpge_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_cmpge_epu32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmpgt_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_cmpgt_epu32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmple_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_cmple_epu32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmplt_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_cmplt_epu32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmpneq_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_cmpneq_epu32_mask(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmp_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm512_mask_cmp_epu32_mask(__mmask16 k1,
                                         __m512i a, __m512i b,
                                         _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
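
The pseudo-code above can be transcribed into portable C to experiment with the predicate table and the zeromask. This is a scalar sketch only: ``ref_mask_cmp_epu32_mask`` is a hypothetical name, the arrays stand in for the ``__m512i`` operands, and the predicate numbers are the ones listed in the ``CASE`` table.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: a[] and b[] hold the 16 unsigned 32-bit lanes. */
    static uint16_t ref_mask_cmp_epu32_mask(uint16_t k1, const uint32_t a[16],
                                            const uint32_t b[16], int imm8)
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; j++) {
            int hit;
            switch (imm8 & 7) {                  /* only imm8[2:0] is decoded */
            case 0:  hit = a[j] == b[j]; break;  /* _MM_CMPINT_EQ    */
            case 1:  hit = a[j] <  b[j]; break;  /* _MM_CMPINT_LT    */
            case 2:  hit = a[j] <= b[j]; break;  /* _MM_CMPINT_LE    */
            case 3:  hit = 0;            break;  /* _MM_CMPINT_FALSE */
            case 4:  hit = a[j] != b[j]; break;  /* _MM_CMPINT_NE    */
            case 5:  hit = a[j] >= b[j]; break;  /* _MM_CMPINT_NLT   */
            case 6:  hit = a[j] >  b[j]; break;  /* _MM_CMPINT_NLE   */
            default: hit = 1;            break;  /* _MM_CMPINT_TRUE  */
            }
            /* Zeromask: a lane can only report 1 if its k1 bit is set. */
            if (hit && ((k1 >> j) & 1))
                k |= (uint16_t)(1u << j);
        }
        return k;
    }

In this model the masked result is always the unmasked result ANDed with ``k1``, so ``_MM_CMPINT_TRUE`` simply returns ``k1`` and ``_MM_CMPINT_FALSE`` returns 0.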
        	

_mm512_mask_cmpeq_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpeq_epu32_mask(__mmask16 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmpge_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpge_epu32_mask(__mmask16 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmpgt_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpgt_epu32_mask(__mmask16 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmple_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmple_epu32_mask(__mmask16 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
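
Because the zeromask ``k1`` is itself a comparison result type, two compares can be chained: the mask from a first test gates the second. The sketch below uses that to flag lanes with ``lo <= x <= hi``. It is a scalar model under stated assumptions: all three function names are hypothetical, arrays stand in for ``__m512i``, and the broadcast loops stand in for a set1-style intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the unmasked >= compare. */
    static uint16_t ref_cmpge_epu32_mask(const uint32_t a[16],
                                         const uint32_t b[16])
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; j++)
            if (a[j] >= b[j])
                k |= (uint16_t)(1u << j);
        return k;
    }

    /* Hypothetical scalar model of the pseudo-code above (<= under zeromask). */
    static uint16_t ref_mask_cmple_epu32_mask(uint16_t k1, const uint32_t a[16],
                                              const uint32_t b[16])
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; j++)
            if (((k1 >> j) & 1) && a[j] <= b[j])
                k |= (uint16_t)(1u << j);
        return k;
    }

    /* Bit j is set when lo <= x[j] <= hi (unsigned comparison). */
    static uint16_t lanes_in_range(const uint32_t x[16], uint32_t lo, uint32_t hi)
    {
        uint32_t vlo[16], vhi[16];
        for (int j = 0; j < 16; j++) {   /* broadcast the two bounds */
            vlo[j] = lo;
            vhi[j] = hi;
        }
        uint16_t ge = ref_cmpge_epu32_mask(x, vlo);
        return ref_mask_cmple_epu32_mask(ge, x, vhi);
    }

Chaining through ``k1`` avoids a separate AND of two masks; whether that is faster on real hardware is not something this sketch can show.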
        	

_mm512_mask_cmplt_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmplt_epu32_mask(__mmask16 k1,
                                           __m512i a,
                                           __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_cmpneq_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask16 _mm512_mask_cmpneq_epu32_mask(__mmask16 k1,
                                            __m512i a,
                                            __m512i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_cmp_ph_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512h a, 
    __m512h b, 
    const int imm8
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __mmask32 _mm512_cmp_ph_mask(__m512h a, __m512h b,
                                 const int imm8);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 31
        	k[j] := (a.fp16[j] OP b.fp16[j]) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
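
The 32 predicates in the table differ in two respects: which of the four possible outcomes (less, equal, greater, unordered) make the predicate true, and whether the comparison is signaling (``S``) or quiet (``Q``) with respect to NaN operands. The sketch below models only the first aspect. It is not Intel's implementation: ``ref_fp_predicate`` and ``ref_cmp_ph_mask`` are hypothetical names, ``float`` stands in for the FP16 element type, and exception behaviour is ignored, so predicates 16 to 31 return the same values as predicates 0 to 15.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Hypothetical scalar model of one lane; float is an assumed stand-in
       for the half-precision element type. */
    static int ref_fp_predicate(int imm8, float a, float b)
    {
        /* The four mutually exclusive outcomes of comparing a with b. */
        enum { LT = 1, EQ = 2, GT = 4, UN = 8 };
        /* Outcomes for which predicates 0..15 are true, in table order. */
        static const uint8_t truth[16] = {
            EQ,      LT,      LT | EQ,      UN,
            LT | GT | UN, EQ | GT | UN, GT | UN, LT | EQ | GT,
            EQ | UN, LT | UN, LT | EQ | UN, 0,
            LT | GT, EQ | GT, GT,           LT | EQ | GT | UN
        };
        int rel = (isnan(a) || isnan(b)) ? UN
                : (a < b) ? LT
                : (a > b) ? GT
                : EQ;
        /* Bit 4 of imm8 only changes signaling, which is not modelled here. */
        return (truth[imm8 & 15] & rel) != 0;
    }

    /* Bit j of the result is the predicate applied to lane j. */
    static uint32_t ref_cmp_ph_mask(const float a[32], const float b[32],
                                    int imm8)
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; j++)
            if (ref_fp_predicate(imm8, a[j], b[j]))
                k |= UINT32_C(1) << j;
        return k;
    }

The table makes the ordered/unordered split visible: ``_CMP_NEQ_UQ`` (4) reports a NaN lane as 1, while ``_CMP_NEQ_OQ`` (12) reports it as 0.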
        	

_mm512_mask_cmp_ph_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512h a, 
    __m512h b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __mmask32 _mm512_mask_cmp_ph_mask(__mmask32 k1, __m512h a,
                                      __m512h b,
                                      const int imm8);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 31
        	IF k1[j]
        		k[j] := ( a.fp16[j] OP b.fp16[j] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_cmp_round_ph_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512h a, 
    __m512h b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __mmask32 _mm512_cmp_round_ph_mask(__m512h a, __m512h b,
                                       const int imm8,
                                       const int sae);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 31
        	k[j] := (a.fp16[j] OP b.fp16[j]) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_mask_cmp_round_ph_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512h a, 
    __m512h b, 
    const int imm8, 
    const int sae
:Param ETypes:
    MASK k1, 
    FP16 a, 
    FP16 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __mmask32 _mm512_mask_cmp_round_ph_mask(__mmask32 k1,
                                            __m512h a,
                                            __m512h b,
                                            const int imm8,
                                            const int sae);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 31
        	IF k1[j]
        		k[j] := ( a.fp16[j] OP b.fp16[j] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

YMM
~~~
_mm256_cmp_epi8_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    SI8 a, 
    SI8 b, 
    IMM imm8

.. code-block:: C

    __mmask32 _mm256_cmp_epi8_mask(__m256i a, __m256i b,
                                   const int imm8);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_cmpeq_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask32 _mm256_cmpeq_epi8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
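
A typical use of a byte-equality mask is locating a value inside a 32-byte block: compare against a broadcast copy of the value, then take the index of the lowest set bit. The following scalar sketch illustrates the idea under stated assumptions: ``ref_cmpeq_epi8_mask`` and ``find_first_byte`` are hypothetical names, arrays stand in for ``__m256i``, and the bit scan is written as a plain loop rather than a count-trailing-zeros instruction.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static uint32_t ref_cmpeq_epi8_mask(const int8_t a[32], const int8_t b[32])
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; j++)
            if (a[j] == b[j])
                k |= UINT32_C(1) << j;
        return k;
    }

    /* Index of the first lane equal to needle, or -1 if there is none. */
    static int find_first_byte(const int8_t block[32], int8_t needle)
    {
        int8_t splat[32];
        for (int j = 0; j < 32; j++)     /* broadcast the needle */
            splat[j] = needle;

        uint32_t k = ref_cmpeq_epi8_mask(block, splat);
        if (k == 0)
            return -1;

        int idx = 0;
        while ((k & 1u) == 0) {          /* scan for the lowest set bit */
            k >>= 1;
            idx++;
        }
        return idx;
    }

For equality the signed and unsigned interpretations of a byte give the same mask, so the same pattern applies to the ``epu8`` variant.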
        	

_mm256_cmpge_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask32 _mm256_cmpge_epi8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_cmpgt_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask32 _mm256_cmpgt_epi8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_cmple_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask32 _mm256_cmple_epi8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_cmplt_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask32 _mm256_cmplt_epi8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_cmpneq_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask32 _mm256_cmpneq_epi8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmp_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b, 
    IMM imm8

.. code-block:: C

    __mmask32 _mm256_mask_cmp_epi8_mask(__mmask32 k1, __m256i a,
                                        __m256i b,
                                        const int imm8);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmpeq_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask32 _mm256_mask_cmpeq_epi8_mask(__mmask32 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmpge_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask32 _mm256_mask_cmpge_epi8_mask(__mmask32 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmpgt_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask32 _mm256_mask_cmpgt_epi8_mask(__mmask32 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmple_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask32 _mm256_mask_cmple_epi8_mask(__mmask32 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmplt_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask32 _mm256_mask_cmplt_epi8_mask(__mmask32 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmpneq_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask32 _mm256_mask_cmpneq_epi8_mask(__mmask32 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
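
One way to use the zeromask is to restrict a comparison to the first ``n`` bytes of a 32-byte block, for example when the end of a buffer does not fill a whole vector. The sketch below is a scalar illustration under stated assumptions: ``first_n_mask``, ``ref_mask_cmpneq_epi8_mask`` and ``prefix_differs`` are hypothetical names, and arrays stand in for ``__m256i``.

.. code-block:: C

    #include <stdint.h>

    /* Mask with the low n bits set; n is expected to be in 0..32. */
    static uint32_t first_n_mask(unsigned n)
    {
        /* Shifting a 32-bit value by 32 is undefined in C, so handle it. */
        return n >= 32 ? 0xFFFFFFFFu : (UINT32_C(1) << n) - 1u;
    }

    /* Hypothetical scalar model of the pseudo-code above. */
    static uint32_t ref_mask_cmpneq_epi8_mask(uint32_t k1, const int8_t a[32],
                                              const int8_t b[32])
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; j++)
            if (((k1 >> j) & 1u) && a[j] != b[j])
                k |= UINT32_C(1) << j;
        return k;
    }

    /* 1 if any of the first n bytes differ, 0 otherwise. */
    static int prefix_differs(const int8_t a[32], const int8_t b[32], unsigned n)
    {
        return ref_mask_cmpneq_epi8_mask(first_n_mask(n), a, b) != 0;
    }

Lanes outside the mask report 0 regardless of their contents, so a mismatch beyond position ``n`` does not affect the result.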
        	

_mm256_cmp_epu8_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __mmask32 _mm256_cmp_epu8_mask(__m256i a, __m256i b,
                                   const int imm8);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
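
The predicate table can be checked against a portable scalar model. The sketch below is not Intel's implementation: ``cmpint_u8`` and ``ref_cmp_epu8_mask`` are hypothetical helper names, and the 256-bit operands are assumed to be passed as arrays of 32 bytes, with element ``j`` holding bits ``[j*8+7:j*8]``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: evaluate one unsigned 8-bit comparison.
       Only imm8[2:0] selects the predicate, as in the pseudo-code. */
    static int cmpint_u8(uint8_t x, uint8_t y, int imm8)
    {
        switch (imm8 & 7) {
        case 0:  return x == y;   /* _MM_CMPINT_EQ    */
        case 1:  return x <  y;   /* _MM_CMPINT_LT    */
        case 2:  return x <= y;   /* _MM_CMPINT_LE    */
        case 3:  return 0;        /* _MM_CMPINT_FALSE */
        case 4:  return x != y;   /* _MM_CMPINT_NE    */
        case 5:  return x >= y;   /* _MM_CMPINT_NLT   */
        case 6:  return x >  y;   /* _MM_CMPINT_NLE   */
        default: return 1;        /* _MM_CMPINT_TRUE  */
        }
    }

    /* Hypothetical scalar model of the 32-lane compare: bit j of the
       result is the outcome for lane j. */
    static uint32_t ref_cmp_epu8_mask(const uint8_t a[32],
                                      const uint8_t b[32], int imm8)
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; j++) {
            if (cmpint_u8(a[j], b[j], imm8))
                k |= (uint32_t)1 << j;
        }
        return k;
    }

With ``a[j] = j`` and every ``b[j] = 16``, predicate 1 (``_MM_CMPINT_LT``) sets bits 0 through 15, so the model returns ``0x0000FFFF``.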
        	

_mm256_cmpeq_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_cmpeq_epu8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_cmpge_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_cmpge_epu8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_cmpgt_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_cmpgt_epu8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_cmple_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_cmple_epu8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_cmplt_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_cmplt_epu8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_cmpneq_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_cmpneq_epu8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmp_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __mmask32 _mm256_mask_cmp_epu8_mask(__mmask32 k1, __m256i a,
                                        __m256i b,
                                        const int imm8);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmpeq_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_mask_cmpeq_epu8_mask(__mmask32 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmpge_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_mask_cmpge_epu8_mask(__mmask32 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmpgt_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_mask_cmpgt_epu8_mask(__mmask32 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmple_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_mask_cmple_epu8_mask(__mmask32 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_cmplt_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_mask_cmplt_epu8_mask(__mmask32 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
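
The zeromask behaviour can be illustrated with a scalar sketch. ``ref_mask_cmplt_epu8_mask`` is a hypothetical name, not an intrinsic, and the operands are assumed to be arrays of 32 unsigned bytes. A lane whose ``k1`` bit is clear always yields 0, so the result equals ``k1`` ANDed with the unmasked comparison.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the masked unsigned less-than compare. */
    static uint32_t ref_mask_cmplt_epu8_mask(uint32_t k1,
                                             const uint8_t a[32],
                                             const uint8_t b[32])
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; j++) {
            if ((k1 >> j) & 1u) {           /* IF k1[j] */
                if (a[j] < b[j])
                    k |= (uint32_t)1 << j;
            }
            /* ELSE: bit j stays 0 (zeromask) */
        }
        return k;
    }

With ``a[j] = j``, every ``b[j] = 16``, and ``k1 = 0x00FF00FF``, the unmasked result would be ``0x0000FFFF``, and the masked result is ``0x000000FF``.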
        	

_mm256_mask_cmpneq_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_mask_cmpneq_epu8_mask(__mmask32 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_cmp_epu16_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm256_cmp_epu16_mask(__m256i a, __m256i b,
                                    const int imm8);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_cmpeq_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_cmpeq_epu16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_cmpge_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_cmpge_epu16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
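
With 16-bit elements a 256-bit vector holds 16 lanes, so only 16 mask bits are produced. A minimal scalar sketch follows, assuming the operands are supplied as arrays of 16 ``uint16_t`` values; ``ref_cmpge_epu16_mask`` is a hypothetical name used only for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the unsigned 16-bit
       greater-than-or-equal compare: one result bit per lane. */
    static uint16_t ref_cmpge_epu16_mask(const uint16_t a[16],
                                         const uint16_t b[16])
    {
        unsigned k = 0;
        for (int j = 0; j < 16; j++) {
            if (a[j] >= b[j])
                k |= 1u << j;
        }
        return (uint16_t)k;
    }

With ``a[j] = 1000 * j`` and every ``b[j] = 8000``, lanes 8 through 15 satisfy the comparison and the model returns ``0xFF00``.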
        	

_mm256_cmpgt_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_cmpgt_epu16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_cmple_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_cmple_epu16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_cmplt_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_cmplt_epu16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_cmpneq_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_cmpneq_epu16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmp_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm256_mask_cmp_epu16_mask(__mmask16 k1,
                                         __m256i a, __m256i b,
                                         const int imm8);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmpeq_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_mask_cmpeq_epu16_mask(__mmask16 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmpge_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_mask_cmpge_epu16_mask(__mmask16 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmpgt_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_mask_cmpgt_epu16_mask(__mmask16 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmple_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_mask_cmple_epu16_mask(__mmask16 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmplt_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_mask_cmplt_epu16_mask(__mmask16 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmpneq_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_mask_cmpneq_epu16_mask(__mmask16 k1,
                                            __m256i a,
                                            __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_cmp_epi16_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    SI16 a, 
    SI16 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm256_cmp_epi16_mask(__m256i a, __m256i b,
                                    const int imm8);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_cmpeq_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask16 _mm256_cmpeq_epi16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_cmpge_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask16 _mm256_cmpge_epi16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_cmpgt_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask16 _mm256_cmpgt_epi16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_cmple_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask16 _mm256_cmple_epi16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_cmplt_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask16 _mm256_cmplt_epi16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
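
The pseudo-code for the ``epi16`` and ``epu16`` forms looks identical; the difference lies in how each 16-bit lane is interpreted. The sketch below models both interpretations on the same raw bit patterns. All function names are hypothetical, and the operands are assumed to be arrays of 16 raw 16-bit lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: reinterpret a raw 16-bit lane as a
       two's-complement signed value without relying on
       implementation-defined narrowing conversions. */
    static int lane_as_signed(uint16_t bits)
    {
        return (int)(bits ^ 0x8000u) - 0x8000;
    }

    /* Signed less-than on raw lanes (models the epi16 form). */
    static uint16_t ref_cmplt_epi16_mask(const uint16_t a[16],
                                         const uint16_t b[16])
    {
        unsigned k = 0;
        for (int j = 0; j < 16; j++) {
            if (lane_as_signed(a[j]) < lane_as_signed(b[j]))
                k |= 1u << j;
        }
        return (uint16_t)k;
    }

    /* Unsigned less-than on the same raw lanes (models the epu16 form). */
    static uint16_t ref_cmplt_epu16_mask(const uint16_t a[16],
                                         const uint16_t b[16])
    {
        unsigned k = 0;
        for (int j = 0; j < 16; j++) {
            if (a[j] < b[j])
                k |= 1u << j;
        }
        return (uint16_t)k;
    }

For a lane holding ``0xFFFF`` in ``a`` and ``0x0001`` in ``b``, the signed model sets the bit (-1 is less than 1) while the unsigned model clears it (65535 is not less than 1).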
        	

_mm256_cmpneq_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask16 _mm256_cmpneq_epi16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmp_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm256_mask_cmp_epi16_mask(__mmask16 k1,
                                         __m256i a, __m256i b,
                                         const int imm8);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
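
The pseudo-code above can be mirrored by a portable scalar sketch, which may help when checking results on a machine without AVX-512. The helper names ``model_cmpint`` and ``model_mask_cmp_epi16_mask`` are hypothetical, and the vectors are assumed to be passed as plain arrays of sixteen ``int16_t`` lanes with lane 0 least significant. This is a model of the documented behaviour, not the intrinsic itself.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Evaluate the comparison selected by imm8[2:0], following the CASE table. */
    int model_cmpint(int32_t a, int32_t b, int imm8)
    {
        switch (imm8 & 7) {
        case 0:  return a == b;   /* _MM_CMPINT_EQ    */
        case 1:  return a <  b;   /* _MM_CMPINT_LT    */
        case 2:  return a <= b;   /* _MM_CMPINT_LE    */
        case 3:  return 0;        /* _MM_CMPINT_FALSE */
        case 4:  return a != b;   /* _MM_CMPINT_NE    */
        case 5:  return a >= b;   /* _MM_CMPINT_NLT   */
        case 6:  return a >  b;   /* _MM_CMPINT_NLE   */
        default: return 1;        /* _MM_CMPINT_TRUE  */
        }
    }

    /* Bit j of the result is set only when k1[j] is set and the comparison holds. */
    uint16_t model_mask_cmp_epi16_mask(uint16_t k1, const int16_t a[16],
                                       const int16_t b[16], int imm8)
    {
        assert(a && b);
        uint16_t k = 0;
        for (int j = 0; j < 16; j++) {
            if (((k1 >> j) & 1) && model_cmpint(a[j], b[j], imm8))
                k |= (uint16_t)(1u << j);
        }
        return k;
    }

Under this model, with ``a[j] = j`` and ``b[j] = 8`` in every lane, predicate 1 (less-than) holds in lanes 0 to 7, so a zeromask of ``0x0F0F`` leaves only ``0x000F``.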
        	

_mm256_mask_cmpeq_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask16 _mm256_mask_cmpeq_epi16_mask(__mmask16 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmpge_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask16 _mm256_mask_cmpge_epi16_mask(__mmask16 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmpgt_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask16 _mm256_mask_cmpgt_epi16_mask(__mmask16 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmple_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask16 _mm256_mask_cmple_epi16_mask(__mmask16 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmplt_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask16 _mm256_mask_cmplt_epi16_mask(__mmask16 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_cmpneq_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask16 _mm256_mask_cmpneq_epi16_mask(__mmask16 k1,
                                            __m256i a,
                                            __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_test_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_mask_test_epi8_mask(__mmask32 k1,
                                         __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
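
As a rough illustration of the masked test above, the following scalar sketch follows the pseudo-code lane by lane. The name ``model_mask_test_epi8_mask`` is hypothetical, and the two vectors are assumed to arrive as arrays of thirty-two ``uint8_t`` lanes with lane 0 least significant.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Bit j is set when k1[j] is set and (a[j] AND b[j]) is non-zero. */
    uint32_t model_mask_test_epi8_mask(uint32_t k1, const uint8_t a[32],
                                       const uint8_t b[32])
    {
        assert(a && b);
        uint32_t k = 0;
        for (int j = 0; j < 32; j++) {
            int nonzero = (a[j] & b[j]) != 0;
            if (((k1 >> j) & 1u) && nonzero)
                k |= (uint32_t)1 << j;
        }
        return k;
    }

Passing the same array as both ``a`` and ``b`` gives a mask of the non-zero lanes, since a lane ANDed with itself is unchanged.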
        	

_mm256_test_epi8_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_test_epi8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_test_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_mask_test_epi16_mask(__mmask16 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_test_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_test_epi16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_testn_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_mask_testn_epi8_mask(__mmask32 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k1[j]
        		k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_testn_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask32 _mm256_testn_epi8_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_mask_testn_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_mask_testn_epi16_mask(__mmask16 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k1[j]
        		k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
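
The pseudo-code above sets a mask bit when the AND of the two lanes is zero. A scalar sketch of that reading follows; the name ``model_mask_testn_epi16_mask`` is hypothetical and the vectors are assumed to be arrays of sixteen ``uint16_t`` lanes with lane 0 least significant.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Bit j is set when k1[j] is set and (a[j] AND b[j]) is zero. */
    uint16_t model_mask_testn_epi16_mask(uint16_t k1, const uint16_t a[16],
                                         const uint16_t b[16])
    {
        assert(a && b);
        uint16_t k = 0;
        for (int j = 0; j < 16; j++) {
            int is_zero = (uint16_t)(a[j] & b[j]) == 0;
            if (((k1 >> j) & 1u) && is_zero)
                k |= (uint16_t)(1u << j);
        }
        return k;
    }

For a lane holding ``0xFF00`` in ``a`` and ``0x00FF`` in ``b`` the AND is zero, so the bit is set. A literal bitwise NAND of the same lane would be ``0xFFFF``, which is non-zero, so the two readings give different results and the pseudo-code follows the AND-is-zero one.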
        	

_mm256_testn_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask16 _mm256_testn_epi16_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_conflict_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256i _mm256_conflict_epi32(__m256i a);

.. admonition:: Intel Description

    Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	FOR k := 0 to j-1
        		m := k*32
        		dst[i+k] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
        	ENDFOR
        	dst[i+31:i+j] := 0
        ENDFOR
        dst[MAX:256] := 0
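
The nested loop above can be easier to follow with concrete values. The sketch below is a scalar model under the assumption that the vector is passed as an array of eight ``uint32_t`` lanes; ``model_conflict_epi32`` is a hypothetical name, and ``dst`` must not alias ``a``.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* dst[j] gets bit l set for every earlier lane l < j whose value equals a[j]. */
    void model_conflict_epi32(uint32_t dst[8], const uint32_t a[8])
    {
        assert(dst && a && dst != a);
        for (int j = 0; j < 8; j++) {
            uint32_t bits = 0;
            for (int l = 0; l < j; l++) {
                if (a[j] == a[l])
                    bits |= (uint32_t)1 << l;
            }
            dst[j] = bits;   /* bits j..31 remain zero */
        }
    }

For the input lanes ``{7, 3, 7, 7, 3, 9, 9, 7}`` this model produces ``{0, 0, 1, 5, 2, 0, 32, 13}``: lane 7, for example, matches lanes 0, 2 and 3, giving ``0b1101``. A zero result marks a lane whose value has not appeared in any lower lane.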
        	

_mm256_mask_conflict_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_mask_conflict_epi32(__m256i src, __mmask8 k,
                                       __m256i a);

.. admonition:: Intel Description

    Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		FOR l := 0 to j-1
        			m := l*32
        			dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
        		ENDFOR
        		dst[i+31:i+j] := 0
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_conflict_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_maskz_conflict_epi32(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		FOR l := 0 to j-1
        			m := l*32
        			dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
        		ENDFOR
        		dst[i+31:i+j] := 0
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_conflict_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m256i _mm256_conflict_epi64(__m256i a);

.. admonition:: Intel Description

    Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	FOR k := 0 to j-1
        		m := k*64
        		dst[i+k] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
        	ENDFOR
        	dst[i+63:i+j] := 0
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_conflict_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_mask_conflict_epi64(__m256i src, __mmask8 k,
                                       __m256i a);

.. admonition:: Intel Description

    Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		FOR l := 0 to j-1
        			m := l*64
        			dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
        		ENDFOR
        		dst[i+63:i+j] := 0
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_conflict_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_maskz_conflict_epi64(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		FOR l := 0 to j-1
        			m := l*64
        			dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
        		ENDFOR
        		dst[i+63:i+j] := 0
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
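
The writemask and zeromask forms of the 64-bit conflict test differ only in what an unselected lane receives. The scalar sketch below puts the two side by side; all three function names are hypothetical, and the vectors are assumed to be arrays of four ``uint64_t`` lanes with ``dst`` not aliasing ``a``.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Conflict bits for lane j: bit l is set for each l < j with a[l] == a[j]. */
    static uint64_t conflict_bits64(const uint64_t a[4], int j)
    {
        uint64_t bits = 0;
        for (int l = 0; l < j; l++) {
            if (a[j] == a[l])
                bits |= (uint64_t)1 << l;
        }
        return bits;
    }

    /* Writemask form: lanes with k[j] == 0 are copied from src. */
    void model_mask_conflict_epi64(uint64_t dst[4], const uint64_t src[4],
                                   uint8_t k, const uint64_t a[4])
    {
        assert(dst && src && a && dst != a);
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? conflict_bits64(a, j) : src[j];
    }

    /* Zeromask form: lanes with k[j] == 0 are set to zero. */
    void model_maskz_conflict_epi64(uint64_t dst[4], uint8_t k,
                                    const uint64_t a[4])
    {
        assert(dst && a && dst != a);
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? conflict_bits64(a, j) : 0;
    }

In both forms the mask selects which result lanes are written; as in the pseudo-code, a selected lane is still compared against every lower lane, including lanes that the mask does not select.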
        	

_mm256_cmp_pd_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256d a, 
    __m256d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_cmp_pd_mask(__m256d a, __m256d b,
                                const int imm8);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	k[j] := (a[i+63:i] OP b[i+63:i]) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
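
The predicate table above can be summarised by a scalar sketch of the result each predicate yields. It assumes the usual reading of the suffixes (``O``/``U`` for ordered/unordered, ``S``/``Q`` for signaling/quiet), under which ``imm8`` and ``imm8 ^ 16`` name the same relation and differ only in exception behaviour; that behaviour is not modelled here. The names ``model_cmp_fp`` and ``model_cmp_pd_mask`` are hypothetical, and the vectors are assumed to be arrays of four ``double`` lanes.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Truth value of one predicate; signaling versus quiet is ignored. */
    int model_cmp_fp(double a, double b, int imm8)
    {
        int unord = (a != a) || (b != b);     /* either operand is NaN */
        switch (imm8 & 15) {
        case 0:  return a == b;               /* EQ, false if unordered   */
        case 1:  return a <  b;               /* LT, false if unordered   */
        case 2:  return a <= b;               /* LE, false if unordered   */
        case 3:  return unord;                /* UNORD                    */
        case 4:  return a != b;               /* NEQ, true if unordered   */
        case 5:  return !(a <  b);            /* NLT, true if unordered   */
        case 6:  return !(a <= b);            /* NLE, true if unordered   */
        case 7:  return !unord;               /* ORD                      */
        case 8:  return unord || a == b;      /* EQ, true if unordered    */
        case 9:  return !(a >= b);            /* NGE, true if unordered   */
        case 10: return !(a >  b);            /* NGT, true if unordered   */
        case 11: return 0;                    /* FALSE                    */
        case 12: return !unord && a != b;     /* NEQ, false if unordered  */
        case 13: return a >= b;               /* GE, false if unordered   */
        case 14: return a >  b;               /* GT, false if unordered   */
        default: return 1;                    /* TRUE                     */
        }
    }

    /* One result bit per 64-bit lane; bits 4 and above stay zero. */
    uint8_t model_cmp_pd_mask(const double a[4], const double b[4], int imm8)
    {
        assert(a && b);
        uint8_t k = 0;
        for (int j = 0; j < 4; j++) {
            if (model_cmp_fp(a[j], b[j], imm8))
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

The practical difference between predicates 4 (``_CMP_NEQ_UQ``) and 12 (``_CMP_NEQ_OQ``) shows up only in lanes where an operand is NaN: the first reports such lanes as not-equal, the second does not.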
        	

_mm256_mask_cmp_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256d a, 
    __m256d b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_mask_cmp_pd_mask(__mmask8 k1, __m256d a,
                                     __m256d b, const int imm8);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmp_ps_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256 a, 
    __m256 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_cmp_ps_mask(__m256 a, __m256 b,
                                const int imm8);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	k[j] := (a[i+31:i] OP b[i+31:i]) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmp_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256 a, 
    __m256 b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_mask_cmp_ps_mask(__mmask8 k1, __m256 a,
                                     __m256 b, const int imm8);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmp_epi32_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    SI32 a, 
    SI32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_cmp_epi32_mask(__m256i a, __m256i b,
                                   _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmpeq_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm256_cmpeq_epi32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmpge_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm256_cmpge_epi32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmpgt_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm256_cmpgt_epi32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmple_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm256_cmple_epi32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmplt_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm256_cmplt_epi32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
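
A common use of the returned mask is to count the lanes that satisfy the comparison. The sketch below is one possible way to do that, under stated assumptions: ``count_below`` is a hypothetical helper, and the compiler is assumed to define ``__AVX512F__`` and ``__AVX512VL__`` when the intrinsic is usable. When those macros are absent, only the scalar loop is compiled, so the function returns the same result either way.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>
    #if defined(__AVX512F__) && defined(__AVX512VL__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: count how many of the n values are below limit. */
    static size_t count_below(const int32_t *v, size_t n, int32_t limit)
    {
        size_t count = 0;
        size_t i = 0;
    #if defined(__AVX512F__) && defined(__AVX512VL__)
        const __m256i vlimit = _mm256_set1_epi32(limit);
        for (; i + 8 <= n; i += 8) {
            __m256i x = _mm256_loadu_si256((const __m256i *)(v + i));
            /* Bit j of k is set when v[i + j] < limit. */
            unsigned k = _mm256_cmplt_epi32_mask(x, vlimit);
            /* Clear the lowest set bit on each pass to count the set bits. */
            for (; k != 0; k &= k - 1) {
                count++;
            }
        }
    #endif
        /* Scalar loop: handles the tail, or everything without AVX-512VL. */
        for (; i < n; i++) {
            if (v[i] < limit) {
                count++;
            }
        }
        return count;
    }
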
        	

_mm256_cmpneq_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm256_cmpneq_epi32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmp_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_mask_cmp_epi32_mask(__mmask8 k1, __m256i a,
                                        __m256i b,
                                        _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
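
The predicate table and the zeromask in the pseudo-code above can be sketched in scalar C as follows. This is an illustrative model only: ``ref_cmpint_si32`` and ``ref_mask_cmp_epi32_mask`` are hypothetical names, the operands are assumed to be arrays of eight ``int32_t`` lanes, and ``NLT`` and ``NLE`` are written as ``>=`` and ``>``, which is equivalent for integers.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: apply the predicate selected by imm8[2:0]
       to one pair of signed 32-bit lanes. */
    static int ref_cmpint_si32(int32_t x, int32_t y, unsigned imm8)
    {
        switch (imm8 & 7u) {
        case 0:  return x == y;  /* _MM_CMPINT_EQ    */
        case 1:  return x <  y;  /* _MM_CMPINT_LT    */
        case 2:  return x <= y;  /* _MM_CMPINT_LE    */
        case 3:  return 0;       /* _MM_CMPINT_FALSE */
        case 4:  return x != y;  /* _MM_CMPINT_NE    */
        case 5:  return x >= y;  /* _MM_CMPINT_NLT   */
        case 6:  return x >  y;  /* _MM_CMPINT_NLE   */
        default: return 1;       /* _MM_CMPINT_TRUE  */
        }
    }

    /* Hypothetical scalar model: bit j is set only when k1[j] is set
       and the predicate holds for lane j; all other bits are zero. */
    static uint8_t ref_mask_cmp_epi32_mask(uint8_t k1, const int32_t a[8],
                                           const int32_t b[8], unsigned imm8)
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if (((k1 >> j) & 1u) && ref_cmpint_si32(a[j], b[j], imm8)) {
                k |= (uint8_t)(1u << j);
            }
        }
        return k;
    }
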
        	

_mm256_mask_cmpeq_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpeq_epi32_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
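
Read lane by lane, the pseudo-code above yields the same bits as the unmasked comparison ANDed with ``k1``. The scalar sketch below illustrates that relationship; ``ref_cmpeq_epi32_mask`` and ``ref_mask_cmpeq_epi32_mask`` are hypothetical names, and the operands are assumed to be arrays of eight ``int32_t`` lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the unmasked equality compare. */
    static uint8_t ref_cmpeq_epi32_mask(const int32_t a[8], const int32_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if (a[j] == b[j]) {
                k |= (uint8_t)(1u << j);
            }
        }
        return k;
    }

    /* Hypothetical scalar model of the zeromasked form, written to follow
       the IF k1[j] ... ELSE k[j] := 0 structure of the pseudo-code. */
    static uint8_t ref_mask_cmpeq_epi32_mask(uint8_t k1, const int32_t a[8],
                                             const int32_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if ((k1 >> j) & 1u) {
                if (a[j] == b[j]) {
                    k |= (uint8_t)(1u << j);
                }
            }
            /* else: bit j stays 0 */
        }
        return k;
    }

For any ``k1``, ``ref_mask_cmpeq_epi32_mask(k1, a, b)`` equals ``k1 & ref_cmpeq_epi32_mask(a, b)`` in this model.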
        	

_mm256_mask_cmpge_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpge_epi32_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmpgt_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpgt_epi32_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmple_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm256_mask_cmple_epi32_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmplt_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm256_mask_cmplt_epi32_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmpneq_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpneq_epi32_mask(__mmask8 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmp_epi64_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    SI64 a, 
    SI64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_cmp_epi64_mask(__m256i a, __m256i b,
                                   _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmpeq_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm256_cmpeq_epi64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmpge_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm256_cmpge_epi64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmpgt_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm256_cmpgt_epi64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
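
A 256-bit register holds only four 64-bit lanes, so although the return type is ``__mmask8``, the pseudo-code above can set only bits 0 to 3; ``k[MAX:4] := 0`` clears the rest. The scalar sketch below illustrates this. ``ref_cmpgt_epi64_mask`` is a hypothetical name, and the operands are assumed to be arrays of four ``int64_t`` lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: only bits 0..3 can ever be set, because
       the loop visits four lanes; bits 4..7 of the result stay zero. */
    static uint8_t ref_cmpgt_epi64_mask(const int64_t a[4], const int64_t b[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; j++) {
            if (a[j] > b[j]) {
                k |= (uint8_t)(1u << j);
            }
        }
        return k;
    }
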
        	

_mm256_cmple_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm256_cmple_epi64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmplt_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm256_cmplt_epi64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmpneq_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm256_cmpneq_epi64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_mask_cmp_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_mask_cmp_epi64_mask(__mmask8 k1, __m256i a,
                                        __m256i b,
                                        _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_mask_cmpeq_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpeq_epi64_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_mask_cmpge_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpge_epi64_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_mask_cmpgt_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpgt_epi64_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_mask_cmple_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm256_mask_cmple_epi64_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_mask_cmplt_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm256_mask_cmplt_epi64_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
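
Because the zeromask only lets already-selected lanes through, a masked compare can be chained after an unmasked one to build a combined condition. The scalar sketch below models a half-open range test ``lo <= x < hi`` in that style. All three function names are hypothetical, and the operands are assumed to be arrays of four ``int64_t`` lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of an unmasked signed >= compare. */
    static uint8_t ref_cmpge_epi64_mask(const int64_t a[4], const int64_t b[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; j++) {
            if (a[j] >= b[j]) {
                k |= (uint8_t)(1u << j);
            }
        }
        return k;
    }

    /* Hypothetical scalar model of the zeromasked signed < compare. */
    static uint8_t ref_mask_cmplt_epi64_mask(uint8_t k1, const int64_t a[4],
                                             const int64_t b[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; j++) {
            if (((k1 >> j) & 1u) && a[j] < b[j]) {
                k |= (uint8_t)(1u << j);
            }
        }
        return k;
    }

    /* Hypothetical helper: bit j is set when lo <= x[j] < hi. The first
       compare selects lanes with x >= lo; the second keeps only those
       selected lanes that also satisfy x < hi. */
    static uint8_t ref_in_range_epi64(const int64_t x[4], int64_t lo, int64_t hi)
    {
        const int64_t vlo[4] = {lo, lo, lo, lo};
        const int64_t vhi[4] = {hi, hi, hi, hi};
        uint8_t k = ref_cmpge_epi64_mask(x, vlo);
        return ref_mask_cmplt_epi64_mask(k, x, vhi);
    }
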
        	

_mm256_mask_cmpneq_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpneq_epi64_mask(__mmask8 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmp_epu32_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_cmp_epu32_mask(__m256i a, __m256i b,
                                   _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmpeq_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_cmpeq_epu32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmpge_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_cmpge_epu32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmpgt_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_cmpgt_epu32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
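
The pseudo-code above reads the same as the signed ``epi32`` form; the difference is that each lane is interpreted as an unsigned value. A lane holding ``0xFFFFFFFF`` therefore compares greater than ``1`` here, whereas a signed comparison would treat it as ``-1``. The scalar sketch below illustrates the unsigned interpretation. ``ref_cmpgt_epu32_mask`` is a hypothetical name, and the operands are assumed to be arrays of eight ``uint32_t`` lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: bit j is set when a[j] > b[j], comparing
       the lanes as unsigned 32-bit integers. */
    static uint8_t ref_cmpgt_epu32_mask(const uint32_t a[8], const uint32_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if (a[j] > b[j]) {
                k |= (uint8_t)(1u << j);
            }
        }
        return k;
    }
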
        	

_mm256_cmple_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_cmple_epu32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmplt_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_cmplt_epu32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmpneq_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_cmpneq_epu32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmp_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_mask_cmp_epu32_mask(__mmask8 k1, __m256i a,
                                        __m256i b,
                                        _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmpeq_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpeq_epu32_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmpge_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpge_epu32_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmpgt_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpgt_epu32_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmple_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_mask_cmple_epu32_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmplt_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_mask_cmplt_epu32_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_cmpneq_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpneq_epu32_mask(__mmask8 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_cmp_epu64_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_cmp_epu64_mask(__m256i a, __m256i b,
                                   _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
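
The ``imm8`` dispatch in the ``CASE`` table can be sketched in scalar C. This is a minimal model under stated assumptions, not the hardware behaviour verbatim: ``pred_u64_model`` and ``cmp_epu64_model`` are hypothetical names, arrays of four ``uint64_t`` stand in for ``__m256i``, and the negated predicates are written as their unsigned-integer equivalents (``NLT`` as ``>=``, ``NLE`` as ``>``).

.. code-block:: C

    #include <stdint.h>

    /* Evaluate one predicate; only imm8[2:0] is significant, as in the CASE table. */
    static int pred_u64_model(uint64_t x, uint64_t y, int imm8)
    {
        switch (imm8 & 7) {
        case 0:  return x == y;   /* _MM_CMPINT_EQ    */
        case 1:  return x <  y;   /* _MM_CMPINT_LT    */
        case 2:  return x <= y;   /* _MM_CMPINT_LE    */
        case 3:  return 0;        /* _MM_CMPINT_FALSE */
        case 4:  return x != y;   /* _MM_CMPINT_NE    */
        case 5:  return x >= y;   /* _MM_CMPINT_NLT   */
        case 6:  return x >  y;   /* _MM_CMPINT_NLE   */
        default: return 1;        /* _MM_CMPINT_TRUE  */
        }
    }

    /* Hypothetical model of the four-lane loop: bit j holds the result for lane j. */
    static uint8_t cmp_epu64_model(const uint64_t a[4], const uint64_t b[4],
                                   int imm8)
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; ++j) {
            if (pred_u64_model(a[j], b[j], imm8))
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

Only the low four bits of the result can ever be set, which matches ``k[MAX:4] := 0`` in the pseudo-code.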
        	

_mm256_cmpeq_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_cmpeq_epu64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmpge_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_cmpge_epu64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmpgt_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_cmpgt_epu64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmple_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_cmple_epu64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmplt_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_cmplt_epu64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmpneq_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_cmpneq_epu64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_mask_cmp_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_mask_cmp_epu64_mask(__mmask8 k1, __m256i a,
                                        __m256i b,
                                        _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
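
One consequence of the ``IF k1[j]`` branch is that the zeromasked result equals the unmasked result ANDed with ``k1``. The sketch below checks this for the ``_MM_CMPINT_LT`` case only; both helper names are hypothetical and arrays of four ``uint64_t`` stand in for ``__m256i``.

.. code-block:: C

    #include <stdint.h>

    /* Unmasked unsigned less-than on four 64-bit lanes (hypothetical helper). */
    static uint8_t cmplt_u64_model(const uint64_t a[4], const uint64_t b[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; ++j) {
            if (a[j] < b[j])
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

    /* Same comparison with the zeromask tested inside the loop,
       mirroring the IF k1[j] ... ELSE ... FI structure above. */
    static uint8_t mask_cmplt_u64_model(uint8_t k1, const uint64_t a[4],
                                        const uint64_t b[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; ++j) {
            if ((k1 >> j) & 1u) {
                if (a[j] < b[j])
                    k |= (uint8_t)(1u << j);
            }
            /* else: bit j stays 0 */
        }
        return k;
    }

For every ``k1``, ``mask_cmplt_u64_model(k1, a, b)`` equals ``k1 & cmplt_u64_model(a, b)``.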
        	

_mm256_mask_cmpeq_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpeq_epu64_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_mask_cmpge_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpge_epu64_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_mask_cmpgt_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpgt_epu64_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_mask_cmple_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_mask_cmple_epu64_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
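
A common use of the zeromask is to chain two comparisons, feeding the mask from the first as ``k1`` of the second. The scalar sketch below models an unsigned range check this way; all three helper names are hypothetical and arrays of four ``uint64_t`` stand in for ``__m256i``.

.. code-block:: C

    #include <stdint.h>

    /* Unmasked unsigned >= on four 64-bit lanes (hypothetical helper). */
    static uint8_t cmpge_epu64_model(const uint64_t a[4], const uint64_t b[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; ++j) {
            if (a[j] >= b[j])
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

    /* Zeromasked unsigned <= on four 64-bit lanes, following the pseudo-code above. */
    static uint8_t mask_cmple_epu64_model(uint8_t k1, const uint64_t a[4],
                                          const uint64_t b[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; ++j) {
            if (((k1 >> j) & 1u) && a[j] <= b[j])
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

    /* Bit j is set when lo[j] <= x[j] <= hi[j]. */
    static uint8_t in_range_epu64_model(const uint64_t x[4], const uint64_t lo[4],
                                        const uint64_t hi[4])
    {
        uint8_t ge_lo = cmpge_epu64_model(x, lo);      /* first comparison  */
        return mask_cmple_epu64_model(ge_lo, x, hi);   /* second, zeromasked */
    }

With the intrinsics documented here, the same chain would read ``_mm256_mask_cmple_epu64_mask(_mm256_cmpge_epu64_mask(x, lo), x, hi)``.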
        	

_mm256_mask_cmplt_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_mask_cmplt_epu64_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_mask_cmpneq_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_mask_cmpneq_epu64_mask(__mmask8 k1,
                                           __m256i a,
                                           __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_mask_test_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_mask_test_epi32_mask(__mmask8 k1, __m256i a,
                                         __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
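
The masked bit test can be modelled in a few lines of scalar C. This is a sketch for illustration, with a hypothetical helper name and ``uint32_t`` arrays standing in for the ``__m256i`` operands.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: bit j is set when bit j of k1 is set and
       a[j] and b[j] share at least one set bit. */
    static uint8_t mask_test_epi32_model(uint8_t k1, const uint32_t a[8],
                                         const uint32_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; ++j) {
            if (((k1 >> j) & 1u) && (a[j] & b[j]) != 0)
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

Passing the same flag pattern in every lane of ``b`` turns this into a per-lane "is any of these flags set" query restricted to the lanes selected by ``k1``.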
        	

_mm256_test_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_test_epi32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_test_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_mask_test_epi64_mask(__mmask8 k1, __m256i a,
                                         __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_test_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_test_epi64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
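
Since ``x AND x`` is ``x``, testing a vector against itself yields a mask of its non-zero lanes. The scalar sketch below illustrates this; the helper name is hypothetical and arrays of four ``uint64_t`` stand in for ``__m256i``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above:
       bit j is set when (a[j] AND b[j]) is non-zero. */
    static uint8_t test_epi64_model(const uint64_t a[4], const uint64_t b[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; ++j) {
            if ((a[j] & b[j]) != 0)
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

Calling ``test_epi64_model(a, a)`` therefore marks exactly the lanes of ``a`` that are not zero.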
        	

_mm256_mask_testn_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_mask_testn_epi32_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_testn_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm256_testn_epi32_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
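
According to the pseudo-code, the result bit is set when ``a AND b`` is zero, so over the eight valid lanes it is the complement of the corresponding "test" result. The scalar sketch below checks that relationship; both helper names are hypothetical and ``uint32_t`` arrays stand in for ``__m256i``.

.. code-block:: C

    #include <stdint.h>

    /* Bit j set when (a[j] AND b[j]) != 0 (model of the "test" form). */
    static uint8_t test_epi32_model(const uint32_t a[8], const uint32_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; ++j) {
            if ((a[j] & b[j]) != 0)
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

    /* Bit j set when (a[j] AND b[j]) == 0 (model of the "testn" form above). */
    static uint8_t testn_epi32_model(const uint32_t a[8], const uint32_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; ++j) {
            if ((a[j] & b[j]) == 0)
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

Calling ``testn_epi32_model(a, a)`` marks the lanes of ``a`` that are zero.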
        	

_mm256_mask_testn_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_mask_testn_epi64_mask(__mmask8 k1,
                                          __m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_testn_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm256_testn_epi64_mask(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_cmp_ph_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256h a, 
    __m256h b, 
    const int imm8
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm256_cmp_ph_mask(__m256h a, __m256h b,
                                 const int imm8);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 15
        	k[j] := (a.fp16[j] OP b.fp16[j]) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
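
The predicate names encode how NaN operands are treated: ordered (``O``) predicates are false when either operand is NaN, unordered (``U``) predicates are true. The sketch below models a handful of predicates from the table to make that visible. It rests on several assumptions: the helper and enumerator names are hypothetical, ``float`` stands in for the half-precision element type, only six predicates are covered, and the signaling versus quiet (``S``/``Q``) distinction, which concerns exceptions rather than results, is not modelled.

.. code-block:: C

    #include <stdint.h>

    /* Subset of the table above; values are the imm8 encodings. */
    enum {
        MODEL_CMP_EQ_OQ = 0, MODEL_CMP_LT_OS = 1, MODEL_CMP_UNORD_Q = 3,
        MODEL_CMP_NEQ_UQ = 4, MODEL_CMP_NLT_US = 5, MODEL_CMP_ORD_Q = 7
    };

    /* Returns 1 or 0 for a covered predicate, -1 for one this sketch omits. */
    static int fp_pred_model(float x, float y, int imm8)
    {
        int unordered = (x != x) || (y != y);  /* true when either input is NaN */
        switch (imm8 & 31) {
        case MODEL_CMP_EQ_OQ:   return !unordered && x == y;
        case MODEL_CMP_LT_OS:   return !unordered && x <  y;
        case MODEL_CMP_UNORD_Q: return unordered;
        case MODEL_CMP_NEQ_UQ:  return unordered || x != y;
        case MODEL_CMP_NLT_US:  return unordered || !(x < y);
        case MODEL_CMP_ORD_Q:   return !unordered;
        default:                return -1;
        }
    }

    /* Sixteen-lane loop; uncovered predicates yield an all-zero mask here. */
    static uint16_t cmp_ph_mask_model(const float a[16], const float b[16],
                                      int imm8)
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; ++j) {
            if (fp_pred_model(a[j], b[j], imm8) == 1)
                k |= (uint16_t)(1u << j);
        }
        return k;
    }

Note that ``_CMP_NLT_US`` is not the same as "greater-than-or-equal": it is also true for lanes containing NaN, whereas ``_CMP_GE_OS`` is false for them.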
        	

_mm256_mask_cmp_ph_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256h a, 
    __m256h b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm256_mask_cmp_ph_mask(__mmask16 k1, __m256h a,
                                      __m256h b,
                                      const int imm8);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 15
        	IF k1[j]
        		k[j] := ( a.fp16[j] OP b.fp16[j] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

XMM
~~~
_mm_cmp_epi8_mask
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    SI8 a, 
    SI8 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm_cmp_epi8_mask(__m128i a, __m128i b,
                                const int imm8);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
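
The "signed" qualifier matters for bytes with the top bit set. The sketch below compares the same sixteen bytes with ``_MM_CMPINT_LT`` semantics under a signed and an unsigned interpretation. Both helper names are hypothetical, byte arrays stand in for ``__m128i``, and the cast to ``int8_t`` assumes the usual two's-complement conversion.

.. code-block:: C

    #include <stdint.h>

    /* Less-than on 16 bytes interpreted as signed 8-bit integers,
       as the intrinsic above does for imm8 = _MM_CMPINT_LT. */
    static uint16_t lt_bytes_signed_model(const uint8_t a[16], const uint8_t b[16])
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; ++j) {
            if ((int8_t)a[j] < (int8_t)b[j])
                k |= (uint16_t)(1u << j);
        }
        return k;
    }

    /* The same bytes compared as unsigned values, for contrast only. */
    static uint16_t lt_bytes_unsigned_model(const uint8_t a[16], const uint8_t b[16])
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; ++j) {
            if (a[j] < b[j])
                k |= (uint16_t)(1u << j);
        }
        return k;
    }

For example, the byte ``0x80`` is -128 when signed, so it is less than ``0x7F`` (127); as an unsigned value it is 128 and is not.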
        	

_mm_cmpeq_epi8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask16 _mm_cmpeq_epi8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
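
A typical use of a byte-equality mask is locating a character inside a 16-byte block: compare the block against a vector holding the wanted byte in every lane, then take the index of the lowest set bit. The scalar sketch below models that pattern; both helper names are hypothetical and plain arrays stand in for ``__m128i``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static uint16_t cmpeq_epi8_model(const int8_t a[16], const int8_t b[16])
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; ++j) {
            if (a[j] == b[j])
                k |= (uint16_t)(1u << j);
        }
        return k;
    }

    /* Index of the first byte in block16[0..15] equal to needle, or -1. */
    static int first_match_model(const char *block16, char needle)
    {
        int8_t a[16], b[16];
        for (int j = 0; j < 16; ++j) {
            a[j] = (int8_t)block16[j];
            b[j] = (int8_t)needle;   /* models one byte broadcast to every lane */
        }
        uint16_t k = cmpeq_epi8_model(a, b);
        for (int j = 0; j < 16; ++j) {   /* lowest set bit = first match */
            if ((k >> j) & 1u)
                return j;
        }
        return -1;
    }

The caller must supply at least 16 readable bytes, since every lane is compared regardless of where the match lies.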
        	

_mm_cmpge_epi8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask16 _mm_cmpge_epi8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm_cmpgt_epi8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask16 _mm_cmpgt_epi8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm_cmple_epi8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask16 _mm_cmple_epi8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm_cmplt_epi8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask16 _mm_cmplt_epi8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm_cmpneq_epi8_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask16 _mm_cmpneq_epi8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_cmp_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm_mask_cmp_epi8_mask(__mmask16 k1, __m128i a,
                                     __m128i b, const int imm8);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	
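The CASE table and the zeromask can be modelled together in scalar C. The sketch below is illustrative only: ``ref_cmpint_i8`` and ``ref_mask_cmp_epi8_mask`` are hypothetical names, the inputs are assumed to be arrays of 16 signed bytes with lane 0 first, and ``NLT`` and ``NLE`` are read as "not less-than" (``>=``) and "not less-than-or-equal" (``>``).

.. code-block:: C

    #include <stdint.h>

    /* Evaluate one predicate of the CASE table; only imm8[2:0] is used. */
    static int ref_cmpint_i8(int8_t x, int8_t y, int imm8)
    {
        switch (imm8 & 7) {
        case 0:  return x == y;   /* _MM_CMPINT_EQ    */
        case 1:  return x <  y;   /* _MM_CMPINT_LT    */
        case 2:  return x <= y;   /* _MM_CMPINT_LE    */
        case 3:  return 0;        /* _MM_CMPINT_FALSE */
        case 4:  return x != y;   /* _MM_CMPINT_NE    */
        case 5:  return x >= y;   /* _MM_CMPINT_NLT   */
        case 6:  return x >  y;   /* _MM_CMPINT_NLE   */
        default: return 1;        /* _MM_CMPINT_TRUE  */
        }
    }

    /* Hypothetical scalar model: a result bit is set only when the matching
       k1 bit is set and the predicate holds for that lane. */
    static uint16_t ref_mask_cmp_epi8_mask(uint16_t k1, const int8_t a[16],
                                           const int8_t b[16], int imm8)
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; j++) {
            if (((k1 >> j) & 1u) && ref_cmpint_i8(a[j], b[j], imm8)) {
                k |= (uint16_t)(1u << j);
            }
        }
        return k;
    }

With predicate 7 (``_MM_CMPINT_TRUE``) every enabled lane passes, so the result equals ``k1``; with predicate 3 (``_MM_CMPINT_FALSE``) the result is always zero.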

_mm_mask_cmpeq_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask16 _mm_mask_cmpeq_epi8_mask(__mmask16 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_cmpge_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask16 _mm_mask_cmpge_epi8_mask(__mmask16 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_cmpgt_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask16 _mm_mask_cmpgt_epi8_mask(__mmask16 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_cmple_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask16 _mm_mask_cmple_epi8_mask(__mmask16 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_cmplt_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask16 _mm_mask_cmplt_epi8_mask(__mmask16 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_cmpneq_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI8 a, 
    SI8 b

.. code-block:: C

    __mmask16 _mm_mask_cmpneq_epi8_mask(__mmask16 k1, __m128i a,
                                        __m128i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_cmp_epu8_mask
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm_cmp_epu8_mask(__m128i a, __m128i b,
                                const int imm8);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	
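The pseudo-code for the ``epu8`` and ``epi8`` forms looks identical; the difference lies in how each byte is interpreted before ``OP`` is applied. The scalar sketch below illustrates this for less-than. ``as_signed8`` and ``ref_cmplt_bytes`` are hypothetical helpers, and the inputs are assumed to be the 16 raw bytes of each vector with lane 0 first.

.. code-block:: C

    #include <stdint.h>

    /* Read a raw byte as a two's-complement signed value. */
    static int as_signed8(uint8_t x)
    {
        return (x & 0x80u) ? (int)x - 256 : (int)x;
    }

    /* Hypothetical model: less-than over 16 raw bytes, viewed as signed
       (epi8-style) or unsigned (epu8-style) depending on is_signed. */
    static uint16_t ref_cmplt_bytes(const uint8_t a[16], const uint8_t b[16],
                                    int is_signed)
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; j++) {
            int lt = is_signed ? (as_signed8(a[j]) < as_signed8(b[j]))
                               : (a[j] < b[j]);
            if (lt) {
                k |= (uint16_t)(1u << j);
            }
        }
        return k;
    }

For a lane holding ``0x80`` in ``a`` and ``0x01`` in ``b``, the unsigned view (128 versus 1) leaves the bit clear, while the signed view (-128 versus 1) sets it.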

_mm_cmpeq_epu8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_cmpeq_epu8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm_cmpge_epu8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_cmpge_epu8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm_cmpgt_epu8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_cmpgt_epu8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm_cmple_epu8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_cmple_epu8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm_cmplt_epu8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_cmplt_epu8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm_cmpneq_epu8_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_cmpneq_epu8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_cmp_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm_mask_cmp_epu8_mask(__mmask16 k1, __m128i a,
                                     __m128i b, const int imm8);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_cmpeq_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_mask_cmpeq_epu8_mask(__mmask16 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	
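A call through the intrinsic might look like the sketch below. It assumes the compiler has AVX-512BW and AVX-512VL enabled (for example ``-mavx512bw -mavx512vl`` with GCC or Clang) and that the CPU supports them; otherwise it falls back to a scalar loop that follows the pseudo-code. ``masked_eq_u8`` is a hypothetical wrapper name.

.. code-block:: C

    #include <stdint.h>
    #if defined(__AVX512BW__) && defined(__AVX512VL__)
    #include <immintrin.h>
    #endif

    /* Hypothetical wrapper: compare 16 unsigned bytes for equality and keep
       only the result bits whose k1 bit is set. */
    static uint16_t masked_eq_u8(uint16_t k1, const uint8_t a[16],
                                 const uint8_t b[16])
    {
    #if defined(__AVX512BW__) && defined(__AVX512VL__)
        /* Unaligned loads, so the arrays need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        return (uint16_t)_mm_mask_cmpeq_epu8_mask((__mmask16)k1, va, vb);
    #else
        /* Scalar fallback mirroring the pseudo-code. */
        uint16_t k = 0;
        for (int j = 0; j < 16; j++) {
            if (((k1 >> j) & 1u) && a[j] == b[j]) {
                k |= (uint16_t)(1u << j);
            }
        }
        return k;
    #endif
    }

Under zeromasking, the result is the unmasked comparison ANDed with ``k1``, so passing ``0xFFFF`` as ``k1`` reproduces the unmasked result.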

_mm_mask_cmpge_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_mask_cmpge_epu8_mask(__mmask16 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_cmpgt_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_mask_cmpgt_epu8_mask(__mmask16 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_cmple_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_mask_cmple_epu8_mask(__mmask16 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_cmplt_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_mask_cmplt_epu8_mask(__mmask16 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_cmpneq_epu8_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_mask_cmpneq_epu8_mask(__mmask16 k1, __m128i a,
                                        __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_cmp_epu16_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_cmp_epu16_mask(__m128i a, __m128i b,
                                const int imm8);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	
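With 16-bit elements there are eight lanes, so the mask fits in eight bits. The scalar sketch below also makes the structure of the CASE table visible: predicates 4 to 7 are the logical negations of predicates 0 to 3. ``ref_cmp_epu16_mask`` is a hypothetical name, and the inputs are assumed to be arrays of eight unsigned 16-bit values with lane 0 first.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code for unsigned 16-bit lanes. */
    static uint8_t ref_cmp_epu16_mask(const uint16_t a[8], const uint16_t b[8],
                                      int imm8)
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            int lt = a[j] < b[j];
            int eq = a[j] == b[j];
            int r;
            switch (imm8 & 7) {             /* only imm8[2:0] is decoded */
            case 0:  r = eq;          break;   /* _MM_CMPINT_EQ    */
            case 1:  r = lt;          break;   /* _MM_CMPINT_LT    */
            case 2:  r = lt || eq;    break;   /* _MM_CMPINT_LE    */
            case 3:  r = 0;           break;   /* _MM_CMPINT_FALSE */
            case 4:  r = !eq;         break;   /* _MM_CMPINT_NE    */
            case 5:  r = !lt;         break;   /* _MM_CMPINT_NLT   */
            case 6:  r = !(lt || eq); break;   /* _MM_CMPINT_NLE   */
            default: r = 1;           break;   /* _MM_CMPINT_TRUE  */
            }
            if (r) {
                k |= (uint8_t)(1u << j);
            }
        }
        return k;
    }

For any inputs, the masks returned for predicates ``p`` and ``p + 4`` (with ``p`` from 0 to 3) are bitwise complements within the low eight bits.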

_mm_cmpeq_epu16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_cmpeq_epu16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	
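One way such an equality mask might be consumed is to locate the first matching lane. The scalar sketch below builds the mask exactly as the pseudo-code does and then scans it from bit 0 upward. ``first_equal_lane_u16`` is a hypothetical helper, and comparing against a single ``needle`` value models a vector ``b`` with the same value in every lane.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: index of the lowest lane equal to needle,
       or -1 when no lane matches. */
    static int first_equal_lane_u16(const uint16_t a[8], uint16_t needle)
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {        /* k[j] := (a[j] == b[j]) */
            if (a[j] == needle) {
                k |= (uint8_t)(1u << j);
            }
        }
        for (int j = 0; j < 8; j++) {        /* lowest set bit = first lane */
            if ((k >> j) & 1u) {
                return j;
            }
        }
        return -1;
    }

Because bit ``j`` of the mask corresponds to lane ``j``, the lowest set bit identifies the lowest-indexed matching element.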

_mm_cmpge_epu16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_cmpge_epu16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm_cmpgt_epu16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_cmpgt_epu16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm_cmple_epu16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_cmple_epu16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm_cmplt_epu16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_cmplt_epu16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm_cmpneq_epu16_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_cmpneq_epu16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmp_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_cmp_epu16_mask(__mmask8 k1, __m128i a,
                                     __m128i b, const int imm8);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmpeq_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_mask_cmpeq_epu16_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmpge_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_mask_cmpge_epu16_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmpgt_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_mask_cmpgt_epu16_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmple_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_mask_cmple_epu16_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmplt_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_mask_cmplt_epu16_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmpneq_epu16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_mask_cmpneq_epu16_mask(__mmask8 k1, __m128i a,
                                        __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_cmp_epi16_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    SI16 a, 
    SI16 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_cmp_epi16_mask(__m128i a, __m128i b,
                                const int imm8);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	
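The predicate selection above can be sketched in scalar C. The helper name ``model_cmp_epi16`` and the lane arrays are hypothetical; only ``imm8[2:0]`` is consulted, mirroring the ``CASE`` statement in the pseudo-code.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_cmp_epi16_mask.
       a and b hold the eight signed 16-bit lanes, lane 0 first. */
    static uint8_t model_cmp_epi16(const int16_t a[8], const int16_t b[8],
                                   int imm8)
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            int r;
            switch (imm8 & 7) {
            case 0:  r = a[j] == b[j]; break;  /* _MM_CMPINT_EQ    */
            case 1:  r = a[j] <  b[j]; break;  /* _MM_CMPINT_LT    */
            case 2:  r = a[j] <= b[j]; break;  /* _MM_CMPINT_LE    */
            case 3:  r = 0;            break;  /* _MM_CMPINT_FALSE */
            case 4:  r = a[j] != b[j]; break;  /* _MM_CMPINT_NE    */
            case 5:  r = a[j] >= b[j]; break;  /* _MM_CMPINT_NLT   */
            case 6:  r = a[j] >  b[j]; break;  /* _MM_CMPINT_NLE   */
            default: r = 1;            break;  /* _MM_CMPINT_TRUE  */
            }
            if (r)
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

In this model, predicates 4, 5 and 6 produce the bitwise complement of predicates 0, 1 and 2 respectively.
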

_mm_cmpeq_epi16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask8 _mm_cmpeq_epi16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm_cmpge_epi16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask8 _mm_cmpge_epi16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm_cmpgt_epi16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask8 _mm_cmpgt_epi16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm_cmple_epi16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask8 _mm_cmple_epi16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm_cmplt_epi16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask8 _mm_cmplt_epi16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	
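Because the lanes are treated as signed here, the same bit patterns can give a different mask than the unsigned ``epu16`` variants. A small scalar sketch of that difference follows; the helper ``model_cmplt_16`` and its ``is_unsigned`` flag are hypothetical and exist only for this illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of a 16-bit less-than mask compare.
       is_unsigned == 0 models the signed (epi16) interpretation,
       is_unsigned != 0 models the unsigned (epu16) interpretation. */
    static uint8_t model_cmplt_16(const int16_t a[8], const int16_t b[8],
                                  int is_unsigned)
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            int lt = is_unsigned
                         ? ((uint16_t)a[j] < (uint16_t)b[j])
                         : (a[j] < b[j]);
            if (lt)
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

With lane 0 holding -1 (bit pattern 0xFFFF) against 1, the signed compare sets bit 0 while the unsigned compare does not.
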

_mm_cmpneq_epi16_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask8 _mm_cmpneq_epi16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmp_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_cmp_epi16_mask(__mmask8 k1, __m128i a,
                                     __m128i b, const int imm8);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmpeq_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask8 _mm_mask_cmpeq_epi16_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmpge_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask8 _mm_mask_cmpge_epi16_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmpgt_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask8 _mm_mask_cmpgt_epi16_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmple_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask8 _mm_mask_cmple_epi16_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	
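Per the pseudo-code above, a zeromasked compare yields the same bits as the unmasked compare ANDed with ``k1``. The scalar sketch below illustrates that equivalence; both helper names are hypothetical and the lane arrays stand in for ``__m128i`` values.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_cmple_epi16_mask. */
    static uint8_t model_cmple_epi16(const int16_t a[8], const int16_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if (a[j] <= b[j])
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

    /* Hypothetical scalar model of _mm_mask_cmple_epi16_mask:
       lanes whose k1 bit is clear always produce a 0 result bit. */
    static uint8_t model_mask_cmple_epi16(uint8_t k1, const int16_t a[8],
                                          const int16_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if (((k1 >> j) & 1) && a[j] <= b[j])
                k |= (uint8_t)(1u << j);
        }
        return k;
    }
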

_mm_mask_cmplt_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask8 _mm_mask_cmplt_epi16_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_cmpneq_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __mmask8 _mm_mask_cmpneq_epi16_mask(__mmask8 k1, __m128i a,
                                        __m128i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_test_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_mask_test_epi8_mask(__mmask16 k1, __m128i a,
                                      __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_test_epi8_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_test_epi8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	
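The per-lane AND-and-test above can be sketched in scalar C. The helper ``model_test_epi8`` and the sixteen-element lane arrays are assumptions for illustration, not part of ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_test_epi8_mask.
       a and b hold the sixteen 8-bit lanes, lane 0 first. */
    static uint16_t model_test_epi8(const uint8_t a[16], const uint8_t b[16])
    {
        uint16_t k = 0;
        for (int j = 0; j < 16; j++) {
            /* Bit j is set when the two lanes share at least one set bit. */
            if ((a[j] & b[j]) != 0)
                k |= (uint16_t)(1u << j);
        }
        return k;
    }
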

_mm_mask_test_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_mask_test_epi16_mask(__mmask8 k1, __m128i a,
                                      __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_test_epi16_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_test_epi16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_testn_epi8_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_mask_testn_epi8_mask(__mmask16 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k1[j]
        		k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm_testn_epi8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __mmask16 _mm_testn_epi8_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0
        ENDFOR
        k[MAX:16] := 0
        	

_mm_mask_testn_epi16_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_mask_testn_epi16_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k1[j]
        		k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_testn_epi16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __mmask8 _mm_testn_epi16_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	
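Going by the pseudo-code, the ``testn`` result is the lane-wise complement of the corresponding ``test`` result: a bit is set exactly when the AND of the two lanes is zero. A scalar sketch of both follows, with hypothetical helper names and lane arrays standing in for the vector arguments.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_test_epi16_mask. */
    static uint8_t model_test_epi16(const uint16_t a[8], const uint16_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if ((a[j] & b[j]) != 0)
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

    /* Hypothetical scalar model of _mm_testn_epi16_mask:
       bit j is set when the lanes have no set bit in common. */
    static uint8_t model_testn_epi16(const uint16_t a[8], const uint16_t b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            if ((a[j] & b[j]) == 0)
                k |= (uint8_t)(1u << j);
        }
        return k;
    }
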

_mm_conflict_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm_conflict_epi32(__m128i a);

.. admonition:: Intel Description

    Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	FOR k := 0 to j-1
        		m := k*32
        		dst[i+k] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
        	ENDFOR
        	dst[i+31:i+j] := 0
        ENDFOR
        dst[MAX:128] := 0
        	
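The conflict detection above can be read as: bit ``l`` of result lane ``j`` is set when lane ``j`` equals an earlier lane ``l``. A minimal scalar sketch follows, assuming a hypothetical helper ``model_conflict_epi32`` that works on plain arrays of four lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_conflict_epi32.
       a holds the four 32-bit lanes, lane 0 first; dst receives the result. */
    static void model_conflict_epi32(const uint32_t a[4], uint32_t dst[4])
    {
        for (int j = 0; j < 4; j++) {
            uint32_t bits = 0;
            /* Only lanes closer to the least significant bit are compared. */
            for (int l = 0; l < j; l++) {
                if (a[j] == a[l])
                    bits |= 1u << l;
            }
            dst[j] = bits;  /* bits j..31 stay zero */
        }
    }

For the input lanes ``{7, 3, 7, 7}`` this model produces ``{0, 0, 1, 5}``: lane 2 matches lane 0, and lane 3 matches lanes 0 and 2.
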

_mm_mask_conflict_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_conflict_epi32(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		FOR l := 0 to j-1
        			m := l*32
        			dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
        		ENDFOR
        		dst[i+31:i+j] := 0
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_conflict_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_conflict_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		FOR l := 0 to j-1
        			m := l*32
        			dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
        		ENDFOR
        		dst[i+31:i+j] := 0
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_conflict_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_conflict_epi64(__m128i a);

.. admonition:: Intel Description

    Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	FOR k := 0 to j-1
        		m := k*64
        		dst[i+k] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
        	ENDFOR
        	dst[i+63:i+j] := 0
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_conflict_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_conflict_epi64(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		FOR l := 0 to j-1
        			m := l*64
        			dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
        		ENDFOR
        		dst[i+63:i+j] := 0
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
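The writemask merge above can be sketched in scalar C. The helper ``model_mask_conflict_epi64`` and its array arguments are hypothetical. As in the pseudo-code, a masked-off lane still takes part as a comparison source for later lanes, because the comparisons read ``a`` and not the result.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_conflict_epi64.
       src and a hold the two 64-bit lanes, lane 0 first. */
    static void model_mask_conflict_epi64(const uint64_t src[2], uint8_t k,
                                          const uint64_t a[2], uint64_t dst[2])
    {
        for (int j = 0; j < 2; j++) {
            if ((k >> j) & 1) {
                uint64_t bits = 0;
                for (int l = 0; l < j; l++) {
                    if (a[j] == a[l])
                        bits |= (uint64_t)1 << l;
                }
                dst[j] = bits;
            } else {
                dst[j] = src[j];  /* mask bit clear: copy from src */
            }
        }
    }
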

_mm_maskz_conflict_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_conflict_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		FOR l := 0 to j-1
        			m := l*64
        			dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
        		ENDFOR
        		dst[i+63:i+j] := 0
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cmp_pd_mask
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_cmp_pd_mask(__m128d a, __m128d b,
                             const int imm8);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	
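The ordered and unordered predicates differ only in how they treat NaN operands. The scalar sketch below models five of the 32 predicates to show that difference. The helper ``model_cmp_pd`` is hypothetical, it ignores the signaling versus quiet exception behaviour, and it returns 0 for predicates it does not model.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_cmp_pd_mask for a few predicates.
       a and b hold the two 64-bit lanes, lane 0 first. */
    static uint8_t model_cmp_pd(const double a[2], const double b[2], int imm8)
    {
        uint8_t k = 0;
        for (int j = 0; j < 2; j++) {
            int unord = isnan(a[j]) || isnan(b[j]);
            int r;
            switch (imm8 & 31) {
            case 0:  r = !unord && a[j] == b[j]; break;  /* _CMP_EQ_OQ   */
            case 1:  r = !unord && a[j] <  b[j]; break;  /* _CMP_LT_OS   */
            case 3:  r = unord;                  break;  /* _CMP_UNORD_Q */
            case 4:  r = unord || a[j] != b[j];  break;  /* _CMP_NEQ_UQ  */
            case 7:  r = !unord;                 break;  /* _CMP_ORD_Q   */
            default: r = 0;                      break;  /* not modeled  */
            }
            if (r)
                k |= (uint8_t)(1u << j);
        }
        return k;  /* only bits 0 and 1 can be set */
    }
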

_mm_mask_cmp_pd_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_cmp_pd_mask(__mmask8 k1, __m128d a,
                                  __m128d b, const int imm8);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmp_ps_mask
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_cmp_ps_mask(__m128 a, __m128 b,
                             const int imm8);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_cmp_ps_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_cmp_ps_mask(__mmask8 k1, __m128 a,
                                  __m128 b, const int imm8);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
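
The predicate table and zeromask above can be modelled in scalar C. The sketch below is not Intel's implementation: ``ref_mask_cmp_ps_mask``, ``pred_true`` and the ``REL_*`` constants are hypothetical names, the four lanes are passed as plain ``float`` arrays, and the signaling/quiet distinction (which affects exceptions, not the resulting mask) is ignored.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Possible relations between one pair of lanes. */
enum { REL_EQ = 1, REL_LT = 2, REL_GT = 4, REL_UN = 8 };

/* Relations under which predicate imm8[3:0] is true. Bit 4 of imm8 only
   switches between signaling and quiet behaviour, so it is not indexed. */
static const uint8_t pred_true[16] = {
    REL_EQ,                             /*  0 EQ    */
    REL_LT,                             /*  1 LT    */
    REL_LT | REL_EQ,                    /*  2 LE    */
    REL_UN,                             /*  3 UNORD */
    REL_LT | REL_GT | REL_UN,           /*  4 NEQ (unordered)  */
    REL_EQ | REL_GT | REL_UN,           /*  5 NLT   */
    REL_GT | REL_UN,                    /*  6 NLE   */
    REL_EQ | REL_LT | REL_GT,           /*  7 ORD   */
    REL_EQ | REL_UN,                    /*  8 EQ (unordered)   */
    REL_LT | REL_UN,                    /*  9 NGE   */
    REL_LT | REL_EQ | REL_UN,           /* 10 NGT   */
    0,                                  /* 11 FALSE */
    REL_LT | REL_GT,                    /* 12 NEQ (ordered)    */
    REL_GT | REL_EQ,                    /* 13 GE    */
    REL_GT,                             /* 14 GT    */
    REL_EQ | REL_LT | REL_GT | REL_UN   /* 15 TRUE  */
};

static uint8_t ref_mask_cmp_ps_mask(uint8_t k1, const float a[4],
                                    const float b[4], int imm8)
{
    uint8_t k = 0;
    for (int j = 0; j < 4; j++) {
        /* x != x is true only for NaN, which makes the pair unordered. */
        int rel = (a[j] != a[j] || b[j] != b[j]) ? REL_UN
                : (a[j] == b[j]) ? REL_EQ
                : (a[j] <  b[j]) ? REL_LT : REL_GT;
        if (((k1 >> j) & 1) && (pred_true[imm8 & 15] & rel))
            k |= (uint8_t)(1u << j);
    }
    return k;   /* bits 4..7 stay 0, matching k[MAX:4] := 0 */
}

void demo_mask_cmp_ps(void)
{
    const float a[4] = { 1.0f, 2.0f, NAN, 4.0f };
    const float b[4] = { 1.0f, 3.0f, 0.0f, 2.0f };
    assert(ref_mask_cmp_ps_mask(0xF, a, b, 1) == 0x2);  /* LT_OS: lane 1 */
    assert(ref_mask_cmp_ps_mask(0xF, a, b, 5) == 0xD);  /* NLT_US: lanes 0, 2, 3 */
    assert(ref_mask_cmp_ps_mask(0x5, a, b, 5) == 0x5);  /* k1 clears lane 3 */
}
```

The NaN in lane 2 shows why ``_CMP_NLT_US`` is not the same as ``_CMP_GE_OS``: the unordered pair satisfies the former but not the latter.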
        	

_mm_cmp_epi32_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    SI32 a, 
    SI32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_cmp_epi32_mask(__m128i a, __m128i b,
                                _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
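
As a rough illustration of the ``CASE`` decode above, the following scalar C sketch walks the four signed 32-bit lanes one by one. It does not call the intrinsic; ``ref_cmp_epi32_mask`` is a hypothetical name and the vectors are assumed to be passed as plain ``int32_t`` arrays.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the pseudo-code (hypothetical helper, not the intrinsic). */
static uint8_t ref_cmp_epi32_mask(const int32_t a[4], const int32_t b[4],
                                  int imm8)
{
    uint8_t k = 0;
    for (int j = 0; j < 4; j++) {
        int r;
        switch (imm8 & 7) {                  /* only imm8[2:0] is decoded */
        case 0:  r = a[j] == b[j]; break;    /* _MM_CMPINT_EQ    */
        case 1:  r = a[j] <  b[j]; break;    /* _MM_CMPINT_LT    */
        case 2:  r = a[j] <= b[j]; break;    /* _MM_CMPINT_LE    */
        case 3:  r = 0;            break;    /* _MM_CMPINT_FALSE */
        case 4:  r = a[j] != b[j]; break;    /* _MM_CMPINT_NE    */
        case 5:  r = a[j] >= b[j]; break;    /* _MM_CMPINT_NLT   */
        case 6:  r = a[j] >  b[j]; break;    /* _MM_CMPINT_NLE   */
        default: r = 1;            break;    /* _MM_CMPINT_TRUE  */
        }
        k |= (uint8_t)(r << j);
    }
    return k;   /* bits 4..7 stay 0, matching k[MAX:4] := 0 */
}

void demo_cmp_epi32(void)
{
    const int32_t a[4] = { -1, 5, 7, 0 };
    const int32_t b[4] = {  0, 5, 3, 0 };
    assert(ref_cmp_epi32_mask(a, b, 1) == 0x1);  /* LT: lane 0 only */
    assert(ref_cmp_epi32_mask(a, b, 5) == 0xE);  /* NLT: lanes 1..3 */
    assert(ref_cmp_epi32_mask(a, b, 7) == 0xF);  /* TRUE: all four lanes */
}
```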
        	

_mm_cmpeq_epi32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm_cmpeq_epi32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
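
One common use of the returned mask is a small lookup. The sketch below assumes a GCC- or Clang-style toolchain that defines ``__AVX512F__`` and ``__AVX512VL__`` when those extensions are enabled; without them it falls back to a scalar loop that follows the pseudo-code. ``find_first_equal4`` is a hypothetical helper, not part of any Intel header.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: index of the lowest lane of v equal to key, or -1. */
static int find_first_equal4(const int32_t v[4], int32_t key)
{
    unsigned k = 0;
#if defined(__AVX512F__) && defined(__AVX512VL__)
    __m128i va = _mm_loadu_si128((const __m128i *)v);  /* unaligned load */
    __m128i vk = _mm_set1_epi32(key);                  /* broadcast key  */
    k = _mm_cmpeq_epi32_mask(va, vk);
#else
    /* Portable fallback: one mask bit per lane, as in the pseudo-code. */
    for (int j = 0; j < 4; j++)
        k |= (unsigned)(v[j] == key) << j;
#endif
    for (int j = 0; j < 4; j++)
        if ((k >> j) & 1)
            return j;
    return -1;
}

void demo_find_first_equal4(void)
{
    const int32_t v[4] = { 9, 4, 7, 4 };
    assert(find_first_equal4(v, 4) == 1);   /* lanes 1 and 3 match */
    assert(find_first_equal4(v, 5) == -1);  /* empty mask */
}
```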
        	

_mm_cmpge_epi32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm_cmpge_epi32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm_cmpgt_epi32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm_cmpgt_epi32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm_cmple_epi32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm_cmple_epi32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm_cmplt_epi32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm_cmplt_epi32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm_cmpneq_epi32_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm_cmpneq_epi32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
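
Because the pseudo-code clears ``k[MAX:4]``, the not-equal mask is the complement of the equality mask only within the low four bits. A minimal scalar sketch of that relationship follows; both ``ref_*`` functions are hypothetical models, not the intrinsics themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the equality compare (hypothetical name). */
static uint8_t ref_cmpeq_epi32_mask(const int32_t a[4], const int32_t b[4])
{
    uint8_t k = 0;
    for (int j = 0; j < 4; j++)
        k |= (uint8_t)((a[j] == b[j]) << j);
    return k;
}

/* Scalar model of the not-equal compare (hypothetical name). */
static uint8_t ref_cmpneq_epi32_mask(const int32_t a[4], const int32_t b[4])
{
    uint8_t k = 0;
    for (int j = 0; j < 4; j++)
        k |= (uint8_t)((a[j] != b[j]) << j);
    return k;
}

void demo_neq_is_masked_complement(void)
{
    const int32_t a[4] = { 1, 2, 3, 4 };
    const int32_t b[4] = { 1, 0, 3, 0 };
    uint8_t eq  = ref_cmpeq_epi32_mask(a, b);   /* lanes 0 and 2 */
    uint8_t neq = ref_cmpneq_epi32_mask(a, b);  /* lanes 1 and 3 */
    assert(eq == 0x5 && neq == 0xA);
    assert(neq == (uint8_t)(~eq & 0x0F));  /* complement within 4 lanes */
    assert(neq != (uint8_t)~eq);           /* plain ~eq also sets bits 4..7 */
}
```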
        	

_mm_mask_cmp_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_cmp_epi32_mask(__mmask8 k1, __m128i a,
                                     __m128i b,
                                     _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
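
Since a lane with a clear ``k1`` bit always yields 0, the zeromasked result should equal the unmasked result ANDed with ``k1``. The scalar sketch below checks that for every ``k1`` and every predicate; ``cmpint_true``, ``lane_cmp`` and ``ref_mask_cmp_epi32_mask`` are hypothetical names, and the table-driven decode is just one possible way to model the ``CASE`` statement.

```c
#include <assert.h>
#include <stdint.h>

/* Relations under which each _MM_CMPINT_* value is true:
   bit 0 = equal, bit 1 = less, bit 2 = greater.
   Order: EQ, LT, LE, FALSE, NE, NLT, NLE, TRUE. */
static const uint8_t cmpint_true[8] = { 1, 2, 3, 0, 6, 5, 4, 7 };

static int lane_cmp(int32_t x, int32_t y, int imm8)
{
    int rel = (x == y) ? 1 : (x < y) ? 2 : 4;
    return (cmpint_true[imm8 & 7] & rel) != 0;
}

static uint8_t ref_mask_cmp_epi32_mask(uint8_t k1, const int32_t a[4],
                                       const int32_t b[4], int imm8)
{
    uint8_t k = 0;
    for (int j = 0; j < 4; j++) {
        if ((k1 >> j) & 1)                        /* IF k1[j] */
            k |= (uint8_t)(lane_cmp(a[j], b[j], imm8) << j);
        /* ELSE k[j] := 0, which k already holds */
    }
    return k;
}

void demo_zeromask(void)
{
    const int32_t a[4] = { 1, -2, 3, 4 };
    const int32_t b[4] = { 1,  2, 0, 4 };
    for (int imm8 = 0; imm8 < 8; imm8++) {
        uint8_t full = ref_mask_cmp_epi32_mask(0xF, a, b, imm8);
        for (unsigned k1 = 0; k1 < 16; k1++)
            assert(ref_mask_cmp_epi32_mask((uint8_t)k1, a, b, imm8)
                   == (full & k1));
    }
}
```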
        	

_mm_mask_cmpeq_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm_mask_cmpeq_epi32_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_cmpge_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm_mask_cmpge_epi32_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_cmpgt_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm_mask_cmpgt_epi32_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_cmple_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm_mask_cmple_epi32_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_cmplt_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm_mask_cmplt_epi32_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_cmpneq_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __mmask8 _mm_mask_cmpneq_epi32_mask(__mmask8 k1, __m128i a,
                                        __m128i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_cmp_epi64_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    SI64 a, 
    SI64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_cmp_epi64_mask(__m128i a, __m128i b,
                                _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
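
With 64-bit elements only two lanes exist, so at most bits 0 and 1 of the result can be set. The scalar sketch below models that, and also uses the observation that predicates 4..7 in the table are the negations of predicates 0..3, so ``imm8[2]`` can be treated as an invert flag. ``base_pred`` and ``ref_cmp_epi64_mask`` are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Predicates 0..3: EQ, LT, LE, FALSE. */
static int base_pred(int64_t x, int64_t y, int sel)
{
    switch (sel) {
    case 0:  return x == y;
    case 1:  return x <  y;
    case 2:  return x <= y;
    default: return 0;
    }
}

/* Scalar model (hypothetical helper, not the intrinsic). */
static uint8_t ref_cmp_epi64_mask(const int64_t a[2], const int64_t b[2],
                                  int imm8)
{
    int invert = (imm8 >> 2) & 1;   /* NE, NLT, NLE, TRUE negate 0..3 */
    uint8_t k = 0;
    for (int j = 0; j < 2; j++)
        k |= (uint8_t)((base_pred(a[j], b[j], imm8 & 3) ^ invert) << j);
    return k;                       /* bits 2..7 stay 0: k[MAX:2] := 0 */
}

void demo_cmp_epi64(void)
{
    const int64_t a[2] = { INT64_MIN, 10 };
    const int64_t b[2] = { 0,         10 };
    assert(ref_cmp_epi64_mask(a, b, 1) == 0x1);  /* LT: lane 0 */
    assert(ref_cmp_epi64_mask(a, b, 6) == 0x0);  /* NLE (greater-than): none */
    assert(ref_cmp_epi64_mask(a, b, 7) == 0x3);  /* TRUE sets two bits, not eight */
}
```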
        	

_mm_cmpeq_epi64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm_cmpeq_epi64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmpge_epi64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm_cmpge_epi64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmpgt_epi64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm_cmpgt_epi64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmple_epi64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm_cmple_epi64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmplt_epi64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm_cmplt_epi64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmpneq_epi64_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm_cmpneq_epi64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_cmp_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_cmp_epi64_mask(__mmask8 k1, __m128i a,
                                     __m128i b,
                                     _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_cmpeq_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm_mask_cmpeq_epi64_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_cmpge_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm_mask_cmpge_epi64_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_cmpgt_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm_mask_cmpgt_epi64_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_cmple_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm_mask_cmple_epi64_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_cmplt_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm_mask_cmplt_epi64_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
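
A zeromasked compare can take the result of an earlier compare as ``k1``, which amounts to a logical AND of the two tests. The scalar sketch below chains a greater-than-or-equal test into a masked less-than test to form a half-open range check ``lo <= x < hi``. All function names are hypothetical models of the pseudo-code, not the intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the unmasked >= compare on two 64-bit lanes. */
static uint8_t ref_cmpge_epi64_mask(const int64_t a[2], const int64_t b[2])
{
    uint8_t k = 0;
    for (int j = 0; j < 2; j++)
        k |= (uint8_t)((a[j] >= b[j]) << j);
    return k;
}

/* Scalar model of the zeromasked < compare. */
static uint8_t ref_mask_cmplt_epi64_mask(uint8_t k1, const int64_t a[2],
                                         const int64_t b[2])
{
    uint8_t k = 0;
    for (int j = 0; j < 2; j++)
        if ((k1 >> j) & 1)                  /* lanes with k1[j] == 0 stay 0 */
            k |= (uint8_t)((a[j] < b[j]) << j);
    return k;
}

/* Hypothetical helper: bit j is set when lo <= x[j] < hi. */
static uint8_t in_range2(const int64_t x[2], int64_t lo, int64_t hi)
{
    const int64_t vlo[2] = { lo, lo };
    const int64_t vhi[2] = { hi, hi };
    uint8_t k = ref_cmpge_epi64_mask(x, vlo);     /* first test        */
    return ref_mask_cmplt_epi64_mask(k, x, vhi);  /* AND with second   */
}

void demo_in_range2(void)
{
    const int64_t x[2] = { 5, 20 };
    const int64_t y[2] = { -3, 9 };
    assert(in_range2(x, 0, 10) == 0x1);  /* only lane 0 is inside [0, 10) */
    assert(in_range2(y, 0, 10) == 0x2);  /* only lane 1 is inside [0, 10) */
}
```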
        	

_mm_mask_cmpneq_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    SI64 a, 
    SI64 b

.. code-block:: C

    __mmask8 _mm_mask_cmpneq_epi64_mask(__mmask8 k1, __m128i a,
                                        __m128i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmp_epu32_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_cmp_epu32_mask(__m128i a, __m128i b,
                                _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
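
The pseudo-code is textually identical to the signed variant; the difference lies in how each 32-bit lane is interpreted. The scalar sketch below compares the same bit patterns both ways for the less-than predicate. The two ``ref_lt_*`` functions are hypothetical models, and the lanes are assumed to be stored as ``int32_t`` and reinterpreted as ``uint32_t`` for the unsigned case.

```c
#include <assert.h>
#include <stdint.h>

/* Signed less-than, one mask bit per lane (hypothetical model). */
static uint8_t ref_lt_signed(const int32_t a[4], const int32_t b[4])
{
    uint8_t k = 0;
    for (int j = 0; j < 4; j++)
        k |= (uint8_t)((a[j] < b[j]) << j);
    return k;
}

/* Unsigned less-than on the same bit patterns (hypothetical model). */
static uint8_t ref_lt_unsigned(const int32_t a[4], const int32_t b[4])
{
    uint8_t k = 0;
    for (int j = 0; j < 4; j++)
        k |= (uint8_t)(((uint32_t)a[j] < (uint32_t)b[j]) << j);
    return k;
}

void demo_signed_vs_unsigned(void)
{
    const int32_t a[4] = { -1,  1, INT32_MIN, 7 };
    const int32_t b[4] = {  1, -1, 0,         7 };
    /* Signed: -1 < 1 and INT32_MIN < 0, so lanes 0 and 2 are set. */
    assert(ref_lt_signed(a, b) == 0x5);
    /* Unsigned: -1 is 0xFFFFFFFF, so only 1 < 0xFFFFFFFF (lane 1) holds. */
    assert(ref_lt_unsigned(a, b) == 0x2);
}
```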
        	

_mm_cmpeq_epu32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_cmpeq_epu32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm_cmpge_epu32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_cmpge_epu32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm_cmpgt_epu32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_cmpgt_epu32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm_cmple_epu32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_cmple_epu32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm_cmplt_epu32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_cmplt_epu32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm_cmpneq_epu32_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_cmpneq_epu32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_cmp_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_cmp_epu32_mask(__mmask8 k1, __m128i a,
                                     __m128i b,
                                     _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operator specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
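
The ``CASE`` table and the zeromask test above can be sketched in portable C. This is a scalar model under stated assumptions: lanes are passed as arrays (lane 0 first), and both function names are hypothetical, not part of ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Evaluates the predicate selected by imm8[2:0] on one lane pair. */
    int cmpint_u32_model(uint32_t x, uint32_t y, int imm8)
    {
        switch (imm8 & 7) {
        case 0:  return x == y;   /* _MM_CMPINT_EQ    */
        case 1:  return x <  y;   /* _MM_CMPINT_LT    */
        case 2:  return x <= y;   /* _MM_CMPINT_LE    */
        case 3:  return 0;        /* _MM_CMPINT_FALSE */
        case 4:  return x != y;   /* _MM_CMPINT_NE    */
        case 5:  return x >= y;   /* _MM_CMPINT_NLT   */
        case 6:  return x >  y;   /* _MM_CMPINT_NLE   */
        default: return 1;        /* _MM_CMPINT_TRUE  */
        }
    }

    /* Lane j contributes a result bit only when bit j of k1 is set. */
    uint8_t mask_cmp_epu32_mask_model(uint8_t k1, const uint32_t a[4],
                                      const uint32_t b[4], int imm8)
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; j++) {
            if (((k1 >> j) & 1) && cmpint_u32_model(a[j], b[j], imm8))
                k |= (uint8_t)(1u << j);
        }
        return k;                 /* bits 4..7 stay 0 */
    }

With ``a = {1, 2, 3, 4}``, ``b = {1, 5, 2, 4}`` and ``k1 = 0x0F``, ``imm8 = 0`` yields ``0x09``. With ``k1 = 0x05`` and ``imm8 = 7`` the result is ``0x05``: even the always-true predicate is subject to the zeromask.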
        	

_mm_mask_cmpeq_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_mask_cmpeq_epu32_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_cmpge_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_mask_cmpge_epu32_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_cmpgt_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_mask_cmpgt_epu32_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_cmple_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_mask_cmple_epu32_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_cmplt_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_mask_cmplt_epu32_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_cmpneq_epu32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_mask_cmpneq_epu32_mask(__mmask8 k1, __m128i a,
                                        __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_cmp_epu64_mask
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_cmp_epu64_mask(__m128i a, __m128i b,
                                _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operator specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
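
With 64-bit elements an XMM register holds only two lanes, so only bits 0 and 1 of the returned ``__mmask8`` can ever be set. The sketch below models that in portable C; the array-based signature and the function name are assumptions made for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code: two unsigned
       64-bit lanes, predicate chosen by imm8[2:0]. */
    uint8_t cmp_epu64_mask_model(const uint64_t a[2], const uint64_t b[2],
                                 int imm8)
    {
        uint8_t k = 0;
        for (int j = 0; j < 2; j++) {
            int r;
            switch (imm8 & 7) {
            case 0:  r = a[j] == b[j]; break;   /* EQ    */
            case 1:  r = a[j] <  b[j]; break;   /* LT    */
            case 2:  r = a[j] <= b[j]; break;   /* LE    */
            case 3:  r = 0;            break;   /* FALSE */
            case 4:  r = a[j] != b[j]; break;   /* NE    */
            case 5:  r = a[j] >= b[j]; break;   /* NLT   */
            case 6:  r = a[j] >  b[j]; break;   /* NLE   */
            default: r = 1;            break;   /* TRUE  */
            }
            if (r)
                k |= (uint8_t)(1u << j);
        }
        return k;                               /* bits 2..7 stay 0 */
    }

Note that ``imm8 = 7`` (always true) returns ``0x03`` rather than ``0xFF``, and that the not-less-than and not-less-or-equal predicates (5 and 6) give the same results as the greater-than-or-equal and greater-than entries that follow.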
        	

_mm_cmpeq_epu64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_cmpeq_epu64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmpge_epu64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_cmpge_epu64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmpgt_epu64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_cmpgt_epu64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmple_epu64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_cmple_epu64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmplt_epu64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_cmplt_epu64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmpneq_epu64_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_cmpneq_epu64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_cmp_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b, 
    _MM_CMPINT_ENUM imm8
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_cmp_epu64_mask(__mmask8 k1, __m128i a,
                                     __m128i b,
                                     _MM_CMPINT_ENUM imm8);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operator specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[2:0]) OF
        0: OP := _MM_CMPINT_EQ
        1: OP := _MM_CMPINT_LT
        2: OP := _MM_CMPINT_LE
        3: OP := _MM_CMPINT_FALSE
        4: OP := _MM_CMPINT_NE
        5: OP := _MM_CMPINT_NLT
        6: OP := _MM_CMPINT_NLE
        7: OP := _MM_CMPINT_TRUE
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
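
Read literally, the pseudo-code implies that a zeromasked compare equals the unmasked compare ANDed with ``k1``. The sketch below checks that reading in portable C, using less-than as the example predicate; the function names and array-based lanes are assumptions for illustration only.

.. code-block:: C

    #include <stdint.h>

    /* Unmasked model: one result bit per 64-bit lane. */
    uint8_t cmplt_epu64_model(const uint64_t a[2], const uint64_t b[2])
    {
        uint8_t k = 0;
        for (int j = 0; j < 2; j++)
            if (a[j] < b[j])
                k |= (uint8_t)(1u << j);
        return k;
    }

    /* Zeromasked model written the way the pseudo-code reads. */
    uint8_t mask_cmplt_epu64_branch_model(uint8_t k1, const uint64_t a[2],
                                          const uint64_t b[2])
    {
        uint8_t k = 0;
        for (int j = 0; j < 2; j++) {
            if ((k1 >> j) & 1) {
                if (a[j] < b[j])
                    k |= (uint8_t)(1u << j);
            }                       /* else: bit j stays 0 */
        }
        return k;
    }

    /* Equivalent formulation: AND the unmasked result with k1. */
    uint8_t mask_cmplt_epu64_and_model(uint8_t k1, const uint64_t a[2],
                                       const uint64_t b[2])
    {
        return (uint8_t)(k1 & cmplt_epu64_model(a, b));
    }

Because the unmasked result never has bits 2 to 7 set, any upper bits of ``k1`` are irrelevant: ``k1 = 0xFE`` behaves like ``k1 = 0x02``.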
        	

_mm_mask_cmpeq_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_mask_cmpeq_epu64_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_cmpge_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_mask_cmpge_epu64_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_cmpgt_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_mask_cmpgt_epu64_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_cmple_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_mask_cmple_epu64_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_cmplt_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_mask_cmplt_epu64_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_cmpneq_epu64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_mask_cmpneq_epu64_mask(__mmask8 k1, __m128i a,
                                        __m128i b);

.. admonition:: Intel Description

    Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
        	ELSE 
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_test_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_mask_test_epi32_mask(__mmask8 k1, __m128i a,
                                      __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero, using zeromask "k1" (bits are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
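
A typical reading of this operation is "which of the selected lanes share at least one set bit with ``b``". The following portable sketch models the pseudo-code under the assumption that lanes are passed as arrays (lane 0 first); the function name is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: bit j of the result is set when
       lane j is selected by k1 and (a[j] AND b[j]) is non-zero. */
    uint8_t mask_test_epi32_mask_model(uint8_t k1, const uint32_t a[4],
                                       const uint32_t b[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; j++) {
            if (((k1 >> j) & 1) && (a[j] & b[j]) != 0)
                k |= (uint8_t)(1u << j);
        }
        return k;                   /* bits 4..7 stay 0 */
    }

With ``a = {0x0F, 0xF0, 0x00, 0x80000000}`` and ``b = {0x01, 0x0F, 0xFF, 0x80000000}``, lanes 0 and 3 overlap, so ``k1 = 0x0F`` yields ``0x09`` and ``k1 = 0x08`` yields ``0x08``.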
        	

_mm_test_epi32_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_test_epi32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_test_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_mask_test_epi64_mask(__mmask8 k1, __m128i a,
                                      __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero, using zeromask "k1" (bits are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_test_epi64_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_test_epi64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_testn_epi32_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_mask_testn_epi32_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero, using zeromask "k1" (bits are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_testn_epi32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __mmask8 _mm_testn_epi32_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0
        ENDFOR
        k[MAX:4] := 0
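
As the pseudo-code shows, the "testn" form checks whether the AND of each lane pair is zero, which makes its result the lane-wise complement of the "test" form. A minimal portable sketch of both, assuming array-based lanes and hypothetical function names:

.. code-block:: C

    #include <stdint.h>

    /* Bit j set when (a[j] AND b[j]) is non-zero. */
    uint8_t test_epi32_mask_model(const uint32_t a[4], const uint32_t b[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; j++)
            if ((a[j] & b[j]) != 0)
                k |= (uint8_t)(1u << j);
        return k;
    }

    /* Bit j set when (a[j] AND b[j]) is zero. */
    uint8_t testn_epi32_mask_model(const uint32_t a[4], const uint32_t b[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; j++)
            if ((a[j] & b[j]) == 0)
                k |= (uint8_t)(1u << j);
        return k;
    }

For any inputs the two models satisfy ``testn == (~test) & 0x0F``. The complement is confined to the four lane bits because bits 4 to 7 are zero in both results.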
        	

_mm_mask_testn_epi64_mask
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k1, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_mask_testn_epi64_mask(__mmask8 k1, __m128i a,
                                       __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero, using zeromask "k1" (bits are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_testn_epi64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __mmask8 _mm_testn_epi64_mask(__m128i a, __m128i b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0
        ENDFOR
        k[MAX:2] := 0
        	

_mm_cmp_round_sd_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128d a, 
    __m128d b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __mmask8 _mm_cmp_round_sd_mask(__m128d a, __m128d b,
                                   const int imm8,
                                   const int sae);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operator specified by "imm8", and store the result in mask vector "k". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        k[0] := ( a[63:0] OP b[63:0] ) ? 1 : 0
        k[MAX:1] := 0
        	

_mm_cmp_sd_mask
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_cmp_sd_mask(__m128d a, __m128d b,
                             const int imm8);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operator specified by "imm8", and store the result in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        k[0] := ( a[63:0] OP b[63:0] ) ? 1 : 0
        k[MAX:1] := 0
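
The 32 predicates differ mainly in how they treat NaN operands. The sketch below is a portable model of the result bit only: it takes the two low ``double`` elements directly, and the function name is hypothetical. It deliberately ignores the signaling versus quiet distinction (bit 4 of ``imm8``), which affects floating-point exceptions but not the returned mask.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the result bit k[0].
       Bits 2:0 of imm8 pick the base relation; bit 3 flips the answer
       for unordered (NaN) inputs; bit 4 is ignored here because it only
       changes exception signaling. */
    uint8_t cmp_sd_mask_model(double a_lo, double b_lo, int imm8)
    {
        int p = imm8 & 0x1F;
        int base = p & 7;

        if (isunordered(a_lo, b_lo)) {
            /* Base predicates 3..6 (UNORD, NEQ_U, NLT_U, NLE_U) are true
               on unordered inputs; predicates 8..15 invert that. */
            int r = (base >= 3 && base <= 6);
            return (uint8_t)(r ^ ((p >> 3) & 1));
        }

        switch (base) {             /* both operands are ordered here */
        case 0:  return a_lo == b_lo;   /* EQ            */
        case 1:  return a_lo <  b_lo;   /* LT  / NGE     */
        case 2:  return a_lo <= b_lo;   /* LE  / NGT     */
        case 3:  return 0;              /* UNORD / FALSE */
        case 4:  return a_lo != b_lo;   /* NEQ           */
        case 5:  return a_lo >= b_lo;   /* NLT / GE      */
        case 6:  return a_lo >  b_lo;   /* NLE / GT      */
        default: return 1;              /* ORD / TRUE    */
        }
    }

For example, with a NaN operand ``imm8 = 0`` (``_CMP_EQ_OQ``) returns 0 while ``imm8 = 8`` (``_CMP_EQ_UQ``) returns 1. With ordered operands the two predicates agree.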
        	

_mm_mask_cmp_round_sd_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128d a, 
    __m128d b, 
    const int imm8, 
    const int sae
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __mmask8 _mm_mask_cmp_round_sd_mask(__mmask8 k1, __m128d a,
                                        __m128d b,
                                        const int imm8,
                                        const int sae);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operator specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        IF k1[0]
        	k[0] := ( a[63:0] OP b[63:0] ) ? 1 : 0
        ELSE
        	k[0] := 0
        FI
        k[MAX:1] := 0
        	
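The masking step in the pseudo-code above can be sketched in plain C. This is a scalar reference model, not the intrinsic itself: ``model_mask_cmp_lane0`` is a hypothetical helper name, and the outcome of ``a[63:0] OP b[63:0]`` is passed in as an already-computed 0/1 value so that only the zeromask logic is shown.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper (not part of immintrin.h): models only the
     * zeromask step. cmp_result is the 0/1 outcome of (a OP b) on lane 0. */
    static uint8_t model_mask_cmp_lane0(uint8_t k1, int cmp_result)
    {
        uint8_t k = 0;                   /* k[MAX:1] := 0; k[0] defaults to 0 */

        assert(cmp_result == 0 || cmp_result == 1);
        if (k1 & 1u) {                   /* IF k1[0] */
            k = (uint8_t)cmp_result;     /* k[0] := (a OP b) ? 1 : 0 */
        }
        return k;
    }

Under this model only bit 0 of ``k1`` matters: ``model_mask_cmp_lane0(0xFE, 1)`` yields 0 even though the comparison itself was true.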

_mm_mask_cmp_sd_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_cmp_sd_mask(__mmask8 k1, __m128d a,
                                  __m128d b, const int imm8);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        IF k1[0]
        	k[0] := ( a[63:0] OP b[63:0] ) ? 1 : 0
        ELSE
        	k[0] := 0
        FI
        k[MAX:1] := 0
        	

_mm_cmp_round_ss_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128 a, 
    __m128 b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __mmask8 _mm_cmp_round_ss_mask(__m128 a, __m128 b,
                                   const int imm8,
                                   const int sae);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        k[0] := ( a[31:0] OP b[31:0] ) ? 1 : 0
        k[MAX:1] := 0
        	

_mm_cmp_ss_mask
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_cmp_ss_mask(__m128 a, __m128 b,
                             const int imm8);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        k[0] := ( a[31:0] OP b[31:0] ) ? 1 : 0
        k[MAX:1] := 0
        	
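The predicate table above can be read as a truth table over the four possible relations between two operands (less-than, equal, greater-than, unordered). The sketch below is a scalar model in plain C under that reading; ``model_cmp_predicate`` and the ``REL_*`` names are hypothetical. It relies on the observation that encodings 16 to 31 name the same comparisons as 0 to 15 with the signaling suffix swapped, so only ``imm8[3:0]`` affects the boolean result. Exception behaviour is not modelled.

.. code-block:: C

    #include <assert.h>
    #include <math.h>
    #include <stdint.h>

    /* Relation bits: exactly one holds for any pair (a, b). */
    enum { REL_LT = 1, REL_EQ = 2, REL_GT = 4, REL_UN = 8 };

    /* Relations for which each predicate is true, indexed by imm8[3:0]. */
    static const uint8_t PRED_TRUE_ON[16] = {
        REL_EQ,                             /*  0 EQ_OQ    */
        REL_LT,                             /*  1 LT_OS    */
        REL_LT | REL_EQ,                    /*  2 LE_OS    */
        REL_UN,                             /*  3 UNORD_Q  */
        REL_LT | REL_GT | REL_UN,           /*  4 NEQ_UQ   */
        REL_EQ | REL_GT | REL_UN,           /*  5 NLT_US   */
        REL_GT | REL_UN,                    /*  6 NLE_US   */
        REL_LT | REL_EQ | REL_GT,           /*  7 ORD_Q    */
        REL_EQ | REL_UN,                    /*  8 EQ_UQ    */
        REL_LT | REL_UN,                    /*  9 NGE_US   */
        REL_LT | REL_EQ | REL_UN,           /* 10 NGT_US   */
        0,                                  /* 11 FALSE_OQ */
        REL_LT | REL_GT,                    /* 12 NEQ_OQ   */
        REL_EQ | REL_GT,                    /* 13 GE_OS    */
        REL_GT,                             /* 14 GT_OS    */
        REL_LT | REL_EQ | REL_GT | REL_UN   /* 15 TRUE_UQ  */
    };

    /* Hypothetical scalar model of ( a[31:0] OP b[31:0] ) ? 1 : 0. */
    static int model_cmp_predicate(int imm8, float a, float b)
    {
        int rel;

        assert((imm8 & ~0x1F) == 0);        /* model accepts 0..31 only */
        if (isnan(a) || isnan(b)) rel = REL_UN;
        else if (a < b)           rel = REL_LT;
        else if (a > b)           rel = REL_GT;
        else                      rel = REL_EQ;
        return (PRED_TRUE_ON[imm8 & 0x0F] & rel) ? 1 : 0;
    }

For instance, ``model_cmp_predicate(4, NAN, 1.0f)`` (``_CMP_NEQ_UQ``) evaluates to 1, while ``model_cmp_predicate(0, NAN, NAN)`` (``_CMP_EQ_OQ``) evaluates to 0.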

_mm_mask_cmp_round_ss_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128 a, 
    __m128 b, 
    const int imm8, 
    const int sae
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __mmask8 _mm_mask_cmp_round_ss_mask(__mmask8 k1, __m128 a,
                                        __m128 b,
                                        const int imm8,
                                        const int sae);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        IF k1[0]
        	k[0] := ( a[31:0] OP b[31:0] ) ? 1 : 0
        ELSE
        	k[0] := 0
        FI
        k[MAX:1] := 0
        	

_mm_mask_cmp_ss_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_cmp_ss_mask(__mmask8 k1, __m128 a,
                                  __m128 b, const int imm8);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        IF k1[0]
        	k[0] := ( a[31:0] OP b[31:0] ) ? 1 : 0
        ELSE
        	k[0] := 0
        FI
        k[MAX:1] := 0
        	

_mm_comi_round_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    int _mm_comi_round_sd(__m128d a, __m128d b, const int imm8,
                          const int sae);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and return the boolean result (0 or 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        RETURN ( a[63:0] OP b[63:0] ) ? 1 : 0
        	
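The ``S`` and ``Q`` suffixes in the predicate names mark signaling and quiet (non-signaling) comparisons: a signaling predicate raises the invalid-operation exception for a QNaN operand, a quiet one does not. As a rough illustration, the table below transcribes that suffix for each of the 32 encodings listed above. ``model_predicate_is_signaling`` is a hypothetical helper, and the model does not touch the floating-point environment.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* 1 = signaling (name ends in S), 0 = quiet (name ends in Q),
     * transcribed from the CASE table and indexed by imm8[4:0]. */
    static const uint8_t PRED_SIGNALING[32] = {
        0, 1, 1, 0,  0, 1, 1, 0,    /*  0.. 7 */
        0, 1, 1, 0,  0, 1, 1, 0,    /*  8..15 */
        1, 0, 0, 1,  1, 0, 0, 1,    /* 16..23 */
        1, 0, 0, 1,  1, 0, 0, 1     /* 24..31 */
    };

    static int model_predicate_is_signaling(int imm8)
    {
        /* The model accepts only the 32 encodings shown in the table. */
        assert(imm8 >= 0 && imm8 <= 31);
        /* Bit 4 of imm8 selects the entry with the opposite suffix. */
        assert(PRED_SIGNALING[imm8] != PRED_SIGNALING[imm8 ^ 0x10]);
        return PRED_SIGNALING[imm8];
    }

``model_predicate_is_signaling(1)`` (``_CMP_LT_OS``) returns 1 and ``model_predicate_is_signaling(17)`` (``_CMP_LT_OQ``) returns 0.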

_mm_comi_round_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    int _mm_comi_round_ss(__m128 a, __m128 b, const int imm8,
                          const int sae);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and return the boolean result (0 or 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        RETURN ( a[31:0] OP b[31:0] ) ? 1 : 0
        	

_mm_cmp_ph_mask
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128h a, 
    __m128h b, 
    const int imm8
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_cmp_ph_mask(__m128h a, __m128h b,
                             const int imm8);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 7
        	k[j] := (a.fp16[j] OP b.fp16[j]) ? 1 : 0
        ENDFOR
        k[MAX:8] := 0
        	
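The per-lane loop above can be sketched as a scalar model in plain C. Everything here is illustrative: ``float`` stands in for the half-precision lanes, the predicate is passed as a callback rather than decoded from "imm8", and ``lane_pred``, ``pred_lt`` and ``model_cmp_ph_mask`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical per-lane predicate: returns nonzero when (a OP b) holds. */
    typedef int (*lane_pred)(float, float);

    /* Ordered less-than: false whenever either operand is NaN. */
    static int pred_lt(float a, float b) { return a < b; }

    /* Builds the 8-bit mask one lane at a time, as in the FOR loop. */
    static uint8_t model_cmp_ph_mask(const float a[8], const float b[8],
                                     lane_pred op)
    {
        uint8_t k = 0;
        int j;

        assert(op != NULL);
        for (j = 0; j < 8; ++j) {            /* FOR j := 0 to 7 */
            if (op(a[j], b[j])) {
                k |= (uint8_t)(1u << j);     /* k[j] := 1 */
            }
        }
        return k;                            /* an 8-bit mask has no upper bits */
    }

With ``a = {0, 1, 2, 3, 4, 5, 6, 7}`` and ``b = {1, 1, 1, 1, 5, 5, 5, 5}``, only lanes 0 and 4 satisfy ``a < b``, so the model returns ``0x11``.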

_mm_mask_cmp_ph_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128h a, 
    __m128h b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_cmp_ph_mask(__mmask8 k1, __m128h a,
                                  __m128h b, const int imm8);

.. admonition:: Intel Description

    Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 7
        	IF k1[j]
        		k[j] := ( a.fp16[j] OP b.fp16[j] ) ? 1 : 0
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	
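One consequence of the masked loop above is that the returned mask equals the unmasked comparison result ANDed with "k1". The scalar sketch below checks that identity in plain C, with ``float`` standing in for the half-precision lanes and the predicate fixed to equality for brevity. Both function names are hypothetical, and the identity concerns the returned mask only, not floating-point exception flags.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Unmasked model: k[j] := (a[j] == b[j]) ? 1 : 0 for j = 0..7. */
    static uint8_t model_cmp_eq(const float a[8], const float b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; ++j) {
            if (a[j] == b[j]) k |= (uint8_t)(1u << j);
        }
        return k;
    }

    /* Masked model, written like the pseudo-code: lane j is compared
     * only when k1[j] is set, otherwise k[j] stays 0. */
    static uint8_t model_mask_cmp_eq(uint8_t k1, const float a[8],
                                     const float b[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; ++j) {
            if ((k1 >> j) & 1u) {
                if (a[j] == b[j]) k |= (uint8_t)(1u << j);
            }
        }
        /* Zeromasking is equivalent to ANDing the unmasked result with k1. */
        assert(k == (uint8_t)(k1 & model_cmp_eq(a, b)));
        return k;
    }

With ``a = {1, 2, 3, 4, 5, 6, 7, 8}`` and ``b = {1, 0, 3, 0, 5, 0, 7, 0}`` the unmasked model gives ``0x55``, and a zeromask of ``0x0F`` reduces that to ``0x05``.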

_mm_cmp_sh_mask
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128h a, 
    __m128h b, 
    const int imm8
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_cmp_sh_mask(__m128h a, __m128h b,
                             const int imm8);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        k[0] := (a.fp16[0] OP b.fp16[0]) ? 1 : 0
        k[MAX:1] := 0
        	
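In the pseudo-code, ``a.fp16[0]`` denotes the lowest 16-bit lane interpreted as an IEEE 754 binary16 value. For readers without FP16 hardware, the sketch below widens such a bit pattern to ``float`` in plain C and then applies a lane-0 comparison. It is a scalar model only: both helper names are hypothetical, ``float`` is assumed to be IEEE 754 binary32, and the predicate is fixed to less-than for illustration.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: widen a binary16 bit pattern to float. */
    static float model_fp16_to_float(uint16_t h)
    {
        uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t frac = h & 0x3FFu;
        uint32_t bits;
        float f;

        assert(sizeof(float) == sizeof(uint32_t));
        if (exp == 0x1Fu) {                     /* infinity or NaN */
            bits = sign | 0x7F800000u | (frac << 13);
        } else if (exp != 0) {                  /* normal: rebias 15 -> 127 */
            bits = sign | ((exp + 112u) << 23) | (frac << 13);
        } else if (frac == 0) {                 /* signed zero */
            bits = sign;
        } else {                                /* subnormal: normalize first */
            exp = 113u;
            while ((frac & 0x400u) == 0) { frac <<= 1; --exp; }
            bits = sign | (exp << 23) | ((frac & 0x3FFu) << 13);
        }
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    /* Lane-0 model: k[0] := (a.fp16[0] < b.fp16[0]) ? 1 : 0, k[MAX:1] := 0. */
    static uint8_t model_cmp_sh_mask_lt(uint16_t a0, uint16_t b0)
    {
        return (model_fp16_to_float(a0) < model_fp16_to_float(b0)) ? 1u : 0u;
    }

For example, ``0x3C00`` encodes 1.0 and ``0x4000`` encodes 2.0, so ``model_cmp_sh_mask_lt(0x3C00, 0x4000)`` returns 1. A NaN pattern such as ``0x7E00`` on either side makes the result 0.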

_mm_cmp_round_sh_mask
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128h a, 
    __m128h b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __mmask8 _mm_cmp_round_sh_mask(__m128h a, __m128h b,
                                   const int imm8,
                                   const int sae);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        k[0] := (a.fp16[0] OP b.fp16[0]) ? 1 : 0
        k[MAX:1] := 0
        	

_mm_mask_cmp_sh_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128h a, 
    __m128h b, 
    const int imm8
:Param ETypes:
    MASK k1, 
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_cmp_sh_mask(__mmask8 k1, __m128h a,
                                  __m128h b, const int imm8);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        IF k1[0]
        	k[0] := ( a.fp16[0] OP b.fp16[0] ) ? 1 : 0
        ELSE
        	k[0] := 0
        FI
        k[MAX:1] := 0
        	

_mm_mask_cmp_round_sh_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128h a, 
    __m128h b, 
    const int imm8, 
    const int sae
:Param ETypes:
    MASK k1, 
    FP16 a, 
    FP16 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __mmask8 _mm_mask_cmp_round_sh_mask(__mmask8 k1, __m128h a,
                                        __m128h b,
                                        const int imm8,
                                        const int sae);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        IF k1[0]
        	k[0] := ( a.fp16[0] OP b.fp16[0] ) ? 1 : 0
        ELSE
        	k[0] := 0
        FI
        k[MAX:1] := 0
        	

_mm_comi_sh
^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b, 
    const int imm8
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    int _mm_comi_sh(__m128h a, __m128h b, const int imm8);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        RETURN ( a.fp16[0] OP b.fp16[0] ) ? 1 : 0
        	

_mm_comi_round_sh
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    int _mm_comi_round_sh(__m128h a, __m128h b, const int imm8,
                          const int sae);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and return the boolean result (0 or 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ
        26: OP := _CMP_NGT_UQ
        27: OP := _CMP_FALSE_OS
        28: OP := _CMP_NEQ_OS
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        RETURN ( a.fp16[0] OP b.fp16[0] ) ? 1 : 0
        	

_mm_comieq_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    int _mm_comieq_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for equality, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a.fp16[0] != NaN AND b.fp16[0] != NaN AND a.fp16[0] == b.fp16[0] ) ? 1 : 0
        	

_mm_comilt_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    int _mm_comilt_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for less-than, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a.fp16[0] != NaN AND b.fp16[0] != NaN AND a.fp16[0] < b.fp16[0] ) ? 1 : 0
        	

_mm_comile_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    int _mm_comile_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a.fp16[0] != NaN AND b.fp16[0] != NaN AND a.fp16[0] <= b.fp16[0] ) ? 1 : 0
        	

_mm_comigt_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    int _mm_comigt_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for greater-than, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a.fp16[0] != NaN AND b.fp16[0] != NaN AND a.fp16[0] > b.fp16[0] ) ? 1 : 0
        	

_mm_comige_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    int _mm_comige_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] >= b.fp16[0] ) ? 1 : 0
        	

_mm_comineq_sh
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    int _mm_comineq_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for not-equal, and return the boolean result (0 or 1).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a.fp16[0] ==NaN OR b.fp16[0] ==NaN OR a.fp16[0] != b.fp16[0] ) ? 1 : 0
        	

_mm_ucomieq_sh
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    int _mm_ucomieq_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] == b.fp16[0] ) ? 1 : 0
        	

_mm_ucomilt_sh
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    int _mm_ucomilt_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] < b.fp16[0] ) ? 1 : 0
        	

_mm_ucomile_sh
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    int _mm_ucomile_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] <= b.fp16[0] ) ? 1 : 0
        	

_mm_ucomigt_sh
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    int _mm_ucomigt_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] > b.fp16[0] ) ? 1 : 0
        	

_mm_ucomige_sh
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    int _mm_ucomige_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] >= b.fp16[0] ) ? 1 : 0
        	

_mm_ucomineq_sh
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Compare
:Header: immintrin.h
:Searchable: AVX-512-Compare-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    int _mm_ucomineq_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        RETURN ( a.fp16[0] ==NaN OR b.fp16[0] ==NaN OR a.fp16[0] != b.fp16[0] ) ? 1 : 0
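
The NaN handling in the ``_sh`` comparison pseudo-code above can be sketched as a scalar model. This is a minimal illustration, not the intrinsics themselves: ``ref_comi_lt`` and ``ref_comi_neq`` are hypothetical helper names, and ``float`` stands in for the FP16 element, an assumption made so the sketch compiles without FP16 support.

```c
#include <math.h>

/* Hypothetical scalar models of the pseudo-code above.
   Assumption: float stands in for the FP16 element a.fp16[0]. */
static inline int ref_comi_lt(float a, float b)
{
    /* Ordered predicate: any NaN operand yields 0. */
    return (!isnan(a) && !isnan(b) && a < b) ? 1 : 0;
}

static inline int ref_comi_neq(float a, float b)
{
    /* Not-equal is the exception: any NaN operand yields 1. */
    return (isnan(a) || isnan(b) || a != b) ? 1 : 0;
}
```

The ordered predicates shown above follow the first pattern and return 0 when either operand is NaN; only not-equal returns 1 in that case. The model does not capture the exception-signaling difference between the ``comi`` and ``ucomi`` variants.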
        	

Mask
----
ZMM
~~~
_mm512_kandn
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask16 _mm512_kandn(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 16-bit mask "a" and then AND with "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[15:0] := (NOT a[15:0]) AND b[15:0]
        k[MAX:16] := 0
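
Operand order matters for ``_mm512_kandn``: the pseudo-code inverts "a", not "b". A minimal scalar sketch of that logic follows; ``ref_kandn16`` is a hypothetical helper name used only for illustration, and it models the mask as a plain ``uint16_t``.

```c
#include <stdint.h>

/* Hypothetical scalar model of the kandn pseudo-code; not the intrinsic. */
static inline uint16_t ref_kandn16(uint16_t a, uint16_t b)
{
    /* The first operand is the one that gets inverted. */
    return (uint16_t)(~a & b);
}
```

Swapping the arguments generally changes the result, so ``ref_kandn16(a, b)`` reads as "bits of b that are not in a".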
        	

_mm512_kand
^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask16 _mm512_kand(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 16-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[15:0] := a[15:0] AND b[15:0]
        k[MAX:16] := 0
        	

_mm512_kmov
^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 a
:Param ETypes:
    MASK a

.. code-block:: C

    __mmask16 _mm512_kmov(__mmask16 a);

.. admonition:: Intel Description

    Copy 16-bit mask "a" to "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[15:0] := a[15:0]
        k[MAX:16] := 0
        	

_mm512_knot
^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 a
:Param ETypes:
    MASK a

.. code-block:: C

    __mmask16 _mm512_knot(__mmask16 a);

.. admonition:: Intel Description

    Compute the bitwise NOT of 16-bit mask "a", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[15:0] := NOT a[15:0]
        k[MAX:16] := 0
        	

_mm512_kor
^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask16 _mm512_kor(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 16-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[15:0] := a[15:0] OR b[15:0]
        k[MAX:16] := 0
        	

_mm512_kunpackb
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask16 _mm512_kunpackb(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Unpack the low 8 bits of masks "a" and "b" and concatenate them, with "a" in the upper byte and "b" in the lower byte, and store the 16-bit result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[7:0] := b[7:0]
        k[15:8] := a[7:0]
        k[MAX:16] := 0
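
The byte placement in the ``_mm512_kunpackb`` pseudo-code can be sketched in scalar C. This is a hedged illustration only: ``ref_kunpackb`` is a hypothetical helper name, and the masks are modeled as ``uint16_t`` values.

```c
#include <stdint.h>

/* Hypothetical scalar model of the kunpackb pseudo-code; not the intrinsic. */
static inline uint16_t ref_kunpackb(uint16_t a, uint16_t b)
{
    /* Low byte of b lands in bits 7:0, low byte of a in bits 15:8.
       The upper bytes of both inputs are ignored. */
    return (uint16_t)(((a & 0xFFu) << 8) | (b & 0xFFu));
}
```

For example, ``ref_kunpackb(0x12AB, 0x34CD)`` evaluates to ``0xABCD``: the two low bytes are concatenated, not bit-interleaved.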
        	

_mm512_kxnor
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask16 _mm512_kxnor(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise XNOR of 16-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[15:0] := NOT (a[15:0] XOR b[15:0])
        k[MAX:16] := 0
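
When modeling the XNOR pseudo-code in C, integer promotion needs care: ``~`` operates on ``int``, so the result has to be truncated back to 16 bits to match ``k[MAX:16] := 0``. A minimal sketch, with ``ref_kxnor16`` as a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical scalar model of the kxnor pseudo-code; not the intrinsic. */
static inline uint16_t ref_kxnor16(uint16_t a, uint16_t b)
{
    /* a ^ b is computed as int; the cast discards bits above 15. */
    return (uint16_t)~(a ^ b);
}
```

XNOR of any mask with itself yields all ones (``0xFFFF`` in this model), since every bit position compares equal.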
        	

_mm512_kxor
^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask16 _mm512_kxor(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of 16-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[15:0] := a[15:0] XOR b[15:0]
        k[MAX:16] := 0
        	

_mm512_kortestz
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __mmask16 k1, 
    __mmask16 k2
:Param ETypes:
    MASK k1, 
    MASK k2

.. code-block:: C

    int _mm512_kortestz(__mmask16 k1, __mmask16 k2);

.. admonition:: Intel Description

    Performs bitwise OR between "k1" and "k2", storing the result in "dst". ZF flag is set if "dst" is 0.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[15:0] := k1[15:0] | k2[15:0]
        IF dst == 0
        	SetZF()
        FI
        	

_mm512_kortestc
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __mmask16 k1, 
    __mmask16 k2
:Param ETypes:
    MASK k1, 
    MASK k2

.. code-block:: C

    int _mm512_kortestc(__mmask16 k1, __mmask16 k2);

.. admonition:: Intel Description

    Performs bitwise OR between "k1" and "k2", storing the result in "dst". CF flag is set if "dst" consists of all 1's.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[15:0] := k1[15:0] | k2[15:0]
        IF PopCount(dst[15:0]) == 16
        	SetCF()
        FI
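
The two ``kortest`` variants above differ only in which condition on the OR result they report. The sketch below models both in scalar C, assuming the ``int`` return value mirrors the flag named in the description (ZF for ``kortestz``, CF for ``kortestc``). ``ref_kortestz16`` and ``ref_kortestc16`` are hypothetical helper names.

```c
#include <stdint.h>

/* Hypothetical scalar models; assumption: the int result mirrors ZF / CF. */
static inline int ref_kortestz16(uint16_t k1, uint16_t k2)
{
    /* 1 when no bit is set in either mask. */
    return ((uint16_t)(k1 | k2) == 0) ? 1 : 0;
}

static inline int ref_kortestc16(uint16_t k1, uint16_t k2)
{
    /* 1 when the two masks together cover all 16 bits. */
    return ((uint16_t)(k1 | k2) == 0xFFFF) ? 1 : 0;
}
```

Under that assumption, the zero test answers "did no lane match in either mask?" and the carry test answers "is every lane covered by at least one mask?".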
        	

_mm512_mask2int
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __mmask16 k1
:Param ETypes:
    MASK k1

.. code-block:: C

    int _mm512_mask2int(__mmask16 k1);

.. admonition:: Intel Description

    Converts bit mask "k1" into an integer value, storing the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := ZeroExtend32(k1)
        	

_mm512_int2mask
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    int mask
:Param ETypes:
    UI16 mask

.. code-block:: C

    __mmask16 _mm512_int2mask(int mask);

.. admonition:: Intel Description

    Converts integer "mask" into a bitmask, storing the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst := mask[15:0]
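
The two conversions above are not symmetric: mask to integer zero-extends, while integer to mask keeps only bits 15:0. A minimal scalar sketch of both directions, using the hypothetical helper names ``ref_mask2int`` and ``ref_int2mask``:

```c
#include <stdint.h>

/* Hypothetical scalar models of the conversion pseudo-code. */
static inline int ref_mask2int(uint16_t k1)
{
    /* ZeroExtend32: the result is never negative. */
    return (int)k1;
}

static inline uint16_t ref_int2mask(int mask)
{
    /* Only mask[15:0] survives; higher bits are dropped. */
    return (uint16_t)((unsigned int)mask & 0xFFFFu);
}
```

A round trip through ``ref_int2mask`` and back therefore loses any bits above bit 15 of the original integer.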
        	

_mm512_2intersect_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    __m512i a, 
    __m512i b, 
    __mmask16* k1, 
    __mmask16* k2
:Param ETypes:
    UI32 a, 
    UI32 b, 
    MASK k1, 
    MASK k2

.. code-block:: C

    void _mm512_2intersect_epi32(__m512i a, __m512i b,
                                 __mmask16* k1, __mmask16* k2);

.. admonition:: Intel Description

    Compute intersection of packed 32-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[k1+15:k1] := 0
        MEM[k2+15:k2] := 0
        FOR i := 0 TO 15
        	FOR j := 0 TO 15
        		match := (a.dword[i] == b.dword[j] ? 1 : 0)
        		MEM[k1+15:k1].bit[i] |= match
        		MEM[k2+15:k2].bit[j] |= match
        	ENDFOR
        ENDFOR
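
The nested loops above compare every element of "a" against every element of "b", not just elements at the same index. The scalar sketch below follows the pseudo-code directly; ``ref_2intersect_epi32_512`` is a hypothetical helper name, and the vectors are modeled as plain arrays of 16 ``uint32_t`` values.

```c
#include <stdint.h>

/* Hypothetical scalar model of the 512-bit 2intersect pseudo-code. */
static inline void ref_2intersect_epi32_512(const uint32_t a[16],
                                            const uint32_t b[16],
                                            uint16_t *k1, uint16_t *k2)
{
    *k1 = 0;
    *k2 = 0;
    for (int i = 0; i < 16; i++) {
        for (int j = 0; j < 16; j++) {
            if (a[i] == b[j]) {
                *k1 |= (uint16_t)(1u << i); /* a[i] occurs somewhere in b */
                *k2 |= (uint16_t)(1u << j); /* b[j] occurs somewhere in a */
            }
        }
    }
}
```

Bit ``i`` of ``*k1`` is set when ``a[i]`` appears anywhere in ``b``, and bit ``j`` of ``*k2`` is set when ``b[j]`` appears anywhere in ``a``, so the two masks can differ.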
        	

_mm512_2intersect_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-ZMM
:Register: ZMM 512 bit
:Return Type: void
:Param Types:
    __m512i a, 
    __m512i b, 
    __mmask8* k1, 
    __mmask8* k2
:Param ETypes:
    UI64 a, 
    UI64 b, 
    MASK k1, 
    MASK k2

.. code-block:: C

    void _mm512_2intersect_epi64(__m512i a, __m512i b,
                                 __mmask8* k1, __mmask8* k2);

.. admonition:: Intel Description

    Compute intersection of packed 64-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[k1+7:k1] := 0
        MEM[k2+7:k2] := 0
        FOR i := 0 TO 7
        	FOR j := 0 TO 7
        		match := (a.qword[i] == b.qword[j] ? 1 : 0)
        		MEM[k1+7:k1].bit[i] |= match
        		MEM[k2+7:k2].bit[j] |= match
        	ENDFOR
        ENDFOR
        	

YMM
~~~
_mm256_2intersect_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    __m256i a, 
    __m256i b, 
    __mmask8* k1, 
    __mmask8* k2
:Param ETypes:
    UI32 a, 
    UI32 b, 
    MASK k1, 
    MASK k2

.. code-block:: C

    void _mm256_2intersect_epi32(__m256i a, __m256i b,
                                 __mmask8* k1, __mmask8* k2);

.. admonition:: Intel Description

    Compute intersection of packed 32-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[k1+7:k1] := 0
        MEM[k2+7:k2] := 0
        FOR i := 0 TO 7
        	FOR j := 0 TO 7
        		match := (a.dword[i] == b.dword[j] ? 1 : 0)
        		MEM[k1+7:k1].bit[i] |= match
        		MEM[k2+7:k2].bit[j] |= match
        	ENDFOR
        ENDFOR
        	

_mm256_2intersect_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    __m256i a, 
    __m256i b, 
    __mmask8* k1, 
    __mmask8* k2
:Param ETypes:
    UI64 a, 
    UI64 b, 
    MASK k1, 
    MASK k2

.. code-block:: C

    void _mm256_2intersect_epi64(__m256i a, __m256i b,
                                 __mmask8* k1, __mmask8* k2);

.. admonition:: Intel Description

    Compute intersection of packed 64-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[k1+7:k1] := 0
        MEM[k2+7:k2] := 0
        FOR i := 0 TO 3
        	FOR j := 0 TO 3
        		match := (a.qword[i] == b.qword[j] ? 1 : 0)
        		MEM[k1+7:k1].bit[i] |= match
        		MEM[k2+7:k2].bit[j] |= match
        	ENDFOR
        ENDFOR
        	

XMM
~~~
_mm_2intersect_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    __m128i a, 
    __m128i b, 
    __mmask8* k1, 
    __mmask8* k2
:Param ETypes:
    UI32 a, 
    UI32 b, 
    MASK k1, 
    MASK k2

.. code-block:: C

    void _mm_2intersect_epi32(__m128i a, __m128i b,
                              __mmask8* k1, __mmask8* k2);

.. admonition:: Intel Description

    Compute intersection of packed 32-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[k1+7:k1] := 0
        MEM[k2+7:k2] := 0
        FOR i := 0 TO 3
        	FOR j := 0 TO 3
        		match := (a.dword[i] == b.dword[j] ? 1 : 0)
        		MEM[k1+7:k1].bit[i] |= match
        		MEM[k2+7:k2].bit[j] |= match
        	ENDFOR
        ENDFOR
        	

_mm_2intersect_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    __m128i a, 
    __m128i b, 
    __mmask8* k1, 
    __mmask8* k2
:Param ETypes:
    UI64 a, 
    UI64 b, 
    MASK k1, 
    MASK k2

.. code-block:: C

    void _mm_2intersect_epi64(__m128i a, __m128i b,
                              __mmask8* k1, __mmask8* k2);

.. admonition:: Intel Description

    Compute intersection of packed 64-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[k1+7:k1] := 0
        MEM[k2+7:k2] := 0
        FOR i := 0 TO 1
        	FOR j := 0 TO 1
        		match := (a.qword[i] == b.qword[j] ? 1 : 0)
        		MEM[k1+7:k1].bit[i] |= match
        		MEM[k2+7:k2].bit[j] |= match
        	ENDFOR
        ENDFOR
        	

Other
~~~~~
_kadd_mask32
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask32
:Param Types:
    __mmask32 a, 
    __mmask32 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask32 _kadd_mask32(__mmask32 a, __mmask32 b);

.. admonition:: Intel Description

    Add 32-bit masks in "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[31:0] := a[31:0] + b[31:0]
        k[MAX:32] := 0
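
Unlike the bitwise mask operations, ``_kadd_mask32`` treats the masks as integers, so carries propagate between bit positions and the sum is truncated to 32 bits. A minimal scalar sketch, with ``ref_kadd32`` as a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical scalar model of the kadd pseudo-code; not the intrinsic. */
static inline uint32_t ref_kadd32(uint32_t a, uint32_t b)
{
    /* Unsigned addition wraps modulo 2^32, matching k[MAX:32] := 0. */
    return a + b;
}
```

Adding 1 to a mask whose low bits are all set clears that run and sets the next bit, for example ``0x0000000F + 1`` gives ``0x00000010``.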
        	

_kadd_mask64
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask64
:Param Types:
    __mmask64 a, 
    __mmask64 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask64 _kadd_mask64(__mmask64 a, __mmask64 b);

.. admonition:: Intel Description

    Add 64-bit masks in "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[63:0] := a[63:0] + b[63:0]
        k[MAX:64] := 0
        	

_kand_mask32
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask32
:Param Types:
    __mmask32 a, 
    __mmask32 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask32 _kand_mask32(__mmask32 a, __mmask32 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 32-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[31:0] := a[31:0] AND b[31:0]
        k[MAX:32] := 0
        	

_kand_mask64
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask64
:Param Types:
    __mmask64 a, 
    __mmask64 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask64 _kand_mask64(__mmask64 a, __mmask64 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 64-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[63:0] := a[63:0] AND b[63:0]
        k[MAX:64] := 0
        	

_kandn_mask32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask32
:Param Types:
    __mmask32 a, 
    __mmask32 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask32 _kandn_mask32(__mmask32 a, __mmask32 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 32-bit mask "a" and then AND with "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[31:0] := (NOT a[31:0]) AND b[31:0]
        k[MAX:32] := 0
        	

_kandn_mask64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask64
:Param Types:
    __mmask64 a, 
    __mmask64 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask64 _kandn_mask64(__mmask64 a, __mmask64 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 64-bit mask "a" and then AND with "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[63:0] := (NOT a[63:0]) AND b[63:0]
        k[MAX:64] := 0
        	

_knot_mask32
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask32
:Param Types:
    __mmask32 a
:Param ETypes:
    MASK a

.. code-block:: C

    __mmask32 _knot_mask32(__mmask32 a);

.. admonition:: Intel Description

    Compute the bitwise NOT of 32-bit mask "a", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[31:0] := NOT a[31:0]
        k[MAX:32] := 0
        	

_knot_mask64
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask64
:Param Types:
    __mmask64 a
:Param ETypes:
    MASK a

.. code-block:: C

    __mmask64 _knot_mask64(__mmask64 a);

.. admonition:: Intel Description

    Compute the bitwise NOT of 64-bit mask "a", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[63:0] := NOT a[63:0]
        k[MAX:64] := 0
        	

_kor_mask32
^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask32
:Param Types:
    __mmask32 a, 
    __mmask32 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask32 _kor_mask32(__mmask32 a, __mmask32 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 32-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[31:0] := a[31:0] OR b[31:0]
        k[MAX:32] := 0
        	

_kor_mask64
^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask64
:Param Types:
    __mmask64 a, 
    __mmask64 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask64 _kor_mask64(__mmask64 a, __mmask64 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 64-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[63:0] := a[63:0] OR b[63:0]
        k[MAX:64] := 0
        	

_kxnor_mask32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask32
:Param Types:
    __mmask32 a, 
    __mmask32 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask32 _kxnor_mask32(__mmask32 a, __mmask32 b);

.. admonition:: Intel Description

    Compute the bitwise XNOR of 32-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[31:0] := NOT (a[31:0] XOR b[31:0])
        k[MAX:32] := 0
        	

_kxnor_mask64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask64
:Param Types:
    __mmask64 a, 
    __mmask64 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask64 _kxnor_mask64(__mmask64 a, __mmask64 b);

.. admonition:: Intel Description

    Compute the bitwise XNOR of 64-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[63:0] := NOT (a[63:0] XOR b[63:0])
        k[MAX:64] := 0
        	

_kxor_mask32
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask32
:Param Types:
    __mmask32 a, 
    __mmask32 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask32 _kxor_mask32(__mmask32 a, __mmask32 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of 32-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[31:0] := a[31:0] XOR b[31:0]
        k[MAX:32] := 0
        	

_kxor_mask64
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask64
:Param Types:
    __mmask64 a, 
    __mmask64 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask64 _kxor_mask64(__mmask64 a, __mmask64 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of 64-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[63:0] := a[63:0] XOR b[63:0]
        k[MAX:64] := 0
        	

_kshiftli_mask32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask32
:Param Types:
    __mmask32 a, 
    unsigned int count
:Param ETypes:
    MASK a, 
    IMM count

.. code-block:: C

    __mmask32 _kshiftli_mask32(__mmask32 a, unsigned int count);

.. admonition:: Intel Description

    Shift the bits of 32-bit mask "a" left by "count" while shifting in zeros, and store the least significant 32 bits of the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[MAX:0] := 0
        IF count[7:0] <= 31
        	k[31:0] := a[31:0] << count[7:0]
        FI
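
The pseudo-code above defines the result as zero for any count above 31, which a plain C shift does not guarantee (shifting a 32-bit value by 32 or more is undefined behavior). A minimal scalar sketch that makes the guard explicit, with ``ref_kshiftli32`` as a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical scalar model of the kshiftli pseudo-code; not the intrinsic. */
static inline uint32_t ref_kshiftli32(uint32_t a, unsigned int count)
{
    unsigned int c = count & 0xFFu; /* only count[7:0] is examined */
    /* Counts above 31 yield 0 instead of an undefined C shift. */
    return (c <= 31u) ? (a << c) : 0u;
}
```

The same guard is needed in any portable fallback for this intrinsic; bits shifted past bit 31 are discarded.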
        	

_kshiftli_mask64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask64
:Param Types:
    __mmask64 a, 
    unsigned int count
:Param ETypes:
    MASK a, 
    IMM count

.. code-block:: C

    __mmask64 _kshiftli_mask64(__mmask64 a, unsigned int count);

.. admonition:: Intel Description

    Shift the bits of 64-bit mask "a" left by "count" while shifting in zeros, and store the least significant 64 bits of the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[MAX:0] := 0
        IF count[7:0] <= 63
        	k[63:0] := a[63:0] << count[7:0]
        FI
        	

_kshiftri_mask32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask32
:Param Types:
    __mmask32 a, 
    unsigned int count
:Param ETypes:
    MASK a, 
    IMM count

.. code-block:: C

    __mmask32 _kshiftri_mask32(__mmask32 a, unsigned int count);

.. admonition:: Intel Description

    Shift the bits of 32-bit mask "a" right by "count" while shifting in zeros, and store the least significant 32 bits of the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[MAX:0] := 0
        IF count[7:0] <= 31
        	k[31:0] := a[31:0] >> count[7:0]
        FI
        	

_kshiftri_mask64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask64
:Param Types:
    __mmask64 a, 
    unsigned int count
:Param ETypes:
    MASK a, 
    IMM count

.. code-block:: C

    __mmask64 _kshiftri_mask64(__mmask64 a, unsigned int count);

.. admonition:: Intel Description

    Shift the bits of 64-bit mask "a" right by "count" while shifting in zeros, and store the least significant 64 bits of the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k[MAX:0] := 0
        IF count[7:0] <= 63
        	k[63:0] := a[63:0] >> count[7:0]
        FI
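
The 64-bit right shift follows the same pattern as the other mask shifts: a logical shift with zeros entering from the top, and a zero result once the count exceeds 63. A scalar sketch under those rules, with ``ref_kshiftri64`` as a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical scalar model of the kshiftri pseudo-code; not the intrinsic. */
static inline uint64_t ref_kshiftri64(uint64_t a, unsigned int count)
{
    unsigned int c = count & 0xFFu; /* only count[7:0] is examined */
    /* Unsigned operand, so >> shifts in zeros; counts above 63 yield 0. */
    return (c <= 63u) ? (a >> c) : (uint64_t)0;
}
```

Because the operand is unsigned, the top bit is never replicated; shifting ``0x8000000000000000`` right by 63 gives 1.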
        	

_kortest_mask32_u8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask32 a, 
    __mmask32 b, 
    unsigned char* all_ones
:Param ETypes:
    MASK a, 
    MASK b, 
    UI8 all_ones

.. code-block:: C

    unsigned char _kortest_mask32_u8(__mmask32 a, __mmask32 b, unsigned char* all_ones);

.. admonition:: Intel Description

    Compute the bitwise OR of 32-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". If the result is all ones, store 1 in "all_ones", otherwise store 0 in "all_ones".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[31:0] OR b[31:0]
        IF tmp[31:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        IF tmp[31:0] == 0xFFFFFFFF
        	MEM[all_ones+7:all_ones] := 1
        ELSE
        	MEM[all_ones+7:all_ones] := 0
        FI
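
``_kortest_mask32_u8`` reports two independent conditions from one OR: the return value carries the all-zeros test and the ``all_ones`` output carries the all-ones test. A minimal scalar sketch of the pseudo-code above, with ``ref_kortest32_u8`` as a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical scalar model of the kortest pseudo-code; not the intrinsic. */
static inline unsigned char ref_kortest32_u8(uint32_t a, uint32_t b,
                                             unsigned char *all_ones)
{
    uint32_t tmp = a | b;
    *all_ones = (tmp == 0xFFFFFFFFu) ? 1 : 0; /* every bit covered */
    return (tmp == 0u) ? 1 : 0;               /* no bit set at all */
}
```

For a 32-bit mask the two outputs are never both 1; both are 0 whenever the OR result is neither empty nor full.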
        	

_kortestz_mask32_u8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask32 a, 
    __mmask32 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _kortestz_mask32_u8(__mmask32 a, __mmask32 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 32-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := a[31:0] OR b[31:0]
        IF tmp[31:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_kortestc_mask32_u8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask32 a, 
    __mmask32 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _kortestc_mask32_u8(__mmask32 a, __mmask32 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 32-bit masks "a" and "b". If the result is all ones, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[31:0] := a[31:0] OR b[31:0]
        IF tmp[31:0] == 0xFFFFFFFF
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_kortest_mask64_u8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask64 a, 
    __mmask64 b, 
    unsigned char* all_ones
:Param ETypes:
    MASK a, 
    MASK b, 
    UI8 all_ones

.. code-block:: C

    unsigned char _kortest_mask64_u8(__mmask64 a, __mmask64 b, unsigned char* all_ones);

.. admonition:: Intel Description

    Compute the bitwise OR of 64-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". If the result is all ones, store 1 in "all_ones", otherwise store 0 in "all_ones".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[63:0] := a[63:0] OR b[63:0]
        IF tmp[63:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        IF tmp[63:0] == 0xFFFFFFFFFFFFFFFF
        	MEM[all_ones+7:all_ones] := 1
        ELSE
        	MEM[all_ones+7:all_ones] := 0
        FI
        	

_kortestz_mask64_u8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask64 a, 
    __mmask64 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _kortestz_mask64_u8(__mmask64 a, __mmask64 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 64-bit masks "a" and "b". If the result is all zeroes, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[63:0] := a[63:0] OR b[63:0]
        IF tmp[63:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_kortestc_mask64_u8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask64 a, 
    __mmask64 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _kortestc_mask64_u8(__mmask64 a, __mmask64 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 64-bit masks "a" and "b". If the result is all ones, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[63:0] := a[63:0] OR b[63:0]
        IF tmp[63:0] == 0xFFFFFFFFFFFFFFFF
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_ktest_mask32_u8
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask32 a, 
    __mmask32 b, 
    unsigned char* and_not
:Param ETypes:
    MASK a, 
    MASK b, 
    UI8 and_not

.. code-block:: C

    unsigned char _ktest_mask32_u8(__mmask32 a, __mmask32 b, unsigned char* and_not);

.. admonition:: Intel Description

    Compute the bitwise AND of 32-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". Compute the bitwise NOT of "a" and then AND with "b"; if the result is all zeros, store 1 in "and_not", otherwise store 0 in "and_not".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp1[31:0] := a[31:0] AND b[31:0]
        IF tmp1[31:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        tmp2[31:0] := (NOT a[31:0]) AND b[31:0]
        IF tmp2[31:0] == 0x0
        	MEM[and_not+7:and_not] := 1
        ELSE
        	MEM[and_not+7:and_not] := 0
        FI
        	
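One way to read the two results: the return value is 1 when "a" and "b" share no set bit, and "and_not" receives 1 when every bit set in "b" is also set in "a". The sketch below models the pseudo-code in portable C; ``ktest32_model`` is a hypothetical name, not part of ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above (not the intrinsic). */
    static unsigned char ktest32_model(uint32_t a, uint32_t b,
                                       unsigned char *and_not)
    {
        uint32_t tmp1 = a & b;                    /* bits common to a and b */
        uint32_t tmp2 = ~a & b;                   /* bits of b that are missing from a */
        *and_not = (unsigned char)(tmp2 == 0);    /* 1 if b is a subset of a */
        return (unsigned char)(tmp1 == 0);        /* dst: 1 if a and b are disjoint */
    }

With ``a = 0xFF`` and ``b = 0x0F`` the function returns 0 (the masks overlap) and writes 1 through ``and_not`` (all of "b" is contained in "a").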

_ktestz_mask32_u8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask32 a, 
    __mmask32 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _ktestz_mask32_u8(__mmask32 a, __mmask32 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 32-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[31:0] := a[31:0] AND b[31:0]
        IF tmp[31:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_ktestc_mask32_u8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask32 a, 
    __mmask32 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _ktestc_mask32_u8(__mmask32 a, __mmask32 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 32-bit mask "a" and then AND with "b"; if the result is all zeroes, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[31:0] := (NOT a[31:0]) AND b[31:0]
        IF tmp[31:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_ktest_mask64_u8
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask64 a, 
    __mmask64 b, 
    unsigned char* and_not
:Param ETypes:
    MASK a, 
    MASK b, 
    UI8 and_not

.. code-block:: C

    unsigned char _ktest_mask64_u8(__mmask64 a, __mmask64 b, unsigned char* and_not);

.. admonition:: Intel Description

    Compute the bitwise AND of 64-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". Compute the bitwise NOT of "a" and then AND with "b"; if the result is all zeros, store 1 in "and_not", otherwise store 0 in "and_not".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp1[63:0] := a[63:0] AND b[63:0]
        IF tmp1[63:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        tmp2[63:0] := (NOT a[63:0]) AND b[63:0]
        IF tmp2[63:0] == 0x0
        	MEM[and_not+7:and_not] := 1
        ELSE
        	MEM[and_not+7:and_not] := 0
        FI
        	

_ktestz_mask64_u8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask64 a, 
    __mmask64 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _ktestz_mask64_u8(__mmask64 a, __mmask64 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 64-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[63:0] := a[63:0] AND b[63:0]
        IF tmp[63:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_ktestc_mask64_u8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask64 a, 
    __mmask64 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _ktestc_mask64_u8(__mmask64 a, __mmask64 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 64-bit mask "a" and then AND with "b"; if the result is all zeroes, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[63:0] := (NOT a[63:0]) AND b[63:0]
        IF tmp[63:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_cvtmask32_u32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned int
:Param Types:
    __mmask32 a
:Param ETypes:
    MASK a

.. code-block:: C

    unsigned int _cvtmask32_u32(__mmask32 a);

.. admonition:: Intel Description

    Convert 32-bit mask "a" into an integer value, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst := ZeroExtend32(a[31:0])
        	

_cvtmask64_u64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned __int64
:Param Types:
    __mmask64 a
:Param ETypes:
    MASK a

.. code-block:: C

    unsigned __int64 _cvtmask64_u64(__mmask64 a);

.. admonition:: Intel Description

    Convert 64-bit mask "a" into an integer value, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst := ZeroExtend64(a[63:0])
        	

_cvtu32_mask32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask32
:Param Types:
    unsigned int a
:Param ETypes:
    UI32 a

.. code-block:: C

    __mmask32 _cvtu32_mask32(unsigned int a);

.. admonition:: Intel Description

    Convert integer value "a" into a 32-bit mask, and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k := ZeroExtend32(a[31:0])
        	

_cvtu64_mask64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask64
:Param Types:
    unsigned __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __mmask64 _cvtu64_mask64(unsigned __int64 a);

.. admonition:: Intel Description

    Convert integer value "a" into a 64-bit mask, and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k := ZeroExtend64(a[63:0])
        	

_kadd_mask8
^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask8
:Param Types:
    __mmask8 a, 
    __mmask8 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask8 _kadd_mask8(__mmask8 a, __mmask8 b);

.. admonition:: Intel Description

    Add 8-bit masks in "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[7:0] := a[7:0] + b[7:0]
        k[MAX:8] := 0
        	
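Because only ``k[7:0]`` is written, the sum wraps modulo 256 and any carry out of bit 7 is lost. A small portable sketch of that behaviour, with ``kadd8_model`` as a hypothetical stand-in for the intrinsic:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of k[7:0] := a[7:0] + b[7:0]. */
    static uint8_t kadd8_model(uint8_t a, uint8_t b)
    {
        /* Both operands promote to int; the cast keeps bits 7..0 only. */
        return (uint8_t)(a + b);
    }

For instance, ``kadd8_model(0xFF, 0x01)`` yields ``0x00``, while ``kadd8_model(0x0F, 0x01)`` yields ``0x10``.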

_kadd_mask16
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask16 _kadd_mask16(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Add 16-bit masks in "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[15:0] := a[15:0] + b[15:0]
        k[MAX:16] := 0
        	

_kand_mask8
^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask8
:Param Types:
    __mmask8 a, 
    __mmask8 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask8 _kand_mask8(__mmask8 a, __mmask8 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 8-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[7:0] := a[7:0] AND b[7:0]
        k[MAX:8] := 0
        	

_kandn_mask8
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask8
:Param Types:
    __mmask8 a, 
    __mmask8 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask8 _kandn_mask8(__mmask8 a, __mmask8 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 8-bit mask "a" and then AND with "b", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[7:0] := (NOT a[7:0]) AND b[7:0]
        k[MAX:8] := 0
        	
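The NOT applies to the first operand only, so the argument order matters. The following portable sketch models the pseudo-code; ``kandn8_model`` is a hypothetical helper, not the intrinsic itself.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of k[7:0] := (NOT a[7:0]) AND b[7:0]. */
    static uint8_t kandn8_model(uint8_t a, uint8_t b)
    {
        /* ~a is computed on a promoted int; ANDing with b clears the high bits. */
        return (uint8_t)(~a & b);
    }

Swapping the arguments changes the result: ``kandn8_model(0xF0, 0xFF)`` gives ``0x0F``, whereas ``kandn8_model(0xFF, 0xF0)`` gives ``0x00``.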

_knot_mask8
^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask8
:Param Types:
    __mmask8 a
:Param ETypes:
    MASK a

.. code-block:: C

    __mmask8 _knot_mask8(__mmask8 a);

.. admonition:: Intel Description

    Compute the bitwise NOT of 8-bit mask "a", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[7:0] := NOT a[7:0]
        k[MAX:8] := 0
        	

_kor_mask8
^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask8
:Param Types:
    __mmask8 a, 
    __mmask8 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask8 _kor_mask8(__mmask8 a, __mmask8 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 8-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[7:0] := a[7:0] OR b[7:0]
        k[MAX:8] := 0
        	

_kxnor_mask8
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask8
:Param Types:
    __mmask8 a, 
    __mmask8 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask8 _kxnor_mask8(__mmask8 a, __mmask8 b);

.. admonition:: Intel Description

    Compute the bitwise XNOR of 8-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[7:0] := NOT (a[7:0] XOR b[7:0])
        k[MAX:8] := 0
        	
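XNOR sets a result bit wherever the two inputs agree, so passing the same mask twice produces all ones. A minimal portable sketch, assuming the hypothetical name ``kxnor8_model`` for the scalar model:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of k[7:0] := NOT (a[7:0] XOR b[7:0]). */
    static uint8_t kxnor8_model(uint8_t a, uint8_t b)
    {
        /* The XOR is evaluated as int; the cast discards the inverted high bits. */
        return (uint8_t)~(a ^ b);
    }

Here ``kxnor8_model(0xAA, 0xAA)`` returns ``0xFF`` and ``kxnor8_model(0xAA, 0xA5)`` returns ``0xF0``, since only the upper four bits of the inputs match.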

_kxor_mask8
^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask8
:Param Types:
    __mmask8 a, 
    __mmask8 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask8 _kxor_mask8(__mmask8 a, __mmask8 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of 8-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[7:0] := a[7:0] XOR b[7:0]
        k[MAX:8] := 0
        	

_kshiftli_mask8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask8
:Param Types:
    __mmask8 a, 
    unsigned int count
:Param ETypes:
    MASK a, 
    IMM count

.. code-block:: C

    __mmask8 _kshiftli_mask8(__mmask8 a, unsigned int count);

.. admonition:: Intel Description

    Shift the bits of 8-bit mask "a" left by "count" while shifting in zeros, and store the least significant 8 bits of the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[MAX:0] := 0
        IF count[7:0] <= 7
        	k[7:0] := a[7:0] << count[7:0]
        FI
        	
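Counts above 7 do not wrap around; they clear the whole mask. The sketch below follows the pseudo-code in portable C. ``kshiftli8_model`` is a hypothetical name, and unlike the intrinsic, whose "count" is listed as an immediate (IMM) above, the model accepts a run-time value.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above (not the intrinsic). */
    static uint8_t kshiftli8_model(uint8_t a, unsigned int count)
    {
        unsigned int c = count & 0xFFu;   /* only count[7:0] is examined */
        if (c > 7) {
            return 0;                     /* k stays all zeros */
        }
        return (uint8_t)(a << c);         /* bits shifted past bit 7 are dropped */
    }

For example, ``kshiftli8_model(0x81, 1)`` returns ``0x02`` (the top bit falls off), and ``kshiftli8_model(0xFF, 8)`` returns ``0x00``.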

_kshiftri_mask8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask8
:Param Types:
    __mmask8 a, 
    unsigned int count
:Param ETypes:
    MASK a, 
    IMM count

.. code-block:: C

    __mmask8 _kshiftri_mask8(__mmask8 a, unsigned int count);

.. admonition:: Intel Description

    Shift the bits of 8-bit mask "a" right by "count" while shifting in zeros, and store the least significant 8 bits of the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[MAX:0] := 0
        IF count[7:0] <= 7
        	k[7:0] := a[7:0] >> count[7:0]
        FI
        	

_kortest_mask8_u8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask8 a, 
    __mmask8 b, 
    unsigned char* all_ones
:Param ETypes:
    MASK a, 
    MASK b, 
    UI8 all_ones

.. code-block:: C

    unsigned char _kortest_mask8_u8(__mmask8 a, __mmask8 b, unsigned char* all_ones);

.. admonition:: Intel Description

    Compute the bitwise OR of 8-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". If the result is all ones, store 1 in "all_ones", otherwise store 0 in "all_ones".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[7:0] := a[7:0] OR b[7:0]
        IF tmp[7:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        IF tmp[7:0] == 0xFF
        	MEM[all_ones+7:all_ones] := 1
        ELSE
        	MEM[all_ones+7:all_ones] := 0
        FI
        	

_kortestz_mask8_u8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask8 a, 
    __mmask8 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _kortestz_mask8_u8(__mmask8 a, __mmask8 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 8-bit masks "a" and "b". If the result is all zeroes, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[7:0] := a[7:0] OR b[7:0]
        IF tmp[7:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_kortestc_mask8_u8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask8 a, 
    __mmask8 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _kortestc_mask8_u8(__mmask8 a, __mmask8 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 8-bit masks "a" and "b". If the result is all ones, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[7:0] := a[7:0] OR b[7:0]
        IF tmp[7:0] == 0xFF
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_ktest_mask8_u8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask8 a, 
    __mmask8 b, 
    unsigned char* and_not
:Param ETypes:
    MASK a, 
    MASK b, 
    UI8 and_not

.. code-block:: C

    unsigned char _ktest_mask8_u8(__mmask8 a, __mmask8 b, unsigned char* and_not);

.. admonition:: Intel Description

    Compute the bitwise AND of 8-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". Compute the bitwise NOT of "a" and then AND with "b"; if the result is all zeros, store 1 in "and_not", otherwise store 0 in "and_not".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp1[7:0] := a[7:0] AND b[7:0]
        IF tmp1[7:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        tmp2[7:0] := (NOT a[7:0]) AND b[7:0]
        IF tmp2[7:0] == 0x0
        	MEM[and_not+7:and_not] := 1
        ELSE
        	MEM[and_not+7:and_not] := 0
        FI
        	

_ktestz_mask8_u8
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask8 a, 
    __mmask8 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _ktestz_mask8_u8(__mmask8 a, __mmask8 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 8-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[7:0] := a[7:0] AND b[7:0]
        IF tmp[7:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_ktestc_mask8_u8
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask8 a, 
    __mmask8 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _ktestc_mask8_u8(__mmask8 a, __mmask8 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 8-bit mask "a" and then AND with "b"; if the result is all zeroes, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[7:0] := (NOT a[7:0]) AND b[7:0]
        IF tmp[7:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_ktest_mask16_u8
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask16 a, 
    __mmask16 b, 
    unsigned char* and_not
:Param ETypes:
    MASK a, 
    MASK b, 
    UI8 and_not

.. code-block:: C

    unsigned char _ktest_mask16_u8(__mmask16 a, __mmask16 b, unsigned char* and_not);

.. admonition:: Intel Description

    Compute the bitwise AND of 16-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". Compute the bitwise NOT of "a" and then AND with "b"; if the result is all zeros, store 1 in "and_not", otherwise store 0 in "and_not".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp1[15:0] := a[15:0] AND b[15:0]
        IF tmp1[15:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        tmp2[15:0] := (NOT a[15:0]) AND b[15:0]
        IF tmp2[15:0] == 0x0
        	MEM[and_not+7:and_not] := 1
        ELSE
        	MEM[and_not+7:and_not] := 0
        FI
        	

_ktestz_mask16_u8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _ktestz_mask16_u8(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 16-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[15:0] := a[15:0] AND b[15:0]
        IF tmp[15:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_ktestc_mask16_u8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _ktestc_mask16_u8(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 16-bit mask "a" and then AND with "b"; if the result is all zeroes, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[15:0] := (NOT a[15:0]) AND b[15:0]
        IF tmp[15:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_cvtmask8_u32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned int
:Param Types:
    __mmask8 a
:Param ETypes:
    MASK a

.. code-block:: C

    unsigned int _cvtmask8_u32(__mmask8 a);

.. admonition:: Intel Description

    Convert 8-bit mask "a" into an integer value, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst := ZeroExtend32(a[7:0])
        	

_cvtu32_mask8
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask8
:Param Types:
    unsigned int a
:Param ETypes:
    UI8 a

.. code-block:: C

    __mmask8 _cvtu32_mask8(unsigned int a);

.. admonition:: Intel Description

    Convert integer value "a" into an 8-bit mask, and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k := a[7:0]
        	
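Only the low 8 bits of "a" survive the conversion; higher bits are discarded. A portable sketch of that truncation, with ``cvtu32_mask8_model`` as a hypothetical stand-in for the intrinsic:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of k := a[7:0]. */
    static uint8_t cvtu32_mask8_model(unsigned int a)
    {
        return (uint8_t)(a & 0xFFu);   /* keep bits 7..0, drop the rest */
    }

So ``cvtu32_mask8_model(0x1234u)`` yields ``0x34``, and ``cvtu32_mask8_model(0x100u)`` yields ``0x00``.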

_kand_mask16
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask16 _kand_mask16(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 16-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[15:0] := a[15:0] AND b[15:0]
        k[MAX:16] := 0
        	

_kandn_mask16
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask16 _kandn_mask16(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 16-bit mask "a" and then AND with "b", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[15:0] := (NOT a[15:0]) AND b[15:0]
        k[MAX:16] := 0
        	

_knot_mask16
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask16
:Param Types:
    __mmask16 a
:Param ETypes:
    MASK a

.. code-block:: C

    __mmask16 _knot_mask16(__mmask16 a);

.. admonition:: Intel Description

    Compute the bitwise NOT of 16-bit mask "a", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[15:0] := NOT a[15:0]
        k[MAX:16] := 0
        	

_kor_mask16
^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask16 _kor_mask16(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 16-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[15:0] := a[15:0] OR b[15:0]
        k[MAX:16] := 0
        	

_kxnor_mask16
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask16 _kxnor_mask16(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise XNOR of 16-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[15:0] := NOT (a[15:0] XOR b[15:0])
        k[MAX:16] := 0
        	

_kxor_mask16
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask16 _kxor_mask16(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of 16-bit masks "a" and "b", and store the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[15:0] := a[15:0] XOR b[15:0]
        k[MAX:16] := 0
        	

_kshiftli_mask16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    unsigned int count
:Param ETypes:
    MASK a, 
    IMM count

.. code-block:: C

    __mmask16 _kshiftli_mask16(__mmask16 a, unsigned int count);

.. admonition:: Intel Description

    Shift the bits of 16-bit mask "a" left by "count" while shifting in zeros, and store the least significant 16 bits of the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[MAX:0] := 0
        IF count[7:0] <= 15
        	k[15:0] := a[15:0] << count[7:0]
        FI
        	

_kshiftri_mask16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask16
:Param Types:
    __mmask16 a, 
    unsigned int count
:Param ETypes:
    MASK a, 
    IMM count

.. code-block:: C

    __mmask16 _kshiftri_mask16(__mmask16 a, unsigned int count);

.. admonition:: Intel Description

    Shift the bits of 16-bit mask "a" right by "count" while shifting in zeros, and store the least significant 16 bits of the result in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        k[MAX:0] := 0
        IF count[7:0] <= 15
        	k[15:0] := a[15:0] >> count[7:0]
        FI
        	
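A right shift by 8, for instance, moves the upper eight mask bits into the low byte, which can be handy when a 16-lane mask is processed in two halves. The portable sketch below models the pseudo-code; ``kshiftri16_model`` is a hypothetical name, and it takes a run-time count where the intrinsic lists an immediate.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above (not the intrinsic). */
    static uint16_t kshiftri16_model(uint16_t a, unsigned int count)
    {
        unsigned int c = count & 0xFFu;            /* only count[7:0] is examined */
        return (c <= 15) ? (uint16_t)(a >> c) : 0; /* counts above 15 clear the mask */
    }

For example, ``kshiftri16_model(0xAB00, 8)`` returns ``0x00AB``, and ``kshiftri16_model(0xFFFF, 16)`` returns ``0x0000``.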

_kortest_mask16_u8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask16 a, 
    __mmask16 b, 
    unsigned char* all_ones
:Param ETypes:
    MASK a, 
    MASK b, 
    UI8 all_ones

.. code-block:: C

    unsigned char _kortest_mask16_u8(__mmask16 a, __mmask16 b, unsigned char* all_ones);

.. admonition:: Intel Description

    Compute the bitwise OR of 16-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". If the result is all ones, store 1 in "all_ones", otherwise store 0 in "all_ones".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[15:0] := a[15:0] OR b[15:0]
        IF tmp[15:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        IF tmp[15:0] == 0xFFFF
        	MEM[all_ones+7:all_ones] := 1
        ELSE
        	MEM[all_ones+7:all_ones] := 0
        FI
        	

_kortestz_mask16_u8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _kortestz_mask16_u8(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 16-bit masks "a" and "b". If the result is all zeroes, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[15:0] := a[15:0] OR b[15:0]
        IF tmp[15:0] == 0x0
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_kortestc_mask16_u8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned char
:Param Types:
    __mmask16 a, 
    __mmask16 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    unsigned char _kortestc_mask16_u8(__mmask16 a, __mmask16 b);

.. admonition:: Intel Description

    Compute the bitwise OR of 16-bit masks "a" and "b". If the result is all ones, store 1 in "dst", otherwise store 0 in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        tmp[15:0] := a[15:0] OR b[15:0]
        IF tmp[15:0] == 0xFFFF
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_cvtmask16_u32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: unsigned int
:Param Types:
    __mmask16 a
:Param ETypes:
    MASK a

.. code-block:: C

    unsigned int _cvtmask16_u32(__mmask16 a);

.. admonition:: Intel Description

    Convert 16-bit mask "a" into an integer value, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst := ZeroExtend32(a[15:0])
        	

_cvtu32_mask16
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Mask
:Header: immintrin.h
:Searchable: AVX-512-Mask-Other
:Return Type: __mmask16
:Param Types:
    unsigned int a
:Param ETypes:
    UI16 a

.. code-block:: C

    __mmask16 _cvtu32_mask16(unsigned int a);

.. admonition:: Intel Description

    Convert integer value "a" into a 16-bit mask, and store the result in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        k := ZeroExtend16(a[15:0])
        	
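The pseudo-code keeps only ``a[15:0]``, so the upper 16 bits of the integer argument are dropped. The sketch below models that, together with the zero-extending conversion back to an integer, in portable C. Both function names are hypothetical stand-ins, not intrinsics.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of _cvtu32_mask16: keep a[15:0], discard a[31:16]. */
    uint16_t cvtu32_mask16_model(uint32_t a)
    {
        return (uint16_t)(a & 0xFFFFu);
    }

    /* Hypothetical model of _cvtmask16_u32: zero-extend the 16 mask bits. */
    uint32_t cvtmask16_u32_model(uint16_t k)
    {
        return (uint32_t)k;
    }

Under this model a round trip through both conversions preserves only the low 16 bits of the original integer.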

Set
---
ZMM
~~~
_mm512_mask_set1_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    char a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_mask_set1_epi8(__m512i src, __mmask64 k,
                                  char a);

.. admonition:: Intel Description

    Broadcast 8-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[7:0]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
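The writemask loop above can be sketched in portable C as follows. This is a scalar model under stated assumptions, not the intrinsic: ``mask_set1_epi8_model`` is a hypothetical name, the 512-bit vectors are represented as arrays of 64 bytes with lane 0 holding bits [7:0], and ``uint64_t`` stands in for ``__mmask64``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_set1_epi8. */
    void mask_set1_epi8_model(uint8_t dst[64], const uint8_t src[64],
                              uint64_t k, char a)
    {
        for (int j = 0; j < 64; j++) {
            /* Bit j of the writemask picks the broadcast value; otherwise
               the lane is copied from src. */
            dst[j] = ((k >> j) & 1u) ? (uint8_t)a : src[j];
        }
    }

Lanes whose mask bit is clear keep the value from ``src``; only the selected lanes receive ``a``.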

_mm512_maskz_set1_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    char a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_maskz_set1_epi8(__mmask64 k, char a);

.. admonition:: Intel Description

    Broadcast 8-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[7:0]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_set1_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    short a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_mask_set1_epi16(__m512i src, __mmask32 k,
                                   short a);

.. admonition:: Intel Description

    Broadcast 16-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[15:0]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_set1_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    short a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_maskz_set1_epi16(__mmask32 k, short a);

.. admonition:: Intel Description

    Broadcast 16-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[15:0]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
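A minimal sketch of the zeromask behaviour above, in portable C. The function name ``maskz_set1_epi16_model`` is hypothetical, the vector is modelled as 32 ``uint16_t`` lanes, and ``uint32_t`` stands in for ``__mmask32``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_maskz_set1_epi16. */
    void maskz_set1_epi16_model(uint16_t dst[32], uint32_t k, short a)
    {
        for (int j = 0; j < 32; j++) {
            /* Selected lanes get a[15:0]; unselected lanes are zeroed. */
            dst[j] = ((k >> j) & 1u) ? (uint16_t)a : 0;
        }
    }

Unlike the writemask form there is no ``src`` operand, so unselected lanes cannot retain earlier contents in this model.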

_mm512_set1_epi8
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    char a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m512i _mm512_set1_epi8(char a);

.. admonition:: Intel Description

    Broadcast 8-bit integer "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := a[7:0]
        ENDFOR
        dst[MAX:512] := 0
        	
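Because the parameter is a plain ``char``, it may help to see that only the bit pattern ``a[7:0]`` is broadcast. The following scalar sketch uses the hypothetical name ``set1_epi8_model`` and represents the vector as 64 bytes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_set1_epi8. */
    void set1_epi8_model(uint8_t dst[64], char a)
    {
        for (int j = 0; j < 64; j++) {
            /* Store a[7:0]; the cast gives the same byte whether char is
               signed or unsigned on the target. */
            dst[j] = (uint8_t)a;
        }
    }

In this model, passing ``(char)-1`` fills every byte with ``0xFF``.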

_mm512_mask_set1_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    int a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_set1_epi32(__m512i src, __mmask16 k,
                                   int a);

.. admonition:: Intel Description

    Broadcast 32-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_set1_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    int a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_set1_epi32(__mmask16 k, int a);

.. admonition:: Intel Description

    Broadcast 32-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_set1_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m512i _mm512_set1_epi32(int a);

.. admonition:: Intel Description

    Broadcast 32-bit integer "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_set1_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __int64 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_mask_set1_epi64(__m512i src, __mmask8 k,
                                   __int64 a);

.. admonition:: Intel Description

    Broadcast 64-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_set1_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __int64 a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_maskz_set1_epi64(__mmask8 k, __int64 a);

.. admonition:: Intel Description

    Broadcast 64-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
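One way to relate the zeromask form above to the writemask form is to treat it as a writemask broadcast whose ``src`` is all zeros. The sketch below models both for eight 64-bit lanes in portable C. The function names are hypothetical, ``int64_t`` stands in for ``__int64``, and ``uint8_t`` stands in for ``__mmask8``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_set1_epi64. */
    void mask_set1_epi64_model(uint64_t dst[8], const uint64_t src[8],
                               uint8_t k, int64_t a)
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = ((k >> j) & 1) ? (uint64_t)a : src[j];
        }
    }

    /* Hypothetical scalar model of _mm512_maskz_set1_epi64, expressed as
       the writemask model applied to an all-zero src. */
    void maskz_set1_epi64_model(uint64_t dst[8], uint8_t k, int64_t a)
    {
        const uint64_t zero[8] = { 0 };
        mask_set1_epi64_model(dst, zero, k, a);
    }

With the same ``k`` and ``a``, the two models agree on the selected lanes and differ only in what the unselected lanes contain.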

_mm512_set1_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __int64 a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m512i _mm512_set1_epi64(__int64 a);

.. admonition:: Intel Description

    Broadcast 64-bit integer "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_set1_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    short a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m512i _mm512_set1_epi16(short a);

.. admonition:: Intel Description

    Broadcast 16-bit integer "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := a[15:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_set1_pd
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    double a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_set1_pd(double a);

.. admonition:: Intel Description

    Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        dst[MAX:512] := 0
        	
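The pseudo-code copies ``a[63:0]``, that is, the raw bit pattern of the double, into each lane. The sketch below models this in portable C by exposing the lanes as 64-bit integers. It assumes ``double`` is a 64-bit IEEE 754 binary64 value, and ``set1_pd_model`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm512_set1_pd that exposes the lane bits.
       Assumes sizeof(double) == 8 (IEEE 754 binary64). */
    void set1_pd_model(uint64_t dst_bits[8], double a)
    {
        uint64_t bits;
        memcpy(&bits, &a, sizeof bits);   /* reinterpret without aliasing issues */
        for (int j = 0; j < 8; j++) {
            dst_bits[j] = bits;
        }
    }

No arithmetic conversion takes place in this model; every lane holds an identical copy of the input's encoding.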

_mm512_set1_ps
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    float a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_set1_ps(float a);

.. admonition:: Intel Description

    Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_set4_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    int d, 
    int c, 
    int b, 
    int a
:Param ETypes:
    UI32 d, 
    UI32 c, 
    UI32 b, 
    UI32 a

.. code-block:: C

    __m512i _mm512_set4_epi32(int d, int c, int b, int a);

.. admonition:: Intel Description

    Set packed 32-bit integers in "dst" with the repeated 4 element sequence.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a
        dst[63:32] := b
        dst[95:64] := c
        dst[127:96] := d
        dst[159:128] := a
        dst[191:160] := b
        dst[223:192] := c
        dst[255:224] := d
        dst[287:256] := a
        dst[319:288] := b
        dst[351:320] := c
        dst[383:352] := d
        dst[415:384] := a
        dst[447:416] := b
        dst[479:448] := c
        dst[511:480] := d
        dst[MAX:512] := 0
        	
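The sixteen assignments above follow a simple pattern: lane ``j`` receives the ``j mod 4``-th entry of the sequence ``a, b, c, d``, so the last argument lands in the lowest lane. A portable C sketch of that pattern follows. The name ``set4_epi32_model`` is hypothetical and the vector is modelled as 16 ``uint32_t`` lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_set4_epi32. */
    void set4_epi32_model(uint32_t dst[16], int d, int c, int b, int a)
    {
        /* Lane order within each group of four, lowest lane first. */
        const uint32_t seq[4] = {
            (uint32_t)a, (uint32_t)b, (uint32_t)c, (uint32_t)d
        };
        for (int j = 0; j < 16; j++) {
            dst[j] = seq[j % 4];
        }
    }

Calling the model with ``(4, 3, 2, 1)`` yields lanes ``1, 2, 3, 4`` repeated four times.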

_mm512_set4_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __int64 d, 
    __int64 c, 
    __int64 b, 
    __int64 a
:Param ETypes:
    UI64 d, 
    UI64 c, 
    UI64 b, 
    UI64 a

.. code-block:: C

    __m512i _mm512_set4_epi64(__int64 d, __int64 c, __int64 b,
                              __int64 a);

.. admonition:: Intel Description

    Set packed 64-bit integers in "dst" with the repeated 4 element sequence.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a
        dst[127:64] := b
        dst[191:128] := c
        dst[255:192] := d
        dst[319:256] := a
        dst[383:320] := b
        dst[447:384] := c
        dst[511:448] := d
        dst[MAX:512] := 0
        	

_mm512_set4_pd
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    double d, 
    double c, 
    double b, 
    double a
:Param ETypes:
    FP64 d, 
    FP64 c, 
    FP64 b, 
    FP64 a

.. code-block:: C

    __m512d _mm512_set4_pd(double d, double c, double b,
                           double a);

.. admonition:: Intel Description

    Set packed double-precision (64-bit) floating-point elements in "dst" with the repeated 4 element sequence.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a
        dst[127:64] := b
        dst[191:128] := c
        dst[255:192] := d
        dst[319:256] := a
        dst[383:320] := b
        dst[447:384] := c
        dst[511:448] := d
        dst[MAX:512] := 0
        	

_mm512_set4_ps
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    float d, 
    float c, 
    float b, 
    float a
:Param ETypes:
    FP32 d, 
    FP32 c, 
    FP32 b, 
    FP32 a

.. code-block:: C

    __m512 _mm512_set4_ps(float d, float c, float b, float a);

.. admonition:: Intel Description

    Set packed single-precision (32-bit) floating-point elements in "dst" with the repeated 4 element sequence.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a
        dst[63:32] := b
        dst[95:64] := c
        dst[127:96] := d
        dst[159:128] := a
        dst[191:160] := b
        dst[223:192] := c
        dst[255:224] := d
        dst[287:256] := a
        dst[319:288] := b
        dst[351:320] := c
        dst[383:352] := d
        dst[415:384] := a
        dst[447:416] := b
        dst[479:448] := c
        dst[511:480] := d
        dst[MAX:512] := 0
        	

_mm512_set_epi8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    char e63, 
    char e62, 
    char e61, 
    char e60, 
    char e59, 
    char e58, 
    char e57, 
    char e56, 
    char e55, 
    char e54, 
    char e53, 
    char e52, 
    char e51, 
    char e50, 
    char e49, 
    char e48, 
    char e47, 
    char e46, 
    char e45, 
    char e44, 
    char e43, 
    char e42, 
    char e41, 
    char e40, 
    char e39, 
    char e38, 
    char e37, 
    char e36, 
    char e35, 
    char e34, 
    char e33, 
    char e32, 
    char e31, 
    char e30, 
    char e29, 
    char e28, 
    char e27, 
    char e26, 
    char e25, 
    char e24, 
    char e23, 
    char e22, 
    char e21, 
    char e20, 
    char e19, 
    char e18, 
    char e17, 
    char e16, 
    char e15, 
    char e14, 
    char e13, 
    char e12, 
    char e11, 
    char e10, 
    char e9, 
    char e8, 
    char e7, 
    char e6, 
    char e5, 
    char e4, 
    char e3, 
    char e2, 
    char e1, 
    char e0
:Param ETypes:
    UI8 e63, 
    UI8 e62, 
    UI8 e61, 
    UI8 e60, 
    UI8 e59, 
    UI8 e58, 
    UI8 e57, 
    UI8 e56, 
    UI8 e55, 
    UI8 e54, 
    UI8 e53, 
    UI8 e52, 
    UI8 e51, 
    UI8 e50, 
    UI8 e49, 
    UI8 e48, 
    UI8 e47, 
    UI8 e46, 
    UI8 e45, 
    UI8 e44, 
    UI8 e43, 
    UI8 e42, 
    UI8 e41, 
    UI8 e40, 
    UI8 e39, 
    UI8 e38, 
    UI8 e37, 
    UI8 e36, 
    UI8 e35, 
    UI8 e34, 
    UI8 e33, 
    UI8 e32, 
    UI8 e31, 
    UI8 e30, 
    UI8 e29, 
    UI8 e28, 
    UI8 e27, 
    UI8 e26, 
    UI8 e25, 
    UI8 e24, 
    UI8 e23, 
    UI8 e22, 
    UI8 e21, 
    UI8 e20, 
    UI8 e19, 
    UI8 e18, 
    UI8 e17, 
    UI8 e16, 
    UI8 e15, 
    UI8 e14, 
    UI8 e13, 
    UI8 e12, 
    UI8 e11, 
    UI8 e10, 
    UI8 e9, 
    UI8 e8, 
    UI8 e7, 
    UI8 e6, 
    UI8 e5, 
    UI8 e4, 
    UI8 e3, 
    UI8 e2, 
    UI8 e1, 
    UI8 e0

.. code-block:: C

    __m512i _mm512_set_epi8(
        char e63, char e62, char e61, char e60, char e59,
        char e58, char e57, char e56, char e55, char e54,
        char e53, char e52, char e51, char e50, char e49,
        char e48, char e47, char e46, char e45, char e44,
        char e43, char e42, char e41, char e40, char e39,
        char e38, char e37, char e36, char e35, char e34,
        char e33, char e32, char e31, char e30, char e29,
        char e28, char e27, char e26, char e25, char e24,
        char e23, char e22, char e21, char e20, char e19,
        char e18, char e17, char e16, char e15, char e14,
        char e13, char e12, char e11, char e10, char e9,
        char e8, char e7, char e6, char e5, char e4, char e3,
        char e2, char e1, char e0);

.. admonition:: Intel Description

    Set packed 8-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := e0
        dst[15:8] := e1
        dst[23:16] := e2
        dst[31:24] := e3
        dst[39:32] := e4
        dst[47:40] := e5
        dst[55:48] := e6
        dst[63:56] := e7
        dst[71:64] := e8
        dst[79:72] := e9
        dst[87:80] := e10
        dst[95:88] := e11
        dst[103:96] := e12
        dst[111:104] := e13
        dst[119:112] := e14
        dst[127:120] := e15
        dst[135:128] := e16
        dst[143:136] := e17
        dst[151:144] := e18
        dst[159:152] := e19
        dst[167:160] := e20
        dst[175:168] := e21
        dst[183:176] := e22
        dst[191:184] := e23
        dst[199:192] := e24
        dst[207:200] := e25
        dst[215:208] := e26
        dst[223:216] := e27
        dst[231:224] := e28
        dst[239:232] := e29
        dst[247:240] := e30
        dst[255:248] := e31
        dst[263:256] := e32
        dst[271:264] := e33
        dst[279:272] := e34
        dst[287:280] := e35
        dst[295:288] := e36
        dst[303:296] := e37
        dst[311:304] := e38
        dst[319:312] := e39
        dst[327:320] := e40
        dst[335:328] := e41
        dst[343:336] := e42
        dst[351:344] := e43
        dst[359:352] := e44
        dst[367:360] := e45
        dst[375:368] := e46
        dst[383:376] := e47
        dst[391:384] := e48
        dst[399:392] := e49
        dst[407:400] := e50
        dst[415:408] := e51
        dst[423:416] := e52
        dst[431:424] := e53
        dst[439:432] := e54
        dst[447:440] := e55
        dst[455:448] := e56
        dst[463:456] := e57
        dst[471:464] := e58
        dst[479:472] := e59
        dst[487:480] := e60
        dst[495:488] := e61
        dst[503:496] := e62
        dst[511:504] := e63
        dst[MAX:512] := 0
        	
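The argument order is easy to misread: the first argument (``e63``) ends up in the highest byte and the last argument (``e0``) in the lowest. The sketch below models that mapping in portable C. To stay short it takes the 64 arguments as an array in call order rather than as 64 parameters; both that array and the name ``set_epi8_model`` are assumptions made for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_set_epi8.
       args_in_call_order[0] plays the role of e63, [63] the role of e0. */
    void set_epi8_model(uint8_t dst[64], const char args_in_call_order[64])
    {
        for (int j = 0; j < 64; j++) {
            /* Byte lane j (bits [8*j+7 : 8*j]) receives e<j>. */
            dst[j] = (uint8_t)args_in_call_order[63 - j];
        }
    }

Under this model, the byte that would sit at the lowest memory address after a store is the last argument written in the call.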

_mm512_set_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    short e31, 
    short e30, 
    short e29, 
    short e28, 
    short e27, 
    short e26, 
    short e25, 
    short e24, 
    short e23, 
    short e22, 
    short e21, 
    short e20, 
    short e19, 
    short e18, 
    short e17, 
    short e16, 
    short e15, 
    short e14, 
    short e13, 
    short e12, 
    short e11, 
    short e10, 
    short e9, 
    short e8, 
    short e7, 
    short e6, 
    short e5, 
    short e4, 
    short e3, 
    short e2, 
    short e1, 
    short e0
:Param ETypes:
    UI16 e31, 
    UI16 e30, 
    UI16 e29, 
    UI16 e28, 
    UI16 e27, 
    UI16 e26, 
    UI16 e25, 
    UI16 e24, 
    UI16 e23, 
    UI16 e22, 
    UI16 e21, 
    UI16 e20, 
    UI16 e19, 
    UI16 e18, 
    UI16 e17, 
    UI16 e16, 
    UI16 e15, 
    UI16 e14, 
    UI16 e13, 
    UI16 e12, 
    UI16 e11, 
    UI16 e10, 
    UI16 e9, 
    UI16 e8, 
    UI16 e7, 
    UI16 e6, 
    UI16 e5, 
    UI16 e4, 
    UI16 e3, 
    UI16 e2, 
    UI16 e1, 
    UI16 e0

.. code-block:: C

    __m512i _mm512_set_epi16(
        short e31, short e30, short e29, short e28, short e27,
        short e26, short e25, short e24, short e23, short e22,
        short e21, short e20, short e19, short e18, short e17,
        short e16, short e15, short e14, short e13, short e12,
        short e11, short e10, short e9, short e8, short e7,
        short e6, short e5, short e4, short e3, short e2,
        short e1, short e0);

.. admonition:: Intel Description

    Set packed 16-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := e0
        dst[31:16] := e1
        dst[47:32] := e2
        dst[63:48] := e3
        dst[79:64] := e4
        dst[95:80] := e5
        dst[111:96] := e6
        dst[127:112] := e7
        dst[143:128] := e8
        dst[159:144] := e9
        dst[175:160] := e10
        dst[191:176] := e11
        dst[207:192] := e12
        dst[223:208] := e13
        dst[239:224] := e14
        dst[255:240] := e15
        dst[271:256] := e16
        dst[287:272] := e17
        dst[303:288] := e18
        dst[319:304] := e19
        dst[335:320] := e20
        dst[351:336] := e21
        dst[367:352] := e22
        dst[383:368] := e23
        dst[399:384] := e24
        dst[415:400] := e25
        dst[431:416] := e26
        dst[447:432] := e27
        dst[463:448] := e28
        dst[479:464] := e29
        dst[495:480] := e30
        dst[511:496] := e31
        dst[MAX:512] := 0
        	

_mm512_set_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    int e15, 
    int e14, 
    int e13, 
    int e12, 
    int e11, 
    int e10, 
    int e9, 
    int e8, 
    int e7, 
    int e6, 
    int e5, 
    int e4, 
    int e3, 
    int e2, 
    int e1, 
    int e0
:Param ETypes:
    UI32 e15, 
    UI32 e14, 
    UI32 e13, 
    UI32 e12, 
    UI32 e11, 
    UI32 e10, 
    UI32 e9, 
    UI32 e8, 
    UI32 e7, 
    UI32 e6, 
    UI32 e5, 
    UI32 e4, 
    UI32 e3, 
    UI32 e2, 
    UI32 e1, 
    UI32 e0

.. code-block:: C

    __m512i _mm512_set_epi32(int e15, int e14, int e13, int e12,
                             int e11, int e10, int e9, int e8,
                             int e7, int e6, int e5, int e4,
                             int e3, int e2, int e1, int e0);

.. admonition:: Intel Description

    Set packed 32-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := e0
        dst[63:32] := e1
        dst[95:64] := e2
        dst[127:96] := e3
        dst[159:128] := e4
        dst[191:160] := e5
        dst[223:192] := e6
        dst[255:224] := e7
        dst[287:256] := e8
        dst[319:288] := e9
        dst[351:320] := e10
        dst[383:352] := e11
        dst[415:384] := e12
        dst[447:416] := e13
        dst[479:448] := e14
        dst[511:480] := e15
        dst[MAX:512] := 0
        	

_mm512_set_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __int64 e7, 
    __int64 e6, 
    __int64 e5, 
    __int64 e4, 
    __int64 e3, 
    __int64 e2, 
    __int64 e1, 
    __int64 e0
:Param ETypes:
    UI64 e7, 
    UI64 e6, 
    UI64 e5, 
    UI64 e4, 
    UI64 e3, 
    UI64 e2, 
    UI64 e1, 
    UI64 e0

.. code-block:: C

    __m512i _mm512_set_epi64(__int64 e7, __int64 e6, __int64 e5,
                             __int64 e4, __int64 e3, __int64 e2,
                             __int64 e1, __int64 e0);

.. admonition:: Intel Description

    Set packed 64-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := e0
        dst[127:64] := e1
        dst[191:128] := e2
        dst[255:192] := e3
        dst[319:256] := e4
        dst[383:320] := e5
        dst[447:384] := e6
        dst[511:448] := e7
        dst[MAX:512] := 0
        	
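A scalar sketch that keeps the same parameter order as the prototype may make the lane assignment above easier to follow. The name ``set_epi64_model`` is hypothetical, ``int64_t`` stands in for ``__int64``, and the vector is modelled as eight ``uint64_t`` lanes with lane 0 holding bits [63:0].

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_set_epi64. */
    void set_epi64_model(uint64_t dst[8],
                         int64_t e7, int64_t e6, int64_t e5, int64_t e4,
                         int64_t e3, int64_t e2, int64_t e1, int64_t e0)
    {
        /* Reorder so that index j holds e<j>. */
        const int64_t lanes[8] = { e0, e1, e2, e3, e4, e5, e6, e7 };
        for (int j = 0; j < 8; j++) {
            dst[j] = (uint64_t)lanes[j];
        }
    }

The arguments are written from the highest lane down, so the last argument becomes lane 0.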

_mm512_set_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    double e7, 
    double e6, 
    double e5, 
    double e4, 
    double e3, 
    double e2, 
    double e1, 
    double e0
:Param ETypes:
    FP64 e7, 
    FP64 e6, 
    FP64 e5, 
    FP64 e4, 
    FP64 e3, 
    FP64 e2, 
    FP64 e1, 
    FP64 e0

.. code-block:: C

    __m512d _mm512_set_pd(double e7, double e6, double e5,
                          double e4, double e3, double e2,
                          double e1, double e0);

.. admonition:: Intel Description

    Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := e0
        dst[127:64] := e1
        dst[191:128] := e2
        dst[255:192] := e3
        dst[319:256] := e4
        dst[383:320] := e5
        dst[447:384] := e6
        dst[511:448] := e7
        dst[MAX:512] := 0
        	

_mm512_set_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    float e15, 
    float e14, 
    float e13, 
    float e12, 
    float e11, 
    float e10, 
    float e9, 
    float e8, 
    float e7, 
    float e6, 
    float e5, 
    float e4, 
    float e3, 
    float e2, 
    float e1, 
    float e0
:Param ETypes:
    FP32 e15, 
    FP32 e14, 
    FP32 e13, 
    FP32 e12, 
    FP32 e11, 
    FP32 e10, 
    FP32 e9, 
    FP32 e8, 
    FP32 e7, 
    FP32 e6, 
    FP32 e5, 
    FP32 e4, 
    FP32 e3, 
    FP32 e2, 
    FP32 e1, 
    FP32 e0

.. code-block:: C

    __m512 _mm512_set_ps(float e15, float e14, float e13,
                         float e12, float e11, float e10,
                         float e9, float e8, float e7, float e6,
                         float e5, float e4, float e3, float e2,
                         float e1, float e0);

.. admonition:: Intel Description

    Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := e0
        dst[63:32] := e1
        dst[95:64] := e2
        dst[127:96] := e3
        dst[159:128] := e4
        dst[191:160] := e5
        dst[223:192] := e6
        dst[255:224] := e7
        dst[287:256] := e8
        dst[319:288] := e9
        dst[351:320] := e10
        dst[383:352] := e11
        dst[415:384] := e12
        dst[447:416] := e13
        dst[479:448] := e14
        dst[511:480] := e15
        dst[MAX:512] := 0
        	

_mm512_setr4_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    int d, 
    int c, 
    int b, 
    int a
:Param ETypes:
    UI32 d, 
    UI32 c, 
    UI32 b, 
    UI32 a

.. code-block:: C

    __m512i _mm512_setr4_epi32(int d, int c, int b, int a);

.. admonition:: Intel Description

    Set packed 32-bit integers in "dst" with the repeated 4 element sequence in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := d
        dst[63:32] := c
        dst[95:64] := b
        dst[127:96] := a
        dst[159:128] := d
        dst[191:160] := c
        dst[223:192] := b
        dst[255:224] := a
        dst[287:256] := d
        dst[319:288] := c
        dst[351:320] := b
        dst[383:352] := a
        dst[415:384] := d
        dst[447:416] := c
        dst[479:448] := b
        dst[511:480] := a
        dst[MAX:512] := 0
        	
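In the reversed form above, the first argument lands in the lowest lane, so the arguments appear in the same order as the lanes would in memory. A portable C sketch of that pattern follows. The name ``setr4_epi32_model`` is hypothetical and the vector is modelled as 16 ``uint32_t`` lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_setr4_epi32. */
    void setr4_epi32_model(uint32_t dst[16], int d, int c, int b, int a)
    {
        /* Lane order within each group of four, lowest lane first. */
        const uint32_t seq[4] = {
            (uint32_t)d, (uint32_t)c, (uint32_t)b, (uint32_t)a
        };
        for (int j = 0; j < 16; j++) {
            dst[j] = seq[j % 4];
        }
    }

Calling the model with ``(1, 2, 3, 4)`` yields lanes ``1, 2, 3, 4`` repeated four times.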

_mm512_setr4_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __int64 d, 
    __int64 c, 
    __int64 b, 
    __int64 a
:Param ETypes:
    UI64 d, 
    UI64 c, 
    UI64 b, 
    UI64 a

.. code-block:: C

    __m512i _mm512_setr4_epi64(__int64 d, __int64 c, __int64 b,
                               __int64 a);

.. admonition:: Intel Description

    Set packed 64-bit integers in "dst" with the repeated 4 element sequence in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := d
        dst[127:64] := c
        dst[191:128] := b
        dst[255:192] := a
        dst[319:256] := d
        dst[383:320] := c
        dst[447:384] := b
        dst[511:448] := a
        dst[MAX:512] := 0
        	

_mm512_setr4_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    double d, 
    double c, 
    double b, 
    double a
:Param ETypes:
    FP64 d, 
    FP64 c, 
    FP64 b, 
    FP64 a

.. code-block:: C

    __m512d _mm512_setr4_pd(double d, double c, double b,
                            double a);

.. admonition:: Intel Description

    Set packed double-precision (64-bit) floating-point elements in "dst" with the repeated 4 element sequence in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := d
        dst[127:64] := c
        dst[191:128] := b
        dst[255:192] := a
        dst[319:256] := d
        dst[383:320] := c
        dst[447:384] := b
        dst[511:448] := a
        dst[MAX:512] := 0
        	

_mm512_setr4_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    float d, 
    float c, 
    float b, 
    float a
:Param ETypes:
    FP32 d, 
    FP32 c, 
    FP32 b, 
    FP32 a

.. code-block:: C

    __m512 _mm512_setr4_ps(float d, float c, float b, float a);

.. admonition:: Intel Description

    Set packed single-precision (32-bit) floating-point elements in "dst" with the repeated 4 element sequence in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := d
        dst[63:32] := c
        dst[95:64] := b
        dst[127:96] := a
        dst[159:128] := d
        dst[191:160] := c
        dst[223:192] := b
        dst[255:224] := a
        dst[287:256] := d
        dst[319:288] := c
        dst[351:320] := b
        dst[383:352] := a
        dst[415:384] := d
        dst[447:416] := c
        dst[479:448] := b
        dst[511:480] := a
        dst[MAX:512] := 0
        	

_mm512_setr_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    int e15, 
    int e14, 
    int e13, 
    int e12, 
    int e11, 
    int e10, 
    int e9, 
    int e8, 
    int e7, 
    int e6, 
    int e5, 
    int e4, 
    int e3, 
    int e2, 
    int e1, 
    int e0
:Param ETypes:
    UI32 e15, 
    UI32 e14, 
    UI32 e13, 
    UI32 e12, 
    UI32 e11, 
    UI32 e10, 
    UI32 e9, 
    UI32 e8, 
    UI32 e7, 
    UI32 e6, 
    UI32 e5, 
    UI32 e4, 
    UI32 e3, 
    UI32 e2, 
    UI32 e1, 
    UI32 e0

.. code-block:: C

    __m512i _mm512_setr_epi32(int e15, int e14, int e13,
                              int e12, int e11, int e10, int e9,
                              int e8, int e7, int e6, int e5,
                              int e4, int e3, int e2, int e1,
                              int e0);

.. admonition:: Intel Description

    Set packed 32-bit integers in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := e15
        dst[63:32] := e14
        dst[95:64] := e13
        dst[127:96] := e12
        dst[159:128] := e11
        dst[191:160] := e10
        dst[223:192] := e9
        dst[255:224] := e8
        dst[287:256] := e7
        dst[319:288] := e6
        dst[351:320] := e5
        dst[383:352] := e4
        dst[415:384] := e3
        dst[447:416] := e2
        dst[479:448] := e1
        dst[511:480] := e0
        dst[MAX:512] := 0
        	

_mm512_setr_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __int64 e7, 
    __int64 e6, 
    __int64 e5, 
    __int64 e4, 
    __int64 e3, 
    __int64 e2, 
    __int64 e1, 
    __int64 e0
:Param ETypes:
    UI64 e7, 
    UI64 e6, 
    UI64 e5, 
    UI64 e4, 
    UI64 e3, 
    UI64 e2, 
    UI64 e1, 
    UI64 e0

.. code-block:: C

    __m512i _mm512_setr_epi64(__int64 e7, __int64 e6,
                              __int64 e5, __int64 e4,
                              __int64 e3, __int64 e2,
                              __int64 e1, __int64 e0);

.. admonition:: Intel Description

    Set packed 64-bit integers in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := e7
        dst[127:64] := e6
        dst[191:128] := e5
        dst[255:192] := e4
        dst[319:256] := e3
        dst[383:320] := e2
        dst[447:384] := e1
        dst[511:448] := e0
        dst[MAX:512] := 0
        	

_mm512_setr_pd
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    double e7, 
    double e6, 
    double e5, 
    double e4, 
    double e3, 
    double e2, 
    double e1, 
    double e0
:Param ETypes:
    FP64 e7, 
    FP64 e6, 
    FP64 e5, 
    FP64 e4, 
    FP64 e3, 
    FP64 e2, 
    FP64 e1, 
    FP64 e0

.. code-block:: C

    __m512d _mm512_setr_pd(double e7, double e6, double e5,
                           double e4, double e3, double e2,
                           double e1, double e0);

.. admonition:: Intel Description

    Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := e7
        dst[127:64] := e6
        dst[191:128] := e5
        dst[255:192] := e4
        dst[319:256] := e3
        dst[383:320] := e2
        dst[447:384] := e1
        dst[511:448] := e0
        dst[MAX:512] := 0
        	

_mm512_setr_ps
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    float e15, 
    float e14, 
    float e13, 
    float e12, 
    float e11, 
    float e10, 
    float e9, 
    float e8, 
    float e7, 
    float e6, 
    float e5, 
    float e4, 
    float e3, 
    float e2, 
    float e1, 
    float e0
:Param ETypes:
    FP32 e15, 
    FP32 e14, 
    FP32 e13, 
    FP32 e12, 
    FP32 e11, 
    FP32 e10, 
    FP32 e9, 
    FP32 e8, 
    FP32 e7, 
    FP32 e6, 
    FP32 e5, 
    FP32 e4, 
    FP32 e3, 
    FP32 e2, 
    FP32 e1, 
    FP32 e0

.. code-block:: C

    __m512 _mm512_setr_ps(float e15, float e14, float e13,
                          float e12, float e11, float e10,
                          float e9, float e8, float e7,
                          float e6, float e5, float e4,
                          float e3, float e2, float e1,
                          float e0);

.. admonition:: Intel Description

    Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := e15
        dst[63:32] := e14
        dst[95:64] := e13
        dst[127:96] := e12
        dst[159:128] := e11
        dst[191:160] := e10
        dst[223:192] := e9
        dst[255:224] := e8
        dst[287:256] := e7
        dst[319:288] := e6
        dst[351:320] := e5
        dst[383:352] := e4
        dst[415:384] := e3
        dst[447:416] := e2
        dst[479:448] := e1
        dst[511:480] := e0
        dst[MAX:512] := 0
        	

_mm512_setzero
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512

.. code-block:: C

    __m512 _mm512_setzero(void);

.. admonition:: Intel Description

    Return vector of type __m512 with all elements set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	

_mm512_setzero_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i

.. code-block:: C

    __m512i _mm512_setzero_epi32(void);

.. admonition:: Intel Description

    Return vector of type __m512i with all elements set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	

_mm512_setzero_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d

.. code-block:: C

    __m512d _mm512_setzero_pd(void);

.. admonition:: Intel Description

    Return vector of type __m512d with all elements set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	

_mm512_setzero_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512

.. code-block:: C

    __m512 _mm512_setzero_ps(void);

.. admonition:: Intel Description

    Return vector of type __m512 with all elements set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	

_mm512_setzero_si512
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i

.. code-block:: C

    __m512i _mm512_setzero_si512(void);

.. admonition:: Intel Description

    Return vector of type __m512i with all elements set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	

_mm512_set_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    _Float16 e31, 
    _Float16 e30, 
    _Float16 e29, 
    _Float16 e28, 
    _Float16 e27, 
    _Float16 e26, 
    _Float16 e25, 
    _Float16 e24, 
    _Float16 e23, 
    _Float16 e22, 
    _Float16 e21, 
    _Float16 e20, 
    _Float16 e19, 
    _Float16 e18, 
    _Float16 e17, 
    _Float16 e16, 
    _Float16 e15, 
    _Float16 e14, 
    _Float16 e13, 
    _Float16 e12, 
    _Float16 e11, 
    _Float16 e10, 
    _Float16 e9, 
    _Float16 e8, 
    _Float16 e7, 
    _Float16 e6, 
    _Float16 e5, 
    _Float16 e4, 
    _Float16 e3, 
    _Float16 e2, 
    _Float16 e1, 
    _Float16 e0
:Param ETypes:
    FP16 e31, 
    FP16 e30, 
    FP16 e29, 
    FP16 e28, 
    FP16 e27, 
    FP16 e26, 
    FP16 e25, 
    FP16 e24, 
    FP16 e23, 
    FP16 e22, 
    FP16 e21, 
    FP16 e20, 
    FP16 e19, 
    FP16 e18, 
    FP16 e17, 
    FP16 e16, 
    FP16 e15, 
    FP16 e14, 
    FP16 e13, 
    FP16 e12, 
    FP16 e11, 
    FP16 e10, 
    FP16 e9, 
    FP16 e8, 
    FP16 e7, 
    FP16 e6, 
    FP16 e5, 
    FP16 e4, 
    FP16 e3, 
    FP16 e2, 
    FP16 e1, 
    FP16 e0

.. code-block:: C

    __m512h _mm512_set_ph(
        _Float16 e31, _Float16 e30, _Float16 e29, _Float16 e28,
        _Float16 e27, _Float16 e26, _Float16 e25, _Float16 e24,
        _Float16 e23, _Float16 e22, _Float16 e21, _Float16 e20,
        _Float16 e19, _Float16 e18, _Float16 e17, _Float16 e16,
        _Float16 e15, _Float16 e14, _Float16 e13, _Float16 e12,
        _Float16 e11, _Float16 e10, _Float16 e9, _Float16 e8,
        _Float16 e7, _Float16 e6, _Float16 e5, _Float16 e4,
        _Float16 e3, _Float16 e2, _Float16 e1, _Float16 e0);

.. admonition:: Intel Description

    Set packed half-precision (16-bit) floating-point elements in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := e0
        dst.fp16[1] := e1
        dst.fp16[2] := e2
        dst.fp16[3] := e3
        dst.fp16[4] := e4
        dst.fp16[5] := e5
        dst.fp16[6] := e6
        dst.fp16[7] := e7
        dst.fp16[8] := e8
        dst.fp16[9] := e9
        dst.fp16[10] := e10
        dst.fp16[11] := e11
        dst.fp16[12] := e12
        dst.fp16[13] := e13
        dst.fp16[14] := e14
        dst.fp16[15] := e15
        dst.fp16[16] := e16
        dst.fp16[17] := e17
        dst.fp16[18] := e18
        dst.fp16[19] := e19
        dst.fp16[20] := e20
        dst.fp16[21] := e21
        dst.fp16[22] := e22
        dst.fp16[23] := e23
        dst.fp16[24] := e24
        dst.fp16[25] := e25
        dst.fp16[26] := e26
        dst.fp16[27] := e27
        dst.fp16[28] := e28
        dst.fp16[29] := e29
        dst.fp16[30] := e30
        dst.fp16[31] := e31
        	

_mm512_setr_ph
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    _Float16 e31, 
    _Float16 e30, 
    _Float16 e29, 
    _Float16 e28, 
    _Float16 e27, 
    _Float16 e26, 
    _Float16 e25, 
    _Float16 e24, 
    _Float16 e23, 
    _Float16 e22, 
    _Float16 e21, 
    _Float16 e20, 
    _Float16 e19, 
    _Float16 e18, 
    _Float16 e17, 
    _Float16 e16, 
    _Float16 e15, 
    _Float16 e14, 
    _Float16 e13, 
    _Float16 e12, 
    _Float16 e11, 
    _Float16 e10, 
    _Float16 e9, 
    _Float16 e8, 
    _Float16 e7, 
    _Float16 e6, 
    _Float16 e5, 
    _Float16 e4, 
    _Float16 e3, 
    _Float16 e2, 
    _Float16 e1, 
    _Float16 e0
:Param ETypes:
    FP16 e31, 
    FP16 e30, 
    FP16 e29, 
    FP16 e28, 
    FP16 e27, 
    FP16 e26, 
    FP16 e25, 
    FP16 e24, 
    FP16 e23, 
    FP16 e22, 
    FP16 e21, 
    FP16 e20, 
    FP16 e19, 
    FP16 e18, 
    FP16 e17, 
    FP16 e16, 
    FP16 e15, 
    FP16 e14, 
    FP16 e13, 
    FP16 e12, 
    FP16 e11, 
    FP16 e10, 
    FP16 e9, 
    FP16 e8, 
    FP16 e7, 
    FP16 e6, 
    FP16 e5, 
    FP16 e4, 
    FP16 e3, 
    FP16 e2, 
    FP16 e1, 
    FP16 e0

.. code-block:: C

    __m512h _mm512_setr_ph(
        _Float16 e31, _Float16 e30, _Float16 e29, _Float16 e28,
        _Float16 e27, _Float16 e26, _Float16 e25, _Float16 e24,
        _Float16 e23, _Float16 e22, _Float16 e21, _Float16 e20,
        _Float16 e19, _Float16 e18, _Float16 e17, _Float16 e16,
        _Float16 e15, _Float16 e14, _Float16 e13, _Float16 e12,
        _Float16 e11, _Float16 e10, _Float16 e9, _Float16 e8,
        _Float16 e7, _Float16 e6, _Float16 e5, _Float16 e4,
        _Float16 e3, _Float16 e2, _Float16 e1, _Float16 e0);

.. admonition:: Intel Description

    Set packed half-precision (16-bit) floating-point elements in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := e31
        dst.fp16[1] := e30
        dst.fp16[2] := e29
        dst.fp16[3] := e28
        dst.fp16[4] := e27
        dst.fp16[5] := e26
        dst.fp16[6] := e25
        dst.fp16[7] := e24
        dst.fp16[8] := e23
        dst.fp16[9] := e22
        dst.fp16[10] := e21
        dst.fp16[11] := e20
        dst.fp16[12] := e19
        dst.fp16[13] := e18
        dst.fp16[14] := e17
        dst.fp16[15] := e16
        dst.fp16[16] := e15
        dst.fp16[17] := e14
        dst.fp16[18] := e13
        dst.fp16[19] := e12
        dst.fp16[20] := e11
        dst.fp16[21] := e10
        dst.fp16[22] := e9
        dst.fp16[23] := e8
        dst.fp16[24] := e7
        dst.fp16[25] := e6
        dst.fp16[26] := e5
        dst.fp16[27] := e4
        dst.fp16[28] := e3
        dst.fp16[29] := e2
        dst.fp16[30] := e1
        dst.fp16[31] := e0
        	
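The only difference between ``_mm512_set_ph`` and ``_mm512_setr_ph`` is which end of the argument list maps to element 0. The sketch below models both orderings in scalar C. The helper names are hypothetical, ``args[]`` holds the arguments in call order, and raw ``uint16_t`` bit patterns stand in for ``_Float16`` values so that the code does not depend on compiler support for that type.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical models, not the intrinsics. args[] is in call order,
       so args[0] is the first argument and args[31] is the last. */

    /* Models _mm512_set_ph: the LAST argument (e0) lands in element 0. */
    void model_set_ph(uint16_t dst[32], const uint16_t args[32])
    {
        for (int i = 0; i < 32; ++i) {
            dst[i] = args[31 - i];
        }
    }

    /* Models _mm512_setr_ph: the FIRST argument lands in element 0. */
    void model_setr_ph(uint16_t dst[32], const uint16_t args[32])
    {
        for (int i = 0; i < 32; ++i) {
            dst[i] = args[i];
        }
    }

Under this model, passing the same argument list to both helpers yields element arrays that are exact reversals of each other.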

_mm512_set1_ph
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    _Float16 a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_set1_ph(_Float16 a);

.. admonition:: Intel Description

    Broadcast half-precision (16-bit) floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 31
        	dst.fp16[i] := a[15:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_set1_pch
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    _Float16 _Complex a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_set1_pch(_Float16 _Complex a);

.. admonition:: Intel Description

    Broadcast half-precision (16-bit) complex floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[2*i+0] := a[15:0]
        	dst.fp16[2*i+1] := a[31:16]
        ENDFOR
        dst[MAX:512] := 0
        	
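The complex broadcast writes each value as an adjacent pair of 16-bit elements. The following scalar sketch models that interleaving. ``model_complex_fp16`` and ``model_set1_pch`` are hypothetical names, raw ``uint16_t`` bit patterns stand in for ``_Float16``, and the real part is assumed to occupy ``a[15:0]``, as it does in the C layout of a complex type on little-endian x86.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for _Float16 _Complex: two raw 16-bit patterns.
       Assumption: the real part is the low half, a[15:0]. */
    typedef struct {
        uint16_t re;  /* a[15:0]  */
        uint16_t im;  /* a[31:16] */
    } model_complex_fp16;

    /* Models _mm512_set1_pch: 16 complex values, i.e. 32 fp16 elements. */
    void model_set1_pch(uint16_t dst[32], model_complex_fp16 a)
    {
        for (int i = 0; i < 16; ++i) {
            dst[2 * i + 0] = a.re;  /* even elements hold the low half  */
            dst[2 * i + 1] = a.im;  /* odd elements hold the high half  */
        }
    }

Even-indexed elements therefore all hold the low half of ``a`` and odd-indexed elements the high half, which is the layout the pseudo-code describes.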

_mm512_setzero_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h

.. code-block:: C

    __m512h _mm512_setzero_ph(void);

.. admonition:: Intel Description

    Return vector of type __m512h with all elements set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	

YMM
~~~
_mm256_mask_set1_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    char a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_mask_set1_epi8(__m256i src, __mmask32 k,
                                  char a);

.. admonition:: Intel Description

    Broadcast 8-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[7:0]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
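The writemask behaviour in the pseudo-code above can be expressed as a short scalar loop. This is a sketch of the semantics only, not a replacement for the intrinsic: ``model_mask_set1_epi8`` is a hypothetical name, a ``uint32_t`` stands in for ``__mmask32``, and 32-element ``int8_t`` arrays stand in for the 256-bit vectors.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of writemask broadcast over 32 byte lanes. */
    void model_mask_set1_epi8(int8_t dst[32], const int8_t src[32],
                              uint32_t k, int8_t a)
    {
        for (int j = 0; j < 32; ++j) {
            /* Bit j of k selects the broadcast value; otherwise src is kept. */
            dst[j] = ((k >> j) & 1u) ? a : src[j];
        }
    }

Lanes whose mask bit is clear pass through from ``src`` unchanged, so a mask of ``0`` copies ``src`` and a mask of ``0xFFFFFFFF`` behaves like an unmasked broadcast.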

_mm256_maskz_set1_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    char a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_maskz_set1_epi8(__mmask32 k, char a);

.. admonition:: Intel Description

    Broadcast 8-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[7:0]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
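Zero-masking differs from write-masking only in what happens to unselected lanes. The scalar sketch below models the pseudo-code above. ``model_maskz_set1_epi8`` is a hypothetical name, and a ``uint32_t`` stands in for ``__mmask32``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of zeromask broadcast over 32 byte lanes. */
    void model_maskz_set1_epi8(int8_t dst[32], uint32_t k, int8_t a)
    {
        for (int j = 0; j < 32; ++j) {
            /* Bit j of k selects the broadcast value; otherwise the lane is 0. */
            dst[j] = ((k >> j) & 1u) ? a : 0;
        }
    }

Because unselected lanes are forced to zero, no ``src`` operand is needed; the result matches a writemask broadcast whose ``src`` is all zeros.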

_mm256_mask_set1_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    short a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_mask_set1_epi16(__m256i src, __mmask16 k,
                                   short a);

.. admonition:: Intel Description

    Broadcast 16-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[15:0]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_set1_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    short a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_maskz_set1_epi16(__mmask16 k, short a);

.. admonition:: Intel Description

    Broadcast 16-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[15:0]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_set1_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    int a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_mask_set1_epi32(__m256i src, __mmask8 k,
                                   int a);

.. admonition:: Intel Description

    Broadcast 32-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_set1_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    int a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_maskz_set1_epi32(__mmask8 k, int a);

.. admonition:: Intel Description

    Broadcast 32-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_set1_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __int64 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_mask_set1_epi64(__m256i src, __mmask8 k,
                                   __int64 a);

.. admonition:: Intel Description

    Broadcast 64-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_set1_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __int64 a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_maskz_set1_epi64(__mmask8 k, __int64 a);

.. admonition:: Intel Description

    Broadcast 64-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_setzero_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256h

.. code-block:: C

    __m256h _mm256_setzero_ph(void);

.. admonition:: Intel Description

    Return vector of type __m256h with all elements set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	

_mm256_set_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    _Float16 e15, 
    _Float16 e14, 
    _Float16 e13, 
    _Float16 e12, 
    _Float16 e11, 
    _Float16 e10, 
    _Float16 e9, 
    _Float16 e8, 
    _Float16 e7, 
    _Float16 e6, 
    _Float16 e5, 
    _Float16 e4, 
    _Float16 e3, 
    _Float16 e2, 
    _Float16 e1, 
    _Float16 e0
:Param ETypes:
    FP16 e15, 
    FP16 e14, 
    FP16 e13, 
    FP16 e12, 
    FP16 e11, 
    FP16 e10, 
    FP16 e9, 
    FP16 e8, 
    FP16 e7, 
    FP16 e6, 
    FP16 e5, 
    FP16 e4, 
    FP16 e3, 
    FP16 e2, 
    FP16 e1, 
    FP16 e0

.. code-block:: C

    __m256h _mm256_set_ph(_Float16 e15, _Float16 e14,
                          _Float16 e13, _Float16 e12,
                          _Float16 e11, _Float16 e10,
                          _Float16 e9, _Float16 e8, _Float16 e7,
                          _Float16 e6, _Float16 e5, _Float16 e4,
                          _Float16 e3, _Float16 e2, _Float16 e1,
                          _Float16 e0);

.. admonition:: Intel Description

    Set packed half-precision (16-bit) floating-point elements in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := e0
        dst.fp16[1] := e1
        dst.fp16[2] := e2
        dst.fp16[3] := e3
        dst.fp16[4] := e4
        dst.fp16[5] := e5
        dst.fp16[6] := e6
        dst.fp16[7] := e7
        dst.fp16[8] := e8
        dst.fp16[9] := e9
        dst.fp16[10] := e10
        dst.fp16[11] := e11
        dst.fp16[12] := e12
        dst.fp16[13] := e13
        dst.fp16[14] := e14
        dst.fp16[15] := e15
        	

_mm256_setr_ph
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    _Float16 e15, 
    _Float16 e14, 
    _Float16 e13, 
    _Float16 e12, 
    _Float16 e11, 
    _Float16 e10, 
    _Float16 e9, 
    _Float16 e8, 
    _Float16 e7, 
    _Float16 e6, 
    _Float16 e5, 
    _Float16 e4, 
    _Float16 e3, 
    _Float16 e2, 
    _Float16 e1, 
    _Float16 e0
:Param ETypes:
    FP16 e15, 
    FP16 e14, 
    FP16 e13, 
    FP16 e12, 
    FP16 e11, 
    FP16 e10, 
    FP16 e9, 
    FP16 e8, 
    FP16 e7, 
    FP16 e6, 
    FP16 e5, 
    FP16 e4, 
    FP16 e3, 
    FP16 e2, 
    FP16 e1, 
    FP16 e0

.. code-block:: C

    __m256h _mm256_setr_ph(
        _Float16 e15, _Float16 e14, _Float16 e13, _Float16 e12,
        _Float16 e11, _Float16 e10, _Float16 e9, _Float16 e8,
        _Float16 e7, _Float16 e6, _Float16 e5, _Float16 e4,
        _Float16 e3, _Float16 e2, _Float16 e1, _Float16 e0);

.. admonition:: Intel Description

    Set packed half-precision (16-bit) floating-point elements in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := e15
        dst.fp16[1] := e14
        dst.fp16[2] := e13
        dst.fp16[3] := e12
        dst.fp16[4] := e11
        dst.fp16[5] := e10
        dst.fp16[6] := e9
        dst.fp16[7] := e8
        dst.fp16[8] := e7
        dst.fp16[9] := e6
        dst.fp16[10] := e5
        dst.fp16[11] := e4
        dst.fp16[12] := e3
        dst.fp16[13] := e2
        dst.fp16[14] := e1
        dst.fp16[15] := e0
        	

_mm256_set1_ph
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    _Float16 a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_set1_ph(_Float16 a);

.. admonition:: Intel Description

    Broadcast half-precision (16-bit) floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 15
        	dst.fp16[i] := a[15:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_set1_pch
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    _Float16 _Complex a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_set1_pch(_Float16 _Complex a);

.. admonition:: Intel Description

    Broadcast half-precision (16-bit) complex floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	dst.fp16[2*i+0] := a[15:0]
        	dst.fp16[2*i+1] := a[31:16]
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_mask_set1_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    char a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_mask_set1_epi8(__m128i src, __mmask16 k,
                               char a);

.. admonition:: Intel Description

    Broadcast 8-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[7:0]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_set1_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    char a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_maskz_set1_epi8(__mmask16 k, char a);

.. admonition:: Intel Description

    Broadcast 8-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[7:0]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_set1_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    short a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_mask_set1_epi16(__m128i src, __mmask8 k,
                                short a);

.. admonition:: Intel Description

    Broadcast 16-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[15:0]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_set1_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    short a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_maskz_set1_epi16(__mmask8 k, short a);

.. admonition:: Intel Description

    Broadcast 16-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[15:0]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_set1_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    int a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_set1_epi32(__m128i src, __mmask8 k, int a);

.. admonition:: Intel Description

    Broadcast 32-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_set1_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    int a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_set1_epi32(__mmask8 k, int a);

.. admonition:: Intel Description

    Broadcast 32-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_set1_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __int64 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_set1_epi64(__m128i src, __mmask8 k,
                                __int64 a);

.. admonition:: Intel Description

    Broadcast 64-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_set1_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __int64 a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_set1_epi64(__mmask8 k, __int64 a);

.. admonition:: Intel Description

    Broadcast 64-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_setzero_ph
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128h

.. code-block:: C

    __m128h _mm_setzero_ph(void);

.. admonition:: Intel Description

    Return vector of type __m128h with all elements set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	

_mm_set_ph
^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    _Float16 e7, 
    _Float16 e6, 
    _Float16 e5, 
    _Float16 e4, 
    _Float16 e3, 
    _Float16 e2, 
    _Float16 e1, 
    _Float16 e0
:Param ETypes:
    FP16 e7, 
    FP16 e6, 
    FP16 e5, 
    FP16 e4, 
    FP16 e3, 
    FP16 e2, 
    FP16 e1, 
    FP16 e0

.. code-block:: C

    __m128h _mm_set_ph(_Float16 e7, _Float16 e6, _Float16 e5,
                       _Float16 e4, _Float16 e3, _Float16 e2,
                       _Float16 e1, _Float16 e0);

.. admonition:: Intel Description

    Set packed half-precision (16-bit) floating-point elements in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := e0
        dst.fp16[1] := e1
        dst.fp16[2] := e2
        dst.fp16[3] := e3
        dst.fp16[4] := e4
        dst.fp16[5] := e5
        dst.fp16[6] := e6
        dst.fp16[7] := e7
        	

_mm_setr_ph
^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    _Float16 e7, 
    _Float16 e6, 
    _Float16 e5, 
    _Float16 e4, 
    _Float16 e3, 
    _Float16 e2, 
    _Float16 e1, 
    _Float16 e0
:Param ETypes:
    FP16 e7, 
    FP16 e6, 
    FP16 e5, 
    FP16 e4, 
    FP16 e3, 
    FP16 e2, 
    FP16 e1, 
    FP16 e0

.. code-block:: C

    __m128h _mm_setr_ph(_Float16 e7, _Float16 e6, _Float16 e5,
                        _Float16 e4, _Float16 e3, _Float16 e2,
                        _Float16 e1, _Float16 e0);

.. admonition:: Intel Description

    Set packed half-precision (16-bit) floating-point elements in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := e7
        dst.fp16[1] := e6
        dst.fp16[2] := e5
        dst.fp16[3] := e4
        dst.fp16[4] := e3
        dst.fp16[5] := e2
        dst.fp16[6] := e1
        dst.fp16[7] := e0
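
``_mm_set_ph`` and ``_mm_setr_ph`` differ only in which end of "dst" the first argument lands in. The scalar sketch below models that ordering in portable C. As an assumption made for portability, it passes raw 16-bit patterns (``uint16_t``) in place of ``_Float16``, and the ``model_*`` names are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model of _mm_set_ph: the LAST argument (e0)
       becomes element 0, the first argument (e7) becomes element 7. */
    void model_set_ph(uint16_t dst[8],
                      uint16_t e7, uint16_t e6, uint16_t e5, uint16_t e4,
                      uint16_t e3, uint16_t e2, uint16_t e1, uint16_t e0)
    {
        const uint16_t v[8] = { e0, e1, e2, e3, e4, e5, e6, e7 };
        assert(dst != NULL);
        for (int i = 0; i < 8; i++)
            dst[i] = v[i];
    }

    /* Hypothetical model of _mm_setr_ph: the FIRST argument becomes
       element 0, matching the "dst.fp16[0] := e7" line above. */
    void model_setr_ph(uint16_t dst[8],
                       uint16_t e7, uint16_t e6, uint16_t e5, uint16_t e4,
                       uint16_t e3, uint16_t e2, uint16_t e1, uint16_t e0)
    {
        const uint16_t v[8] = { e7, e6, e5, e4, e3, e2, e1, e0 };
        assert(dst != NULL);
        for (int i = 0; i < 8; i++)
            dst[i] = v[i];
    }

Calling the ``setr`` model with the arguments of a ``set`` call reversed produces the same eight elements.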
        	

_mm_set1_ph
^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    _Float16 a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_set1_ph(_Float16 a);

.. admonition:: Intel Description

    Broadcast half-precision (16-bit) floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 7
        	dst.fp16[i] := a[15:0]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_set1_pch
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    _Float16 _Complex a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_set1_pch(_Float16 _Complex a);

.. admonition:: Intel Description

    Broadcast half-precision (16-bit) complex floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	dst.fp16[2*i+0] := a[15:0]
        	dst.fp16[2*i+1] := a[31:16]
        ENDFOR
        dst[MAX:128] := 0
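
The pseudo-code above interleaves the two 16-bit halves of "a" across "dst", so a 128-bit register holds four complex values. The scalar sketch below models that layout in portable C. It assumes the real part occupies bits 15:0 and the imaginary part bits 31:16 (the usual C layout of a complex type), passes raw binary16 bit patterns instead of ``_Float16 _Complex``, and uses a hypothetical function name.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm_set1_pch.
       re and im are raw binary16 bit patterns (an assumption made so
       the sketch compiles without _Float16 support). */
    void model_set1_pch(uint16_t dst[8], uint16_t re, uint16_t im)
    {
        assert(dst != NULL);
        for (int i = 0; i < 4; i++) {
            dst[2 * i + 0] = re;   /* a[15:0]  */
            dst[2 * i + 1] = im;   /* a[31:16] */
        }
    }

Even-numbered elements of "dst" therefore hold the real part and odd-numbered elements the imaginary part.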
        	

_mm_set_sh
^^^^^^^^^^
:Tech: AVX-512
:Category: Set
:Header: immintrin.h
:Searchable: AVX-512-Set-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    _Float16 a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_set_sh(_Float16 a);

.. admonition:: Intel Description

    Copy half-precision (16-bit) floating-point element "a" to the lower element of "dst", and zero the upper 7 elements.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := a[15:0]
        dst[127:16] := 0
        	

Convert
-------
ZMM
~~~
_mm512_cvtsepi16_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m256i _mm512_cvtsepi16_epi8(__m512i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	l := 8*j
        	dst[l+7:l] := Saturate8(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtsepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    SI8 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m256i _mm512_mask_cvtsepi16_epi8(__m256i src, __mmask32 k,
                                       __m512i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtsepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m256i _mm512_maskz_cvtsepi16_epi8(__mmask32 k, __m512i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
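
``Saturate8`` in the pseudo-code above clamps each signed 16-bit value to the signed 8-bit range rather than discarding high bits. The following is a minimal scalar sketch in portable C of the writemask variant, intended only to illustrate the semantics; ``saturate8`` and ``model_mask_cvtsepi16_epi8`` are hypothetical names, not library functions.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Saturate8: clamp a signed 16-bit value to -128..127. */
    int8_t saturate8(int16_t x)
    {
        if (x > INT8_MAX) return INT8_MAX;
        if (x < INT8_MIN) return INT8_MIN;
        return (int8_t)x;
    }

    /* Hypothetical scalar model of _mm512_mask_cvtsepi16_epi8:
       32 lanes, bit j of k selects lane j, others keep src[j]. */
    void model_mask_cvtsepi16_epi8(int8_t dst[32], const int8_t src[32],
                                   uint32_t k, const int16_t a[32])
    {
        assert(dst != NULL && src != NULL && a != NULL);
        for (int j = 0; j < 32; j++)
            dst[j] = ((k >> j) & 1u) ? saturate8(a[j]) : src[j];
    }

For example, an input lane of 300 becomes 127 and a lane of -300 becomes -128, while values already in range pass through unchanged.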
        	

_mm512_cvtepi8_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256i a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m512i _mm512_cvtepi8_epi16(__m256i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	l := j*16
        	dst[l+15:l] := SignExtend16(a[i+7:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepi8_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m256i a
:Param ETypes:
    SI16 src, 
    MASK k, 
    SI8 a

.. code-block:: C

    __m512i _mm512_mask_cvtepi8_epi16(__m512i src, __mmask32 k,
                                      __m256i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	l := j*16
        	IF k[j]
        		dst[l+15:l] := SignExtend16(a[i+7:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepi8_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    __m512i _mm512_maskz_cvtepi8_epi16(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	l := j*16
        	IF k[j]
        		dst[l+15:l] := SignExtend16(a[i+7:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtusepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m256i _mm512_cvtusepi16_epi8(__m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	l := 8*j
        	dst[l+7:l] := SaturateU8(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtusepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm512_mask_cvtusepi16_epi8(__m256i src,
                                        __mmask32 k, __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtusepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm512_maskz_cvtusepi16_epi8(__mmask32 k,
                                         __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
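
``SaturateU8`` treats each source lane as an unsigned 16-bit value, so the only possible clamp is at the upper bound of 255. A minimal scalar sketch of the zeromask variant in portable C follows; ``saturate_u8`` and ``model_maskz_cvtusepi16_epi8`` are hypothetical helper names used only for illustration.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* SaturateU8: clamp an unsigned 16-bit value to 0..255. */
    uint8_t saturate_u8(uint16_t x)
    {
        return (x > UINT8_MAX) ? (uint8_t)UINT8_MAX : (uint8_t)x;
    }

    /* Hypothetical scalar model of _mm512_maskz_cvtusepi16_epi8:
       32 lanes, bit j of k selects lane j, others are zeroed. */
    void model_maskz_cvtusepi16_epi8(uint8_t dst[32], uint32_t k,
                                     const uint16_t a[32])
    {
        assert(dst != NULL && a != NULL);
        for (int j = 0; j < 32; j++)
            dst[j] = ((k >> j) & 1u) ? saturate_u8(a[j]) : (uint8_t)0;
    }

Note that the bit pattern ``0xFFFF`` saturates to 255 here, whereas the signed conversion would read the same bits as -1 and keep them as -1.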
        	

_mm512_cvtepi16_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m256i _mm512_cvtepi16_epi8(__m512i a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	l := 8*j
        	dst[l+7:l] := Truncate8(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm512_mask_cvtepi16_epi8(__m256i src, __mmask32 k,
                                      __m512i a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm512_maskz_cvtepi16_epi8(__mmask32 k, __m512i a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
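
Unlike the saturating conversions, ``Truncate8`` simply keeps the low 8 bits of each 16-bit lane, so out-of-range values wrap instead of clamping. The scalar sketch below shows this for the unmasked form in portable C; ``truncate8`` and ``model_cvtepi16_epi8`` are hypothetical names chosen for the example.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Truncate8: keep bits 7:0 and discard bits 15:8. */
    uint8_t truncate8(uint16_t x)
    {
        return (uint8_t)(x & 0xFFu);
    }

    /* Hypothetical scalar model of _mm512_cvtepi16_epi8 (32 lanes). */
    void model_cvtepi16_epi8(uint8_t dst[32], const uint16_t a[32])
    {
        assert(dst != NULL && a != NULL);
        for (int j = 0; j < 32; j++)
            dst[j] = truncate8(a[j]);
    }

A lane holding 300 (``0x012C``) therefore becomes 44 (``0x2C``), where unsigned saturation would have produced 255.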
        	

_mm512_cvtepu8_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m512i _mm512_cvtepu8_epi16(__m256i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	l := j*16
        	dst[l+15:l] := ZeroExtend16(a[i+7:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepu8_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m256i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_mask_cvtepu8_epi16(__m512i src, __mmask32 k,
                                      __m256i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	l := j*16
        	IF k[j]
        		dst[l+15:l] := ZeroExtend16(a[i+7:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepu8_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_maskz_cvtepu8_epi16(__mmask32 k, __m256i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	l := j*16
        	IF k[j]
        		dst[l+15:l] := ZeroExtend16(a[i+7:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
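
The ``cvtepu8`` and ``cvtepi8`` families widen the same 8-bit pattern differently: ``ZeroExtend16`` fills the upper byte with zeros, while ``SignExtend16`` replicates bit 7. The portable scalar sketch below contrasts the two and models the zeromask zero-extending form; all function names in it are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* ZeroExtend16: upper 8 bits become 0. */
    uint16_t zero_extend16(uint8_t x)
    {
        return (uint16_t)x;
    }

    /* SignExtend16: upper 8 bits copy bit 7. The xor/subtract form
       avoids relying on implementation-defined narrowing casts. */
    int16_t sign_extend16(uint8_t x)
    {
        return (int16_t)((x ^ 0x80) - 0x80);
    }

    /* Hypothetical scalar model of _mm512_maskz_cvtepu8_epi16:
       32 lanes, bit j of k selects lane j, others are zeroed. */
    void model_maskz_cvtepu8_epi16(uint16_t dst[32], uint32_t k,
                                   const uint8_t a[32])
    {
        assert(dst != NULL && a != NULL);
        for (int j = 0; j < 32; j++)
            dst[j] = ((k >> j) & 1u) ? zero_extend16(a[j]) : (uint16_t)0;
    }

The byte ``0x80`` widens to 128 (``0x0080``) under zero extension and to -128 (``0xFF80``) under sign extension.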
        	

_mm512_cvt_roundpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_cvt_roundpd_epi64(__m512d a, int rounding);

.. admonition:: Intel Description

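    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), all of which suppress exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code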

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
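
The "rounding" parameter decides how the fractional part of each element is resolved before it is stored as a 64-bit integer. The following portable C sketch models ``Convert_FP64_To_Int64`` for a single element under the four explicit rounding directions. Several things in it are assumptions rather than statements from the description above: the enumerator names are hypothetical and are not the ``_MM_FROUND_*`` macros, the current-direction mode is not modeled, and NaN or out-of-range inputs are assumed to yield ``INT64_MIN`` (the usual x86 "integer indefinite" value).

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical names for the four explicit rounding directions. */
    enum round_mode {
        ROUND_NEAREST_EVEN,
        ROUND_DOWN,
        ROUND_UP,
        ROUND_TOWARD_ZERO
    };

    /* Scalar model of Convert_FP64_To_Int64 for one element. */
    int64_t model_cvt_fp64_to_int64(double x, enum round_mode mode)
    {
        /* Assumption: NaN and values outside [-2^63, 2^63) return
           INT64_MIN. The negated comparison is also true for NaN. */
        if (!(x >= -9223372036854775808.0 && x < 9223372036854775808.0))
            return INT64_MIN;

        int64_t t = (int64_t)x;        /* a C cast truncates toward zero */
        double frac = x - (double)t;   /* exact; has the sign of x */

        switch (mode) {
        case ROUND_DOWN:
            return (frac < 0.0) ? t - 1 : t;
        case ROUND_UP:
            return (frac > 0.0) ? t + 1 : t;
        case ROUND_NEAREST_EVEN:
            /* Ties go to the even neighbour. */
            if (frac > 0.5 || (frac == 0.5 && (t & 1)))
                return t + 1;
            if (frac < -0.5 || (frac == -0.5 && (t & 1)))
                return t - 1;
            return t;
        case ROUND_TOWARD_ZERO:
        default:
            return t;
        }
    }

Under this model 2.5 becomes 2 in nearest-even mode, 3 when rounding up, and 2 when rounding down or toward zero.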
        	

_mm512_cvtpd_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512i _mm512_cvtpd_epi64(__m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

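    __m512i _mm512_mask_cvt_roundpd_epi64(__m512i src,
                                          __mmask8 k, __m512d a,
                                          int rounding);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), all of which suppress exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code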

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512i _mm512_mask_cvtpd_epi64(__m512i src, __mmask8 k,
                                    __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

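    __m512i _mm512_maskz_cvt_roundpd_epi64(__mmask8 k,
                                           __m512d a,
                                           int rounding);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), all of which suppress exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code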

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512i _mm512_maskz_cvtpd_epi64(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_cvt_roundpd_epu64(__m512d a, int rounding);

.. admonition:: Intel Description

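    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), all of which suppress exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code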

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtpd_epu64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512i _mm512_cvtpd_epu64(__m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

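    __m512i _mm512_mask_cvt_roundpd_epu64(__m512i src,
                                          __mmask8 k, __m512d a,
                                          int rounding);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), all of which suppress exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code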

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512i _mm512_mask_cvtpd_epu64(__m512i src, __mmask8 k,
                                    __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

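    __m512i _mm512_maskz_cvt_roundpd_epu64(__mmask8 k,
                                           __m512d a,
                                           int rounding);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), all of which suppress exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code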

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512i _mm512_maskz_cvtpd_epu64(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundps_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_cvt_roundps_epi64(__m256 a, int rounding);

.. admonition:: Intel Description

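    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), all of which suppress exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code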

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtps_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512i _mm512_cvtps_epi64(__m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundps_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m256 a, 
    int rounding
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_mask_cvt_roundps_epi64(__m512i src,
                                          __mmask8 k, __m256 a,
                                          int rounding)

.. admonition:: Intel Description

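    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.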

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtps_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_mask_cvtps_epi64(__m512i src, __mmask8 k,
                                    __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
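The writemask behaviour in the pseudo-code above can be sketched in portable C. This is a scalar model rather than the intrinsic itself: "mask_cvt_model" is a hypothetical name, and the cast truncates, so the sketch assumes integral, in-range inputs.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of the writemask form: 8 floats -> 8 int64_t lanes.
       Lanes whose mask bit is clear keep the value from src. */
    void mask_cvt_model(int64_t dst[8], const int64_t src[8],
                        uint8_t k, const float a[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1)
                dst[j] = (int64_t)a[j];   /* bit set: converted lane */
            else
                dst[j] = src[j];          /* bit clear: copy from src */
        }
    }

With k = 0x05 only lanes 0 and 2 are converted; every other lane is passed through from "src".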

_mm512_maskz_cvt_roundps_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m256 a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_maskz_cvt_roundps_epi64(__mmask8 k, __m256 a,
                                           int rounding);

.. admonition:: Intel Description

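    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.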

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtps_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_maskz_cvtps_epi64(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundps_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_cvt_roundps_epu64(__m256 a, int rounding);

.. admonition:: Intel Description

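    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.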

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtps_epu64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512i _mm512_cvtps_epu64(__m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundps_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m256 a, 
    int rounding
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_mask_cvt_roundps_epu64(__m512i src,
                                          __mmask8 k, __m256 a,
                                          int rounding);

.. admonition:: Intel Description

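    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.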

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtps_epu64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_mask_cvtps_epu64(__m512i src, __mmask8 k,
                                    __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundps_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m256 a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_maskz_cvt_roundps_epu64(__mmask8 k, __m256 a,
                                           int rounding);

.. admonition:: Intel Description

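    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.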

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtps_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_maskz_cvtps_epu64(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundepi64_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512i a, 
    int rounding
:Param ETypes:
    SI64 a, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_cvt_roundepi64_pd(__m512i a, int rounding);

.. admonition:: Intel Description

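    Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.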

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepi64_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m512d _mm512_cvtepi64_pd(__m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundepi64_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI64 a, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_cvt_roundepi64_pd(__m512d src,
                                          __mmask8 k, __m512i a,
                                          int rounding);

.. admonition:: Intel Description

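    Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.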

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepi64_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m512d _mm512_mask_cvtepi64_pd(__m512d src, __mmask8 k,
                                    __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundepi64_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    MASK k, 
    SI64 a, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_cvt_roundepi64_pd(__mmask8 k,
                                           __m512i a,
                                           int rounding);

.. admonition:: Intel Description

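    Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.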

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepi64_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m512d _mm512_maskz_cvtepi64_pd(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
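The zeromask variant differs from the writemask one only in what an unselected lane receives. A minimal scalar sketch of the pseudo-code above, with the hypothetical name "maskz_cvtepi64_pd_model" (not an immintrin.h function):

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of the zeromask form: 8 int64_t lanes -> 8 doubles.
       Lanes whose mask bit is clear are set to zero. */
    void maskz_cvtepi64_pd_model(double dst[8], uint8_t k,
                                 const int64_t a[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? (double)a[j] : 0.0;
    }

With k = 0xF0 the lower four lanes come out as 0.0 and the upper four carry the converted values.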

_mm512_cvt_roundepi64_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m512i a, 
    int rounding
:Param ETypes:
    SI64 a, 
    IMM rounding

.. code-block:: C

    __m256 _mm512_cvt_roundepi64_ps(__m512i a, int rounding);

.. admonition:: Intel Description

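    Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.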

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtepi64_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m512i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m256 _mm512_cvtepi64_ps(__m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
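Because a single-precision value has a 24-bit significand, this conversion is inexact for many 64-bit inputs. The scalar sketch below illustrates that, assuming the default round-to-nearest-even mode; "cvtepi64_ps_model" is a hypothetical name, not an immintrin.h function.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model: eight int64_t lanes -> eight floats (a 256-bit result).
       Values that need more than 24 significant bits are rounded. */
    void cvtepi64_ps_model(float dst[8], const int64_t a[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = (float)a[j];
    }

For example, 16777217 (2^24 + 1) is not representable as a float and comes back as 16777216.0f, so a round trip through this conversion does not generally preserve large integers.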

_mm512_mask_cvt_roundepi64_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI64 a, 
    IMM rounding

.. code-block:: C

    __m256 _mm512_mask_cvt_roundepi64_ps(__m256 src, __mmask8 k,
                                         __m512i a,
                                         int rounding);

.. admonition:: Intel Description

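    Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.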

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtepi64_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m256 _mm512_mask_cvtepi64_ps(__m256 src, __mmask8 k,
                                   __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvt_roundepi64_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    MASK k, 
    SI64 a, 
    IMM rounding

.. code-block:: C

    __m256 _mm512_maskz_cvt_roundepi64_ps(__mmask8 k, __m512i a,
                                          int rounding);

.. admonition:: Intel Description

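    Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate (each of these also suppresses exceptions), or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.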

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtepi64_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m256 _mm512_maskz_cvtepi64_ps(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtt_roundpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512d a, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvtt_roundpd_epi64(__m512d a, int sae);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvttpd_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512i _mm512_cvttpd_epi64(__m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	
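"With truncation" means the fractional part is discarded, which is also what a C cast does. The scalar sketch below models one lane; "cvtt_lane_i64" is a hypothetical name, and the value returned for NaN or out-of-range inputs (the most negative 64-bit integer) is an assumption based on the usual x86 "integer indefinite" convention rather than something stated in this entry.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of one lane: double -> int64_t, rounding toward zero. */
    int64_t cvtt_lane_i64(double x)
    {
        /* Range check first: casting an out-of-range double is undefined
           in C. The negated test also catches NaN. */
        if (!(x >= -9223372036854775808.0 && x < 9223372036854775808.0))
            return INT64_MIN;   /* assumed "integer indefinite" value */
        return (int64_t)x;      /* the cast discards the fraction */
    }

So 2.9 becomes 2 and -2.9 becomes -2, whereas a round-to-nearest conversion would give 3 and -3.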

_mm512_mask_cvtt_roundpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512d a, 
    int sae
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvtt_roundpd_epi64(__m512i src,
                                           __mmask8 k,
                                           __m512d a, int sae);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvttpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512i _mm512_mask_cvttpd_epi64(__m512i src, __mmask8 k,
                                     __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtt_roundpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvtt_roundpd_epi64(__mmask8 k,
                                            __m512d a, int sae);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvttpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512i _mm512_maskz_cvttpd_epi64(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtt_roundpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512d a, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvtt_roundpd_epu64(__m512d a, int sae);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvttpd_epu64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512i _mm512_cvttpd_epu64(__m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	
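The unsigned form covers inputs up to just below 2^64, twice the positive range of the signed conversion. A scalar sketch of one lane follows; "cvtt_lane_u64" is a hypothetical name, and returning all ones for NaN or unrepresentable inputs is an assumption recalled from the instruction documentation, not something stated in this entry.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of one lane: double -> uint64_t, rounding toward zero. */
    uint64_t cvtt_lane_u64(double x)
    {
        /* Anything at or below -1.0, at or above 2^64, or NaN cannot be
           represented after truncation. */
        if (!(x > -1.0) || x >= 18446744073709551616.0)
            return UINT64_MAX;  /* assumed result for unrepresentable input */
        return (uint64_t)x;     /* fraction discarded; (-1, 0) maps to 0 */
    }

Here 9223372036854775808.0 (2^63) converts exactly, although it would be out of range for the signed variant.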

_mm512_mask_cvtt_roundpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512d a, 
    int sae
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvtt_roundpd_epu64(__m512i src,
                                           __mmask8 k,
                                           __m512d a, int sae);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvttpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512i _mm512_mask_cvttpd_epu64(__m512i src, __mmask8 k,
                                     __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtt_roundpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvtt_roundpd_epu64(__mmask8 k,
                                            __m512d a, int sae);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvttpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512i _mm512_maskz_cvttpd_epu64(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
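
As a rough illustration of the zeromask behaviour in the pseudo-code above, the following portable C sketch walks the eight lanes one by one. It is a scalar model and not the intrinsic itself; the name ``model_maskz_cvttpd_epu64`` is hypothetical, and every selected input is assumed to be finite and within the range of an unsigned 64-bit integer.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of a zeromasked, truncating conversion.
       Assumes each selected a[j] is finite and in [0, 2^64). */
    void model_maskz_cvttpd_epu64(uint8_t k, const double a[8], uint64_t dst[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1)
                dst[j] = (uint64_t)a[j];   /* a C cast truncates toward zero */
            else
                dst[j] = 0;                /* mask bit clear: lane is zeroed */
        }
    }

With ``k = 0x05`` only lanes 0 and 2 are converted; all other lanes of the result are zero regardless of the input.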
        	

_mm512_cvtt_roundps_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256 a, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvtt_roundps_epi64(__m256 a, int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvttps_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512i _mm512_cvttps_epi64(__m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
        ENDFOR
        dst[MAX:512] := 0
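
The widening step in the pseudo-code above (eight 32-bit inputs, eight 64-bit outputs) can be sketched in scalar C as follows. The function name ``model_cvttps_epi64`` is hypothetical, and the sketch assumes every input is finite and fits in a signed 64-bit integer.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: eight floats widen to eight 64-bit integers.
       Assumes each a[j] is finite and within the int64_t range. */
    void model_cvttps_epi64(const float a[8], int64_t dst[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = (int64_t)a[j];   /* truncation: -2.7f becomes -2, not -3 */
    }

Because the destination lanes are 64 bits wide, an input such as ``1e10f`` converts without overflow even though it does not fit in 32 bits.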
        	

_mm512_mask_cvtt_roundps_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m256 a, 
    int sae
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvtt_roundps_epi64(__m512i src,
                                           __mmask8 k, __m256 a,
                                           int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvttps_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_mask_cvttps_epi64(__m512i src, __mmask8 k,
                                     __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
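
The writemask form differs from the zeromask form only in what an unselected lane receives. A minimal scalar sketch, with the hypothetical name ``model_mask_cvttps_epi64`` and assuming in-range finite inputs, might look like this:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the writemask (merge) behaviour.
       Assumes each selected a[j] is finite and within the int64_t range. */
    void model_mask_cvttps_epi64(const int64_t src[8], uint8_t k,
                                 const float a[8], int64_t dst[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? (int64_t)a[j]   /* converted lane */
                                    : src[j];         /* lane copied from src */
    }

Passing the previous result as ``src`` is one way to update only some lanes of an existing vector while leaving the rest untouched.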
        	

_mm512_maskz_cvtt_roundps_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m256 a, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvtt_roundps_epi64(__mmask8 k,
                                            __m256 a, int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvttps_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_maskz_cvttps_epi64(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtt_roundps_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256 a, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvtt_roundps_epu64(__m256 a, int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvttps_epu64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512i _mm512_cvttps_epu64(__m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
        ENDFOR
        dst[MAX:512] := 0
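
The unsigned variant matters for inputs at or above 2^63, which have no signed 64-bit representation. The scalar sketch below uses the hypothetical name ``model_cvttps_epu64`` and assumes every input is finite and in the range [0, 2^64).

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the unsigned truncating conversion.
       Assumes each a[j] is finite and in [0, 2^64). */
    void model_cvttps_epu64(const float a[8], uint64_t dst[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = (uint64_t)a[j];   /* 2^63 is representable here */
    }

For example, the float value 9223372036854775808.0f (exactly 2^63) converts to ``0x8000000000000000``.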
        	

_mm512_mask_cvtt_roundps_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m256 a, 
    int sae
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvtt_roundps_epu64(__m512i src,
                                           __mmask8 k, __m256 a,
                                           int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvttps_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_mask_cvttps_epu64(__m512i src, __mmask8 k,
                                     __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtt_roundps_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m256 a, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvtt_roundps_epu64(__mmask8 k,
                                            __m256 a, int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvttps_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_maskz_cvttps_epu64(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
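
One plausible use of the zeromask form is to skip lanes that cannot be converted to an unsigned integer. The sketch below is a scalar model under that assumption: ``model_in_range_mask`` and ``model_maskz_cvttps_epu64`` are hypothetical names, and the range test is deliberately conservative (it also rejects inputs between -1 and 0, which would truncate to 0 anyway).

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: set bit j when a[j] is non-negative and below 2^64.
       A NaN fails both comparisons, so its bit stays clear. */
    uint8_t model_in_range_mask(const float a[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            if (a[j] >= 0.0f && a[j] < 18446744073709551616.0f)
                k |= (uint8_t)(1u << j);
        return k;
    }

    /* Hypothetical scalar model of the zeromasked unsigned conversion. */
    void model_maskz_cvttps_epu64(uint8_t k, const float a[8], uint64_t dst[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? (uint64_t)a[j] : 0;
    }

Only lanes whose mask bit is set are ever cast, so the out-of-range inputs never reach the conversion.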
        	

_mm512_cvt_roundepu64_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512i a, 
    int rounding
:Param ETypes:
    UI64 a, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_cvt_roundepu64_pd(__m512i a, int rounding);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
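
The "rounding" parameter only matters when an input has more than 53 significant bits, since such a value cannot be stored exactly in a double. The scalar sketch below models two of the possible directions with integer arithmetic, roughly what ``_MM_FROUND_TO_ZERO`` and ``_MM_FROUND_TO_POS_INF`` would request; the function names are hypothetical and the code is an illustration, not Intel's implementation.

.. code-block:: C

    #include <stdint.h>

    /* Number of significant bits in x (0 when x == 0). */
    static int bit_width_u64(uint64_t x)
    {
        int n = 0;
        while (x) { n++; x >>= 1; }
        return n;
    }

    /* Hypothetical model of rounding toward zero: clear the bits that do not
       fit in the 53-bit significand, so the final cast is exact. */
    double model_u64_to_f64_toward_zero(uint64_t x)
    {
        int drop = bit_width_u64(x) - 53;
        if (drop > 0)
            x &= ~((UINT64_C(1) << drop) - 1);
        return (double)x;
    }

    /* Hypothetical model of rounding toward +infinity: step to the next
       representable value whenever a dropped bit was set. */
    double model_u64_to_f64_upward(uint64_t x)
    {
        int drop = bit_width_u64(x) - 53;
        if (drop <= 0)
            return (double)x;                           /* already exact */
        uint64_t low = x & ((UINT64_C(1) << drop) - 1); /* bits that are lost */
        double r = model_u64_to_f64_toward_zero(x);
        return low ? r + (double)(UINT64_C(1) << drop) : r;
    }

For the input 2^53 + 1 the two models return 2^53 and 2^53 + 2 respectively, which are the two doubles that bracket the exact value.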
        	

_mm512_cvtepu64_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m512d _mm512_cvtepu64_pd(__m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
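
What distinguishes this conversion from its signed counterpart is how the top bit of each lane is read. A minimal scalar sketch, using the hypothetical name ``model_cvtepu64_pd`` and assuming the default round-to-nearest mode of the C environment:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: each 64-bit lane is read as unsigned. */
    void model_cvtepu64_pd(const uint64_t a[8], double dst[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = (double)a[j];   /* top bit adds 2^63 instead of a sign */
    }

An all-ones lane therefore converts to 2^64 (the nearest double to 2^64 - 1), whereas a signed reading of the same bits would give -1.0.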
        	

_mm512_mask_cvt_roundepu64_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    UI64 a, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_cvt_roundepu64_pd(__m512d src,
                                          __mmask8 k, __m512i a,
                                          int rounding);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepu64_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    FP64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m512d _mm512_mask_cvtepu64_pd(__m512d src, __mmask8 k,
                                    __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundepu64_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_cvt_roundepu64_pd(__mmask8 k,
                                           __m512i a,
                                           int rounding);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepu64_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m512d _mm512_maskz_cvtepu64_pd(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundepu64_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m512i a, 
    int rounding
:Param ETypes:
    UI64 a, 
    IMM rounding

.. code-block:: C

    __m256 _mm512_cvt_roundepu64_ps(__m512i a, int rounding);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtepu64_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m256 _mm512_cvtepu64_ps(__m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
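
Here the result narrows from 64-bit lanes to 32-bit lanes, which is why the return type is only 256 bits wide. The scalar sketch below (hypothetical name ``model_cvtepu64_ps``, default round-to-nearest-even assumed) shows the precision loss that follows from a 24-bit significand.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: eight 64-bit unsigned lanes narrow to floats. */
    void model_cvtepu64_ps(const uint64_t a[8], float dst[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = (float)a[j];   /* only 24 significant bits survive */
    }

Inputs up to 2^24 convert exactly; beyond that, neighbouring integers start to collapse onto the same float, for example 16777216 and 16777217.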
        	

_mm512_mask_cvt_roundepu64_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    UI64 a, 
    IMM rounding

.. code-block:: C

    __m256 _mm512_mask_cvt_roundepu64_ps(__m256 src, __mmask8 k,
                                         __m512i a,
                                         int rounding);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtepu64_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256 _mm512_mask_cvtepu64_ps(__m256 src, __mmask8 k,
                                   __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvt_roundepu64_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM rounding

.. code-block:: C

    __m256 _mm512_maskz_cvt_roundepu64_ps(__mmask8 k, __m512i a,
                                          int rounding);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtepu64_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256 _mm512_maskz_cvtepu64_ps(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtepi32_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m256i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m512d _mm512_cvtepi32_pd(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*64
        	dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
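
Every signed 32-bit integer fits in the 53-bit significand of a double, so this conversion is exact. That would explain why the ``epi32`` to ``pd`` entries here take no rounding parameter. A scalar sketch under the hypothetical name ``model_cvtepi32_pd``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: eight 32-bit integers widen to eight doubles. */
    void model_cvtepi32_pd(const int32_t a[8], double dst[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = (double)a[j];   /* exact for every int32_t value */
    }

Converting each result back to ``int32_t`` recovers the original input, including ``INT32_MIN`` and ``INT32_MAX``.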
        	

_mm512_mask_cvtepi32_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m512d _mm512_mask_cvtepi32_pd(__m512d src, __mmask8 k,
                                    __m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*64
        	IF k[j]
        		dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
        	ELSE
        		dst[m+63:m] := src[m+63:m]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepi32_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m512d _mm512_maskz_cvtepi32_pd(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*64
        	IF k[j]
        		dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
        	ELSE
        		dst[m+63:m] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundepi32_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512i a, 
    int rounding
:Param ETypes:
    SI32 a, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_cvt_roundepi32_ps(__m512i a, int rounding);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepi32_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m512 _mm512_cvtepi32_ps(__m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
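
Unlike the conversion to double precision, this one keeps the lane width at 32 bits, so there are sixteen lanes and results can be inexact once the magnitude exceeds 2^24. A scalar sketch, with the hypothetical name ``model_cvtepi32_ps`` and assuming the default round-to-nearest-even mode:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: sixteen 32-bit integers become sixteen floats. */
    void model_cvtepi32_ps(const int32_t a[16], float dst[16])
    {
        for (int j = 0; j < 16; j++)
            dst[j] = (float)a[j];   /* rounds when |a[j]| needs more than 24 bits */
    }

For instance, ``INT32_MAX`` (2^31 - 1) has no exact float representation and rounds up to 2^31.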
        	

_mm512_mask_cvt_roundepi32_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI32 a, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_cvt_roundepi32_ps(__m512 src,
                                         __mmask16 k, __m512i a,
                                         int rounding);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepi32_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m512 _mm512_mask_cvtepi32_ps(__m512 src, __mmask16 k,
                                   __m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
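
With sixteen 32-bit lanes the mask type widens to ``__mmask16``, one bit per lane. The scalar sketch below models that with a ``uint16_t``; the name ``model_mask_cvtepi32_ps`` is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the sixteen-lane writemask conversion.
       Bit j of k selects lane j; unselected lanes are copied from src. */
    void model_mask_cvtepi32_ps(const float src[16], uint16_t k,
                                const int32_t a[16], float dst[16])
    {
        for (int j = 0; j < 16; j++)
            dst[j] = ((k >> j) & 1) ? (float)a[j] : src[j];
    }

A mask of ``0x8001`` converts only the first and last lanes and leaves the fourteen lanes in between equal to ``src``.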
        	

_mm512_maskz_cvt_roundepi32_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    MASK k, 
    SI32 a, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_cvt_roundepi32_ps(__mmask16 k,
                                          __m512i a,
                                          int rounding);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepi32_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m512 _mm512_maskz_cvtepi32_ps(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m256i _mm512_cvt_roundpd_epi32(__m512d a, int rounding);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". [round_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k])
        ENDFOR
        dst[MAX:256] := 0
        	
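The pseudo-code above folds the "rounding" argument into ``Convert_FP64_To_Int32``. As a rough illustration, the scalar sketch below models a single lane under the four static rounding directions. The names ``cvt_round_model`` and ``MODEL_*`` are hypothetical; the enum values are assumed to mirror the low two bits of the ``_MM_FROUND_*`` constants, and returning 0x80000000 for NaN or out-of-range input is an assumption about the masked-exception hardware result.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical names; values assumed to match the low bits of _MM_FROUND_*. */
    enum { MODEL_NEAREST = 0, MODEL_NEG_INF = 1, MODEL_POS_INF = 2, MODEL_ZERO = 3 };

    /* Scalar model of one lane under a static rounding mode. */
    int32_t cvt_round_model(double x, int rounding)
    {
        /* NaN fails both comparisons; far out-of-range values are rejected
           here so the cast to int64_t below is well defined. */
        if (!(x > -4294967296.0 && x < 4294967296.0))
            return INT32_MIN;

        int64_t t = (int64_t)x;          /* truncates toward zero */
        double frac = x - (double)t;     /* exact remainder in (-1, 1) */

        switch (rounding & 3) {
        case MODEL_NEAREST:              /* ties go to the even integer */
            if (frac > 0.5 || (frac == 0.5 && (t & 1))) t++;
            else if (frac < -0.5 || (frac == -0.5 && (t & 1))) t--;
            break;
        case MODEL_NEG_INF:              /* round toward negative infinity */
            if (frac < 0.0) t--;
            break;
        case MODEL_POS_INF:              /* round toward positive infinity */
            if (frac > 0.0) t++;
            break;
        default:                         /* MODEL_ZERO: keep the truncated value */
            break;
        }

        /* Results outside the int32 range are assumed to yield 0x80000000. */
        if (t > INT32_MAX || t < INT32_MIN)
            return INT32_MIN;
        return (int32_t)t;
    }

The model makes the difference between modes visible on halfway cases: 2.5 maps to 2 under round-to-nearest but to 3 when rounding up.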

_mm512_cvtpd_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256i _mm512_cvtpd_epi32(__m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvt_roundpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m256i _mm512_mask_cvt_roundpd_epi32(__m256i src,
                                          __mmask8 k, __m512d a,
                                          int rounding);

.. admonition:: Intel Description

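    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).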

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm512_mask_cvtpd_epi32(__m256i src, __mmask8 k,
                                    __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvt_roundpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m256i _mm512_maskz_cvt_roundpd_epi32(__mmask8 k,
                                           __m512d a,
                                           int rounding);

.. admonition:: Intel Description

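    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).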

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm512_maskz_cvtpd_epi32(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvt_roundpd_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m512d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m256 _mm512_cvt_roundpd_ps(__m512d a, int rounding);

.. admonition:: Intel Description

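    Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).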

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_FP32(a[k+63:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtpd_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256 _mm512_cvtpd_ps(__m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_FP32(a[k+63:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvt_roundpd_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m256 _mm512_mask_cvt_roundpd_ps(__m256 src, __mmask8 k,
                                      __m512d a, int rounding);

.. admonition:: Intel Description

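    Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).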

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtpd_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256 _mm512_mask_cvtpd_ps(__m256 src, __mmask8 k,
                                __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
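Under a writemask, unselected lanes keep the value from "src" instead of being zeroed. The sketch below models that merge for the FP64-to-FP32 case in plain C; ``mask_cvtpd_ps_model`` is a hypothetical name, and the cast assumes the default round-to-nearest-even mode.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (hypothetical helper): lane j is narrowed from a[j]
       when bit j of k is set, and copied from src[j] otherwise. */
    void mask_cvtpd_ps_model(float dst[8], const float src[8],
                             uint8_t k, const double a[8])
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = ((k >> j) & 1) ? (float)a[j] : src[j];
        }
    }

Narrowing can lose precision: under these assumptions 16777217.0 becomes 16777216.0f, because the 24-bit FP32 significand cannot hold the odd value and the tie rounds to even.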

_mm512_maskz_cvt_roundpd_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m256 _mm512_maskz_cvt_roundpd_ps(__mmask8 k, __m512d a,
                                       int rounding);

.. admonition:: Intel Description

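    Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).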

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtpd_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256 _mm512_maskz_cvtpd_ps(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvt_roundpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m256i _mm512_cvt_roundpd_epu32(__m512d a, int rounding);

.. admonition:: Intel Description

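    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).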

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_UInt32(a[k+63:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtpd_epu32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256i _mm512_cvtpd_epu32(__m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_UInt32(a[k+63:k])
        ENDFOR
        dst[MAX:256] := 0
        	
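The unsigned conversion differs from the signed one mainly in its representable range. The scalar sketch below models one lane of ``Convert_FP64_To_UInt32`` with round-to-nearest-even; the name ``fp64_to_uint32_model`` is hypothetical, and returning 0xFFFFFFFF for NaN, negative, or too-large results is an assumption about the masked-exception hardware behaviour rather than something stated in the pseudo-code.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (hypothetical helper) of one lane, ties to even. */
    uint32_t fp64_to_uint32_model(double x)
    {
        /* NaN fails both comparisons; values at or below -1 and far above
           the uint32 range can never round into range. */
        if (!(x > -1.0 && x < 4294967297.0))
            return UINT32_MAX;

        int64_t t = (int64_t)x;          /* truncates toward zero */
        double frac = x - (double)t;     /* exact remainder in (-1, 1) */

        if (frac > 0.5 || (frac == 0.5 && (t & 1))) t++;
        else if (frac < -0.5 || (frac == -0.5 && (t & 1))) t--;

        /* Assumed: unrepresentable results become all ones. */
        if (t < 0 || t > (int64_t)UINT32_MAX)
            return UINT32_MAX;
        return (uint32_t)t;
    }

Values above ``INT32_MAX`` such as 3000000000.0 convert normally here, which is the practical reason to prefer the ``epu32`` form for unsigned data.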

_mm512_mask_cvt_roundpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m256i _mm512_mask_cvt_roundpd_epu32(__m256i src,
                                          __mmask8 k, __m512d a,
                                          int rounding);

.. admonition:: Intel Description

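    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).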

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm512_mask_cvtpd_epu32(__m256i src, __mmask8 k,
                                    __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvt_roundpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m256i _mm512_maskz_cvt_roundpd_epu32(__mmask8 k,
                                           __m512d a,
                                           int rounding);

.. admonition:: Intel Description

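    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).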

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm512_maskz_cvtpd_epu32(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvt_roundph_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m256i a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512 _mm512_cvt_roundph_ps(__m256i a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*16
        	dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtph_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m256i a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512 _mm512_cvtph_ps(__m256i a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*16
        	dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
        ENDFOR
        dst[MAX:512] := 0
        	
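``Convert_FP16_To_FP32`` is left abstract in the pseudo-code. As an illustration, the sketch below decodes one IEEE 754 binary16 bit pattern into a ``float`` using integer operations only. The function name ``fp16_to_fp32_model`` is hypothetical, and the model copies NaN payloads unchanged, which may differ from how hardware quiets signaling NaNs.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Scalar model (hypothetical helper): widen one binary16 bit pattern. */
    float fp16_to_fp32_model(uint16_t h)
    {
        uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t man  = h & 0x3FFu;
        uint32_t bits;

        if (exp == 0x1Fu) {              /* infinity or NaN */
            bits = sign | 0x7F800000u | (man << 13);
        } else if (exp != 0) {           /* normal: rebias exponent 15 -> 127 */
            bits = sign | ((exp + 112u) << 23) | (man << 13);
        } else if (man == 0) {           /* signed zero */
            bits = sign;
        } else {                         /* subnormal: normalize the significand */
            int shift = -1;
            do {
                man <<= 1;
                shift++;
            } while ((man & 0x400u) == 0);
            bits = sign | ((uint32_t)(112 - shift) << 23) | ((man & 0x3FFu) << 13);
        }

        float f;
        memcpy(&f, &bits, sizeof f);     /* reinterpret the bits as a float */
        return f;
    }

Every binary16 value is exactly representable in binary32, so the widening never rounds; applying the helper to 16 consecutive 16-bit lanes reproduces the loop in the pseudo-code.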

_mm512_mask_cvt_roundph_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m256i a, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512 _mm512_mask_cvt_roundph_ps(__m512 src, __mmask16 k,
                                      __m256i a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtph_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512 _mm512_mask_cvtph_ps(__m512 src, __mmask16 k,
                                __m256i a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundph_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m256i a, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512 _mm512_maskz_cvt_roundph_ps(__mmask16 k, __m256i a,
                                       int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtph_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512 _mm512_maskz_cvtph_ps(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundps_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_cvt_roundps_epi32(__m512 a, int rounding);

.. admonition:: Intel Description

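    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).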

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtps_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512i _mm512_cvtps_epi32(__m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundps_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512 a, 
    int rounding
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_mask_cvt_roundps_epi32(__m512i src,
                                          __mmask16 k, __m512 a,
                                          int rounding);

.. admonition:: Intel Description

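    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).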

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtps_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_mask_cvtps_epi32(__m512i src, __mmask16 k,
                                    __m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
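A possible way to call this intrinsic from array-based code is sketched below. The wrapper name ``mask_cvt16`` is hypothetical. The intrinsic path is compiled only when the compiler defines ``__AVX512F__`` (for example with ``-mavx512f``); otherwise a portable fallback is used, which assumes in-range inputs and the default round-to-nearest-even mode.

.. code-block:: C

    #include <stdint.h>
    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Hypothetical wrapper: converts 16 floats to int32, keeping src[j]
       wherever bit j of k is clear. */
    void mask_cvt16(int32_t dst[16], const int32_t src[16],
                    uint16_t k, const float a[16])
    {
    #if defined(__AVX512F__)
        __m512i vsrc = _mm512_loadu_si512((const void *)src);
        __m512 va = _mm512_loadu_ps(a);
        __m512i r = _mm512_mask_cvtps_epi32(vsrc, (__mmask16)k, va);
        _mm512_storeu_si512((void *)dst, r);
    #else
        /* Portable fallback: valid only for inputs within the int32 range. */
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1) {
                int32_t t = (int32_t)a[j];       /* truncates toward zero */
                float frac = a[j] - (float)t;    /* remainder in (-1, 1) */
                if (frac > 0.5f || (frac == 0.5f && (t & 1))) t++;
                else if (frac < -0.5f || (frac == -0.5f && (t & 1))) t--;
                dst[j] = t;
            } else {
                dst[j] = src[j];                 /* writemask: keep src lane */
            }
        }
    #endif
    }

Unaligned loads and stores are used so the caller's arrays need no special alignment; aligned variants could be substituted when 64-byte alignment is guaranteed.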

_mm512_maskz_cvt_roundps_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_maskz_cvt_roundps_epi32(__mmask16 k,
                                           __m512 a,
                                           int rounding);

.. admonition:: Intel Description

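    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate), each of which also suppresses exceptions, or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC).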

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtps_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_maskz_cvtps_epi32(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundps_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m256 a, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512d _mm512_cvt_roundps_pd(__m256 a, int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 32*j
        	dst[i+63:i] := Convert_FP32_To_FP64(a[k+31:k])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtps_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512d _mm512_cvtps_pd(__m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 32*j
        	dst[i+63:i] := Convert_FP32_To_FP64(a[k+31:k])
        ENDFOR
        dst[MAX:512] := 0
        	
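Widening from FP32 to FP64 never rounds, because every single-precision value is exactly representable in double precision. The scalar sketch below models the eight-lane loop from the pseudo-code; ``cvtps_pd_model`` is a hypothetical name used only for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model (hypothetical helper): widen 8 floats to 8 doubles. */
    void cvtps_pd_model(double dst[8], const float a[8])
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = (double)a[j];   /* exact: no rounding can occur */
        }
    }

Converting each result back to ``float`` recovers the original input. Note that the widened value of ``0.1f`` is not the same as the double literal ``0.1``, since the rounding error of the original single-precision value is carried over unchanged.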

_mm512_mask_cvt_roundps_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m256 a, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512d _mm512_mask_cvt_roundps_pd(__m512d src, __mmask8 k,
                                       __m256 a, int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_FP64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtps_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512d _mm512_mask_cvtps_pd(__m512d src, __mmask8 k,
                                 __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_FP64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundps_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m256 a, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512d _mm512_maskz_cvt_roundps_pd(__mmask8 k, __m256 a,
                                        int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_FP64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtps_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512d _mm512_maskz_cvtps_pd(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_FP64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
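For comparison with the writemask form, a zeromask can be read as "convert the selected lanes, clear the rest". The scalar sketch below illustrates that reading; both helper names (``model_maskz_cvtps_pd`` and ``active_lanes``) are hypothetical and exist only for this example.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_maskz_cvtps_pd. */
    static void model_maskz_cvtps_pd(double dst[8], uint8_t k,
                                     const float a[8])
    {
        assert(dst != 0 && a != 0);
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1u) ? (double)a[j] : 0.0;  /* clear bit: zero */
    }

    /* Number of lanes a given mask converts (population count of k). */
    static int active_lanes(uint8_t k)
    {
        int n = 0;
        for (; k != 0; k &= (uint8_t)(k - 1))   /* clear the lowest set bit */
            ++n;
        return n;
    }

A mask of ``0xF0`` converts the upper four lanes and zeroes the lower four, so ``active_lanes(0xF0)`` is 4.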

_mm512_cvt_roundps_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m256i _mm512_cvt_roundps_ph(__m512 a, int rounding);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". [round2_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 32*j
        	dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtps_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m256i _mm512_cvtps_ph(__m512 a, int rounding);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". [round2_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 32*j
        	dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        ENDFOR
        dst[MAX:256] := 0
        	
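``Convert_FP32_To_FP16`` is left abstract in the pseudo-code. The sketch below is one possible scalar reading of it for the round-to-nearest-even case only. The function name ``model_fp32_to_fp16_rne`` is hypothetical, other rounding modes are not modelled, and the NaN handling (force a quiet NaN, keep the top payload bits) is a simplifying assumption.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: binary32 -> binary16, round to nearest even. */
    static uint16_t model_fp32_to_fp16_rne(float f)
    {
        uint32_t x;
        assert(sizeof f == sizeof x);
        memcpy(&x, &f, sizeof x);                /* reinterpret the float bits */

        uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
        uint32_t exp  = (x >> 23) & 0xFFu;
        uint32_t mant = x & 0x7FFFFFu;

        if (exp == 0xFFu)                        /* infinity or NaN */
            return (uint16_t)(sign | 0x7C00u |
                              (mant ? (0x0200u | (mant >> 13)) : 0u));

        int e = (int)exp - 127 + 15;             /* re-biased exponent */
        if (e >= 31)
            return (uint16_t)(sign | 0x7C00u);   /* too large: infinity */
        if (e < -10)
            return sign;                         /* too small: signed zero */

        uint32_t shift = 13;                     /* 23 - 10 mantissa bits */
        if (e <= 0) {                            /* result is subnormal */
            mant |= 0x800000u;                   /* make the leading 1 explicit */
            shift = (uint32_t)(14 - e);
            e = 0;
        }
        uint32_t half = ((uint32_t)e << 10) | (mant >> shift);
        uint32_t rem  = mant & ((1u << shift) - 1u);
        uint32_t tie  = 1u << (shift - 1u);
        if (rem > tie || (rem == tie && (half & 1u)))
            ++half;                              /* a carry may bump the exponent */
        return (uint16_t)(sign | half);
    }

Under this model ``1.0f`` maps to ``0x3C00``, ``65504.0f`` (the largest finite half-precision value) maps to ``0x7BFF``, and ``65536.0f`` overflows to infinity, ``0x7C00``.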

_mm512_mask_cvt_roundps_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m512 a, 
    int rounding
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m256i _mm512_mask_cvt_roundps_ph(__m256i src, __mmask16 k,
                                       __m512 a, int rounding);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round2_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 32*j
        	IF k[j]
        		dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtps_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m512 a, 
    int rounding
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m256i _mm512_mask_cvtps_ph(__m256i src, __mmask16 k,
                                 __m512 a, int rounding);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round2_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 32*j
        	IF k[j]
        		dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvt_roundps_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m256i _mm512_maskz_cvt_roundps_ph(__mmask16 k, __m512 a,
                                        int rounding);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round2_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 32*j
        	IF k[j]
        		dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtps_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m256i _mm512_maskz_cvtps_ph(__mmask16 k, __m512 a,
                                  int rounding);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round2_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 32*j
        	IF k[j]
        		dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvt_roundps_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_cvt_roundps_epu32(__m512 a, int rounding);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	
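A hedged usage sketch: the ``rounding`` argument has to be a compile-time constant, typically a rounding-direction flag combined with ``_MM_FROUND_NO_EXC``. The wrapper below, whose name ``cvt16_ps_to_u32_toward_zero`` is hypothetical, calls the intrinsic only when the compiler targets AVX-512F (``__AVX512F__`` defined) and otherwise falls back to a scalar loop that assumes every input lies in the range [0, 2^32).

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: convert 16 floats to uint32_t, rounding toward zero. */
    static void cvt16_ps_to_u32_toward_zero(uint32_t out[16], const float in[16])
    {
        assert(out != 0 && in != 0);
    #if defined(__AVX512F__)
        __m512 v = _mm512_loadu_ps(in);          /* unaligned 512-bit load */
        /* The rounding argument must be a compile-time constant. */
        __m512i r = _mm512_cvt_roundps_epu32(
            v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
        _mm512_storeu_si512((void *)out, r);     /* unaligned 512-bit store */
    #else
        for (int j = 0; j < 16; ++j)
            out[j] = (uint32_t)in[j];            /* scalar stand-in, in-range inputs only */
    #endif
    }

Choosing ``_MM_FROUND_TO_ZERO`` here makes the result independent of the current MXCSR rounding mode, which keeps the two code paths comparable.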

_mm512_cvtps_epu32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512i _mm512_cvtps_epu32(__m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	
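Unlike the truncating ``cvtt`` forms, this conversion rounds according to the current rounding mode. The scalar sketch below models ``Convert_FP32_To_UInt32`` under the default round-to-nearest-even mode only. The helper names are hypothetical, and returning ``0xFFFFFFFF`` for NaN and out-of-range inputs is an assumption about the masked-exception result that should be checked against the instruction reference.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of Convert_FP32_To_UInt32, nearest-even only. */
    static uint32_t model_fp32_to_u32_rne(float x)
    {
        if (!(x >= -0.5f) || x >= 4294967296.0f)
            return UINT32_MAX;          /* NaN, too negative, or too large (assumed) */
        if (x <= 0.0f)
            return 0u;                  /* [-0.5, 0] rounds to zero */
        uint32_t n = (uint32_t)x;       /* truncate toward zero */
        float frac = x - (float)n;      /* exact; nonzero only when x < 2^23 */
        if (frac > 0.5f || (frac == 0.5f && (n & 1u)))
            ++n;                        /* round up; ties go to the even value */
        return n;
    }

    /* Sixteen lanes, mirroring the FOR loop in the pseudo-code. */
    static void model_cvtps_epu32(uint32_t dst[16], const float a[16])
    {
        assert(dst != 0 && a != 0);
        for (int j = 0; j < 16; ++j)
            dst[j] = model_fp32_to_u32_rne(a[j]);
    }

The tie cases show the difference from truncation: ``2.5f`` becomes 2 while ``3.5f`` becomes 4.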

_mm512_mask_cvt_roundps_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512 a, 
    int rounding
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_mask_cvt_roundps_epu32(__m512i src,
                                          __mmask16 k, __m512 a,
                                          int rounding);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtps_epu32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_mask_cvtps_epu32(__m512i src, __mmask16 k,
                                    __m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundps_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_maskz_cvt_roundps_epu32(__mmask16 k,
                                           __m512 a,
                                           int rounding);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtps_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_maskz_cvtps_epu32(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtt_roundpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512d a, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM sae

.. code-block:: C

    __m256i _mm512_cvtt_roundpd_epi32(__m512d a, int sae);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".  [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvttpd_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256i _mm512_cvttpd_epi32(__m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k])
        ENDFOR
        dst[MAX:256] := 0
        	
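"With truncation" means rounding toward zero, which is also what a C cast does for in-range values. The scalar sketch below models the eight-lane narrowing from FP64 to 32-bit integers. The helper names are hypothetical, and returning ``0x80000000`` (the "integer indefinite" value) for NaN and out-of-range inputs is an assumption about the masked-exception result.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of Convert_FP64_To_Int32_Truncate. */
    static int32_t model_fp64_to_i32_trunc(double x)
    {
        /* The comparison is false for NaN, so NaN takes the early return. */
        if (!(x > -2147483649.0 && x < 2147483648.0))
            return INT32_MIN;           /* assumed out-of-range result */
        return (int32_t)x;              /* C casts truncate toward zero */
    }

    /* Eight FP64 lanes in, eight 32-bit lanes out (half the input width). */
    static void model_cvttpd_epi32(int32_t dst[8], const double a[8])
    {
        assert(dst != 0 && a != 0);
        for (int j = 0; j < 8; ++j)
            dst[j] = model_fp64_to_i32_trunc(a[j]);
    }

Note that truncation is not the same as ``floor``: ``-2.9`` becomes -2, not -3.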

_mm512_mask_cvtt_roundpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512d a, 
    int sae
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a, 
    IMM sae

.. code-block:: C

    __m256i _mm512_mask_cvtt_roundpd_epi32(__m256i src,
                                           __mmask8 k,
                                           __m512d a, int sae);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvttpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm512_mask_cvttpd_epi32(__m256i src, __mmask8 k,
                                     __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
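One plausible use of the writemask form is to convert only the lanes known to be representable and leave a sentinel from ``src`` in the others. The scalar sketch below illustrates the idea; all three helper names are hypothetical, and the ``INT32_MIN`` result for an out-of-range lane whose mask bit is set is an assumption.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Nonzero when truncating x toward zero gives a value that fits int32_t. */
    static int fits_i32_after_trunc(double x)
    {
        return x > -2147483649.0 && x < 2147483648.0;   /* false for NaN */
    }

    /* Hypothetical helper: set bit j when lane j can be converted safely. */
    static uint8_t in_range_mask8(const double a[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; ++j)
            if (fits_i32_after_trunc(a[j]))
                k |= (uint8_t)(1u << j);
        return k;
    }

    /* Hypothetical scalar model of _mm512_mask_cvttpd_epi32. */
    static void model_mask_cvttpd_epi32(int32_t dst[8], const int32_t src[8],
                                        uint8_t k, const double a[8])
    {
        assert(dst != 0 && src != 0 && a != 0);
        for (int j = 0; j < 8; ++j) {
            if (!((k >> j) & 1u))
                dst[j] = src[j];            /* mask bit clear: keep src */
            else if (fits_i32_after_trunc(a[j]))
                dst[j] = (int32_t)a[j];     /* truncate toward zero */
            else
                dst[j] = INT32_MIN;         /* assumed out-of-range result */
        }
    }

With ``src`` filled with -1, lanes holding values such as ``1e12`` come back as -1 while in-range lanes hold their truncated value.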

_mm512_maskz_cvtt_roundpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM sae

.. code-block:: C

    __m256i _mm512_maskz_cvtt_roundpd_epi32(__mmask8 k,
                                            __m512d a, int sae);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).  [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvttpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm512_maskz_cvttpd_epi32(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtt_roundpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512d a, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM sae

.. code-block:: C

    __m256i _mm512_cvtt_roundpd_epu32(__m512d a, int sae);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".  [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[k+63:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvttpd_epu32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256i _mm512_cvttpd_epu32(__m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[k+63:k])
        ENDFOR
        dst[MAX:256] := 0
        	
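The unsigned form accepts a different input range from the signed one: values up to just below 2^32 are representable, while anything at or below -1 is not. The scalar sketch below models that range. The helper names are hypothetical, and returning ``0xFFFFFFFF`` for NaN and out-of-range inputs is an assumption about the masked-exception result.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of Convert_FP64_To_UInt32_Truncate. */
    static uint32_t model_fp64_to_u32_trunc(double x)
    {
        /* The comparison is false for NaN, so NaN takes the early return. */
        if (!(x > -1.0 && x < 4294967296.0))
            return UINT32_MAX;          /* assumed out-of-range result */
        return (uint32_t)x;             /* values in (-1, 0) truncate to 0 */
    }

    /* Eight FP64 lanes in, eight unsigned 32-bit lanes out. */
    static void model_cvttpd_epu32(uint32_t dst[8], const double a[8])
    {
        assert(dst != 0 && a != 0);
        for (int j = 0; j < 8; ++j)
            dst[j] = model_fp64_to_u32_trunc(a[j]);
    }

An input of ``3e9`` is out of range for a signed 32-bit result but converts to 3000000000 here.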

_mm512_mask_cvtt_roundpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512d a, 
    int sae
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a, 
    IMM sae

.. code-block:: C

    __m256i _mm512_mask_cvtt_roundpd_epu32(__m256i src,
                                           __mmask8 k,
                                           __m512d a, int sae);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).   [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvttpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm512_mask_cvttpd_epu32(__m256i src, __mmask8 k,
                                     __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtt_roundpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM sae

.. code-block:: C

    __m256i _mm512_maskz_cvtt_roundpd_epu32(__mmask8 k,
                                            __m512d a, int sae);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).  [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvttpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm512_maskz_cvttpd_epu32(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtt_roundps_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512 a, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvtt_roundps_epi32(__m512 a, int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".  [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvttps_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512i _mm512_cvttps_epi32(__m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	
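A minimal usage sketch, assuming the caller has 16 contiguous ``float`` values to convert. The wrapper name ``trunc16_ps_to_i32`` is hypothetical. It calls the intrinsic only when the compiler targets AVX-512F (``__AVX512F__`` defined) and otherwise uses a scalar loop, which is valid only for inputs that fit in ``int32_t`` after truncation.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #if defined(__AVX512F__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: truncate 16 floats to int32_t. */
    static void trunc16_ps_to_i32(int32_t out[16], const float in[16])
    {
        assert(out != 0 && in != 0);
    #if defined(__AVX512F__)
        __m512  v = _mm512_loadu_ps(in);         /* unaligned 512-bit load */
        __m512i r = _mm512_cvttps_epi32(v);      /* 16 truncating conversions */
        _mm512_storeu_si512((void *)out, r);     /* unaligned 512-bit store */
    #else
        for (int j = 0; j < 16; ++j)
            out[j] = (int32_t)in[j];             /* scalar stand-in, in-range inputs only */
    #endif
    }

The unaligned load and store keep the sketch free of alignment requirements; aligned variants could be used when the buffers are known to be 64-byte aligned.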

_mm512_mask_cvtt_roundps_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512 a, 
    int sae
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvtt_roundps_epi32(__m512i src,
                                           __mmask16 k,
                                           __m512 a, int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).  [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvttps_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_mask_cvttps_epi32(__m512i src, __mmask16 k,
                                     __m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtt_roundps_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvtt_roundps_epi32(__mmask16 k,
                                            __m512 a, int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).  [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvttps_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_maskz_cvttps_epi32(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
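A zeromask is a convenient way to handle a partial final block of fewer than 16 elements: set only the low ``n`` mask bits so the unused lanes come back as zero. The scalar sketch below illustrates that pattern. Both helper names are hypothetical, and the ``INT32_MIN`` result for an out-of-range active lane is an assumption.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper: mask with the lowest n bits set, n in 0..16. */
    static uint16_t tail_mask16(unsigned n)
    {
        assert(n <= 16u);
        return (uint16_t)(((uint32_t)1 << n) - 1u);   /* n == 16 yields 0xFFFF */
    }

    /* Hypothetical scalar model of _mm512_maskz_cvttps_epi32. */
    static void model_maskz_cvttps_epi32(int32_t dst[16], uint16_t k,
                                         const float a[16])
    {
        assert(dst != 0 && a != 0);
        for (int j = 0; j < 16; ++j) {
            if (!((k >> j) & 1u))
                dst[j] = 0;                      /* mask bit clear: zero */
            else if (a[j] >= -2147483648.0f && a[j] < 2147483648.0f)
                dst[j] = (int32_t)a[j];          /* truncate toward zero */
            else
                dst[j] = INT32_MIN;              /* assumed out-of-range result */
        }
    }

For example, ``tail_mask16(3)`` is ``0x0007``, so only lanes 0 to 2 are converted.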

_mm512_cvtt_roundps_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512 a, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvtt_roundps_epu32(__m512 a, int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".  [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvttps_epu32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512i _mm512_cvttps_epu32(__m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
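
A single lane of the unsigned truncating conversion can be sketched in scalar C. The helper name ``cvtt_f32_to_u32_model`` is hypothetical, and the handling of unrepresentable inputs is an assumption: to my understanding the hardware returns ``0xFFFFFFFF`` for NaN, for values at or below -1.0, and for values at or above 2^32, but this should be confirmed against the instruction reference before relying on it.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of one lane of a truncating
       FP32 -> uint32 conversion.
       ASSUMPTION: unrepresentable inputs yield 0xFFFFFFFF. */
    static uint32_t cvtt_f32_to_u32_model(float x)
    {
        /* x != x is true only for NaN */
        if (x != x || x <= -1.0f || x >= 4294967296.0f)
            return UINT32_MAX;
        /* Values in (-1, 0) truncate to 0; the cast is well defined here
           because the truncated value fits in uint32_t. */
        return (uint32_t)x;
    }

For example, ``3.99f`` maps to 3 and ``-0.5f`` maps to 0, while ``-1.0f`` falls into the assumed out-of-range case.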
        	

_mm512_mask_cvtt_roundps_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512 a, 
    int sae
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvtt_roundps_epu32(__m512i src,
                                           __mmask16 k,
                                           __m512 a, int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
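
The writemask (merge) behaviour differs from the zeromask form only in what inactive lanes receive. The following scalar sketch illustrates that; the function name and array signature are hypothetical, it assumes every active input lies in the range [0, 2^32), and it does not model the ``sae`` parameter.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of a write-masked, truncating
       FP32 -> uint32 conversion.
       ASSUMPTION: active lanes of a[] are in [0, 2^32). */
    static void mask_cvttps_epu32_model(uint32_t dst[16],
                                        const uint32_t src[16],
                                        uint16_t k, const float a[16])
    {
        for (int j = 0; j < 16; j++) {
            /* Active lane: convert. Inactive lane: keep the value from src. */
            dst[j] = ((k >> j) & 1u) ? (uint32_t)a[j] : src[j];
        }
    }

Passing the previous result as ``src`` gives an in-place update of selected lanes, which is the usual reason to prefer the writemask form over the zeromask form.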
        	

_mm512_mask_cvttps_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_mask_cvttps_epu32(__m512i src, __mmask16 k,
                                     __m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtt_roundps_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvtt_roundps_epu32(__mmask16 k,
                                            __m512 a, int sae);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvttps_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512i _mm512_maskz_cvttps_epu32(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepu32_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m512d _mm512_cvtepu32_pd(__m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l])
        ENDFOR
        dst[MAX:512] := 0
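
Because a double has a 53-bit significand, every unsigned 32-bit value converts exactly, and the result is never negative. A minimal scalar sketch of the loop above, using a hypothetical function name and plain arrays in place of vector registers:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: widen eight unsigned 32-bit integers
       to eight doubles. The conversion is exact for every input. */
    static void cvtepu32_pd_model(double dst[8], const uint32_t a[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = (double)a[j];  /* unsigned source: 0xFFFFFFFF -> 4294967295.0 */
    }

The key point is the unsigned interpretation of the source: ``0xFFFFFFFF`` becomes ``4294967295.0``, not ``-1.0`` as it would under a signed conversion.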
        	

_mm512_mask_cvtepu32_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    FP64 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512d _mm512_mask_cvtepu32_pd(__m512d src, __mmask8 k,
                                    __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepu32_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512d _mm512_maskz_cvtepu32_pd(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundepu32_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512i a, 
    int rounding
:Param ETypes:
    UI32 a, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_cvt_roundepu32_ps(__m512i a, int rounding);

.. admonition:: Intel Description

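    Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". Rounding is done according to the "rounding" parameter, which can be one of: _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC (round to nearest), _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC (round down), _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC (round up), _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC (truncate), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).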

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepu32_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m512 _mm512_cvtepu32_ps(__m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
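
Unlike the conversion to double precision, this conversion can lose information: a float carries only 24 significant bits, so integers above 2^24 are rounded. The sketch below is a hypothetical scalar model that assumes the default round-to-nearest-even mode is in effect; it does not attempt to model other rounding modes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: convert sixteen unsigned 32-bit integers
       to floats.
       ASSUMPTION: the floating-point environment uses round-to-nearest-even. */
    static void cvtepu32_ps_model(float dst[16], const uint32_t a[16])
    {
        for (int j = 0; j < 16; j++)
            dst[j] = (float)a[j];  /* inexact once a[j] exceeds 2^24 */
    }

Under that assumption, 16777217 (2^24 + 1) rounds down to ``16777216.0f`` and ``0xFFFFFFFF`` rounds up to ``4294967296.0f``, so a round trip back to an integer is not guaranteed to return the original value.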
        	

_mm512_mask_cvt_roundepu32_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    UI32 a, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_cvt_roundepu32_ps(__m512 src,
                                         __mmask16 k, __m512i a,
                                         int rounding);

.. admonition:: Intel Description

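    Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of: _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC (round to nearest), _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC (round down), _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC (round up), _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC (truncate), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).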

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepu32_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512 _mm512_mask_cvtepu32_ps(__m512 src, __mmask16 k,
                                   __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundepu32_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_cvt_roundepu32_ps(__mmask16 k,
                                          __m512i a,
                                          int rounding);

.. admonition:: Intel Description

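    Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "rounding" parameter, which can be one of: _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC (round to nearest), _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC (round down), _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC (round up), _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC (truncate), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).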

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepu32_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512 _mm512_maskz_cvtepu32_ps(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepi32_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm512_cvtepi32_epi8(__m512i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	k := 8*j
        	dst[k+7:k] := Truncate8(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
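
"Truncation" here means keeping the low 8 bits of each element and discarding the rest, with no clamping. A minimal scalar sketch of that reading of ``Truncate8``, with a hypothetical function name and arrays standing in for the vector registers:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: narrow sixteen 32-bit integers to
       sixteen bytes by keeping only bits 7:0 of each element. */
    static void cvtepi32_epi8_model(uint8_t dst[16], const uint32_t a[16])
    {
        for (int j = 0; j < 16; j++)
            dst[j] = (uint8_t)(a[j] & 0xFFu);  /* upper 24 bits are dropped */
    }

As a consequence, values outside the 8-bit range wrap rather than clamp: 256 becomes 0 and 300 becomes 44.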
        	

_mm512_mask_cvtepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm512_mask_cvtepi32_epi8(__m128i src, __mmask16 k,
                                      __m512i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_cvtepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm512_maskz_cvtepi32_epi8(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_cvtepi32_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256i _mm512_cvtepi32_epi16(__m512i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	k := 16*j
        	dst[k+15:k] := Truncate16(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm512_mask_cvtepi32_epi16(__m256i src, __mmask16 k,
                                       __m512i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Truncate16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm512_maskz_cvtepi32_epi16(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Truncate16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtepi64_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm512_cvtepi64_epi8(__m512i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 8*j
        	dst[k+7:k] := Truncate8(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
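
Since the loop produces eight 8-bit results, only the low 64 bits of the 128-bit return value carry converted data. The hypothetical scalar model below makes that explicit by packing the eight bytes into a single ``uint64_t``; it does not model the remaining bits of the register.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: truncate eight 64-bit integers to bytes
       and pack them, lane 0 in the least significant byte. */
    static uint64_t cvtepi64_epi8_model(const uint64_t a[8])
    {
        uint64_t packed = 0;
        for (int j = 0; j < 8; j++)
            packed |= (a[j] & 0xFFu) << (8 * j);  /* byte j holds lane j */
        return packed;
    }

For inputs whose low bytes are 0 through 7, the packed result is ``0x0706050403020100``, which matches the little-endian lane order used by the pseudo-code.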
        	

_mm512_mask_cvtepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm512_mask_cvtepi64_epi8(__m128i src, __mmask8 k,
                                      __m512i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_cvtepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm512_maskz_cvtepi64_epi8(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_cvtepi64_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m256i _mm512_cvtepi64_epi32(__m512i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 32*j
        	dst[k+31:k] := Truncate32(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
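
Truncating a 64-bit element to 32 bits keeps the low half and silently discards the high half, so the numeric value is preserved only when the high half carried no information. A small scalar sketch, with a hypothetical function name, that can be used to see which inputs survive:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: keep bits 31:0 of each 64-bit element. */
    static void cvtepi64_epi32_model(uint32_t dst[8], const uint64_t a[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = (uint32_t)a[j];  /* conversion is modulo 2^32 */
    }

For instance ``0x0000000100000002`` narrows to 2, losing the set bit in the upper half. When that loss is unacceptable, the saturating conversions in this family are the alternative to consider.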
        	

_mm512_mask_cvtepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm512_mask_cvtepi64_epi32(__m256i src, __mmask8 k,
                                       __m512i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := Truncate32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm512_maskz_cvtepi64_epi32(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := Truncate32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtepi64_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm512_cvtepi64_epi16(__m512i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 16*j
        	dst[k+15:k] := Truncate16(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_mask_cvtepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm512_mask_cvtepi64_epi16(__m128i src, __mmask8 k,
                                       __m512i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Truncate16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_cvtepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm512_maskz_cvtepi64_epi16(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Truncate16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_cvtsepi32_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m512i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m128i _mm512_cvtsepi32_epi8(__m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	k := 8*j
        	dst[k+7:k] := Saturate8(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
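
Signed saturation clamps each element to the range of the destination type instead of wrapping. The scalar sketch below shows one plausible reading of ``Saturate8`` for signed inputs; the function name and array signature are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: narrow sixteen signed 32-bit integers
       to signed bytes, clamping to [-128, 127]. */
    static void cvtsepi32_epi8_model(int8_t dst[16], const int32_t a[16])
    {
        for (int j = 0; j < 16; j++) {
            int32_t x = a[j];
            if (x > INT8_MAX) x = INT8_MAX;  /* too large: clamp to 127 */
            if (x < INT8_MIN) x = INT8_MIN;  /* too small: clamp to -128 */
            dst[j] = (int8_t)x;              /* now guaranteed to fit */
        }
    }

With this model 300 becomes 127 and -300 becomes -128, while values already within range pass through unchanged.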
        	

_mm512_mask_cvtsepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    SI8 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm512_mask_cvtsepi32_epi8(__m128i src, __mmask16 k,
                                       __m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_cvtsepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm512_maskz_cvtsepi32_epi8(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_cvtsepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m256i _mm512_cvtsepi32_epi16(__m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	k := 16*j
        	dst[k+15:k] := Saturate16(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtsepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    SI16 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m256i _mm512_mask_cvtsepi32_epi16(__m256i src,
                                        __mmask16 k, __m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Saturate16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
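
Saturation and write-masking compose lane by lane: an active lane is clamped and narrowed, an inactive lane is taken from ``src`` untouched. A hypothetical scalar model of that combination, with illustrative names and arrays in place of registers:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: write-masked narrowing of sixteen signed
       32-bit integers to 16 bits with signed saturation. */
    static void mask_cvtsepi32_epi16_model(int16_t dst[16],
                                           const int16_t src[16],
                                           uint16_t k, const int32_t a[16])
    {
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1u) {
                int32_t x = a[j];
                if (x > INT16_MAX) x = INT16_MAX;  /* clamp to  32767 */
                if (x < INT16_MIN) x = INT16_MIN;  /* clamp to -32768 */
                dst[j] = (int16_t)x;
            } else {
                dst[j] = src[j];                   /* merge from src */
            }
        }
    }

Note that ``src`` values are copied as they are; saturation applies only to lanes converted from ``a``.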
        	

_mm512_maskz_cvtsepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m256i _mm512_maskz_cvtsepi32_epi16(__mmask16 k,
                                         __m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Saturate16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtsepi64_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m512i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128i _mm512_cvtsepi64_epi8(__m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 8*j
        	dst[k+7:k] := Saturate8(a[i+63:i])
        ENDFOR
        dst[MAX:64] := 0
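
Because eight 8-bit results occupy only 64 bits, the upper half of the 128-bit return value is zeroed. A minimal scalar sketch of that layout follows, assuming the result is viewed as 16 bytes with lane 0 first; ``saturate8_ref`` and ``cvtsepi64_epi8_ref`` are hypothetical names used for illustration.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Saturate8: clamp a signed 64-bit value to the int8_t range. */
    static int8_t saturate8_ref(int64_t v)
    {
        if (v > INT8_MAX) return INT8_MAX;
        if (v < INT8_MIN) return INT8_MIN;
        return (int8_t)v;
    }

    /* Hypothetical scalar model: dst stands in for the 128-bit result. */
    static void cvtsepi64_epi8_ref(int8_t dst[16], const int64_t a[8])
    {
        assert(dst && a);
        memset(dst, 0, 16);              /* covers dst[MAX:64] := 0 */
        for (int j = 0; j < 8; ++j)
            dst[j] = saturate8_ref(a[j]);
    }

Bytes 8 to 15 of ``dst`` are zero regardless of the input.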
        	

_mm512_mask_cvtsepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    SI8 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm512_mask_cvtsepi64_epi8(__m128i src, __mmask8 k,
                                       __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm512_maskz_cvtsepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm512_maskz_cvtsepi64_epi8(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm512_cvtsepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m256i _mm512_cvtsepi64_epi32(__m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 32*j
        	dst[k+31:k] := Saturate32(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtsepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m256i _mm512_mask_cvtsepi64_epi32(__m256i src, __mmask8 k,
                                        __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := Saturate32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtsepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m256i _mm512_maskz_cvtsepi64_epi32(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := Saturate32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
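
The zeromask form differs from the writemask form only in what an unselected lane receives. The following scalar sketch illustrates this under the assumption that lanes are stored as array elements; ``saturate32_ref`` and ``maskz_cvtsepi64_epi32_ref`` are hypothetical names, not Intel APIs.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Saturate32: clamp a signed 64-bit value to the int32_t range. */
    static int32_t saturate32_ref(int64_t v)
    {
        if (v > INT32_MAX) return INT32_MAX;
        if (v < INT32_MIN) return INT32_MIN;
        return (int32_t)v;
    }

    /* Hypothetical scalar model: lanes whose mask bit is clear become 0. */
    static void maskz_cvtsepi64_epi32_ref(int32_t dst[8], uint8_t k,
                                          const int64_t a[8])
    {
        assert(dst && a);
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1u) ? saturate32_ref(a[j]) : 0;
    }

No ``src`` operand is needed, since there is nothing to merge with.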
        	

_mm512_cvtsepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m512i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128i _mm512_cvtsepi64_epi16(__m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 16*j
        	dst[k+15:k] := Saturate16(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_mask_cvtsepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    SI16 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm512_mask_cvtsepi64_epi16(__m128i src, __mmask8 k,
                                        __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Saturate16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_cvtsepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm512_maskz_cvtsepi64_epi16(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Saturate16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_cvtepi8_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m512i _mm512_cvtepi8_epi32(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	k := 8*j
        	dst[i+31:i] := SignExtend32(a[k+7:k])
        ENDFOR
        dst[MAX:512] := 0
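
Sign extension keeps the numeric value of each lane and fills the new high bits with copies of the sign bit. A small scalar sketch, assuming array-of-lanes storage and using the hypothetical name ``cvtepi8_epi32_ref``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model: widen 16 signed bytes to 16 signed dwords.
       Converting int8_t to int32_t in C preserves the value, which is
       exactly what SignExtend32 does at the bit level. */
    static void cvtepi8_epi32_ref(int32_t dst[16], const int8_t a[16])
    {
        assert(dst && a);
        for (int j = 0; j < 16; ++j)
            dst[j] = (int32_t)a[j];
    }

For example, the byte ``0x80`` (value -128) widens to the bit pattern ``0xFFFFFF80``, not ``0x00000080``.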
        	

_mm512_mask_cvtepi8_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI8 a

.. code-block:: C

    __m512i _mm512_mask_cvtepi8_epi32(__m512i src, __mmask16 k,
                                      __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[l+7:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepi8_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    __m512i _mm512_maskz_cvtepi8_epi32(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[l+7:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepi8_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m512i _mm512_cvtepi8_epi64(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 8*j
        	dst[i+63:i] := SignExtend64(a[k+7:k])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepi8_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI64 src, 
    MASK k, 
    SI8 a

.. code-block:: C

    __m512i _mm512_mask_cvtepi8_epi64(__m512i src, __mmask8 k,
                                      __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+7:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
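
Only the low 8 bytes of the 128-bit input take part in this conversion. The scalar sketch below makes that explicit; it assumes array-of-lanes storage, and ``mask_cvtepi8_epi64_ref`` is a hypothetical name rather than a real intrinsic.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model: a[] holds all 16 bytes of the source
       vector, but the loop reads a[0..7] only. */
    static void mask_cvtepi8_epi64_ref(int64_t dst[8], const int64_t src[8],
                                       uint8_t k, const int8_t a[16])
    {
        assert(dst && src && a);
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1u) ? (int64_t)a[j] : src[j];
    }

Whatever is stored in bytes 8 to 15 of ``a`` has no effect on the result.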
        	

_mm512_maskz_cvtepi8_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    __m512i _mm512_maskz_cvtepi8_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+7:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepi32_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m512i _mm512_cvtepi32_epi64(__m256i a);

.. admonition:: Intel Description

    Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 32*j
        	dst[i+63:i] := SignExtend64(a[k+31:k])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepi32_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    SI64 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m512i _mm512_mask_cvtepi32_epi64(__m512i src, __mmask8 k,
                                       __m256i a);

.. admonition:: Intel Description

    Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepi32_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m512i _mm512_maskz_cvtepi32_epi64(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
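
``SignExtend64`` can also be expressed purely on raw bit patterns, which is convenient when lanes are held as unsigned words. The sketch below is one common way to do that (XOR with the sign bit, then subtract it); ``sign_extend32_ref`` and ``maskz_cvtepi32_epi64_ref`` are hypothetical helper names, and the array layout is an assumption.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Sign extend a raw 32-bit lane to 64 bits without signed casts:
       flipping bit 31 and subtracting 2^31 replicates it upward. */
    static uint64_t sign_extend32_ref(uint32_t lane)
    {
        const uint64_t sign = UINT64_C(0x80000000);
        return ((uint64_t)lane ^ sign) - sign;
    }

    /* Hypothetical scalar model of the zeromask form on raw lanes. */
    static void maskz_cvtepi32_epi64_ref(uint64_t dst[8], uint8_t k,
                                         const uint32_t a[8])
    {
        assert(dst && a);
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1u) ? sign_extend32_ref(a[j]) : 0;
    }

A lane holding ``0x80000000`` becomes ``0xFFFFFFFF80000000`` when selected and ``0`` when its mask bit is clear.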
        	

_mm512_cvtepi16_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m512i _mm512_cvtepi16_epi32(__m256i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	k := 16*j
        	dst[i+31:i] := SignExtend32(a[k+15:k])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepi16_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m512i _mm512_mask_cvtepi16_epi32(__m512i src, __mmask16 k,
                                       __m256i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	l := j*16
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[l+15:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepi16_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m512i _mm512_maskz_cvtepi16_epi32(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[l+15:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepi16_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m512i _mm512_cvtepi16_epi64(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 16*j
        	dst[i+63:i] := SignExtend64(a[k+15:k])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepi16_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI64 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m512i _mm512_mask_cvtepi16_epi64(__m512i src, __mmask8 k,
                                       __m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+15:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepi16_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m512i _mm512_maskz_cvtepi16_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+15:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtusepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm512_cvtusepi32_epi8(__m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	k := 8*j
        	dst[k+7:k] := SaturateU8(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
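
Unsigned saturation treats every input lane as non-negative, so it clamps only at the upper bound. A scalar sketch of ``SaturateU8`` applied across 16 lanes follows; the helper names ``saturate_u8_ref`` and ``cvtusepi32_epi8_ref`` are hypothetical and the array layout is an assumption.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* SaturateU8: values above 255 clamp to 255; there is no lower clamp
       because the input is interpreted as unsigned. */
    static uint8_t saturate_u8_ref(uint32_t v)
    {
        return (uint8_t)(v > UINT8_MAX ? UINT8_MAX : v);
    }

    /* Hypothetical scalar model: 16 unsigned dwords to 16 unsigned bytes. */
    static void cvtusepi32_epi8_ref(uint8_t dst[16], const uint32_t a[16])
    {
        assert(dst && a);
        for (int j = 0; j < 16; ++j)
            dst[j] = saturate_u8_ref(a[j]);
    }

Note that the bit pattern ``0xFFFFFFFF`` saturates to 255 here, whereas the signed conversions would read the same bits as -1.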
        	

_mm512_mask_cvtusepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm512_mask_cvtusepi32_epi8(__m128i src,
                                        __mmask16 k, __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_cvtusepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm512_maskz_cvtusepi32_epi8(__mmask16 k,
                                         __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_cvtusepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256i _mm512_cvtusepi32_epi16(__m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	k := 16*j
        	dst[k+15:k] := SaturateU16(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtusepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm512_mask_cvtusepi32_epi16(__m256i src,
                                         __mmask16 k,
                                         __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := SaturateU16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtusepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm512_maskz_cvtusepi32_epi16(__mmask16 k,
                                          __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := SaturateU16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtusepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm512_cvtusepi64_epi8(__m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 8*j
        	dst[k+7:k] := SaturateU8(a[i+63:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm512_mask_cvtusepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm512_mask_cvtusepi64_epi8(__m128i src, __mmask8 k,
                                        __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm512_maskz_cvtusepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm512_maskz_cvtusepi64_epi8(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm512_cvtusepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m256i _mm512_cvtusepi64_epi32(__m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 32*j
        	dst[k+31:k] := SaturateU32(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtusepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm512_mask_cvtusepi64_epi32(__m256i src,
                                         __mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := SaturateU32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtusepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm512_maskz_cvtusepi64_epi32(__mmask8 k,
                                          __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := SaturateU32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
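
As a cross-check of the pseudo-code above, the zeromasked unsigned narrow from 64 to 32 bits can be modelled in scalar C. This is a sketch under the assumption that lanes are stored as array elements; ``saturate_u32_ref`` and ``maskz_cvtusepi64_epi32_ref`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* SaturateU32: values above 2^32 - 1 clamp to 2^32 - 1. */
    static uint32_t saturate_u32_ref(uint64_t v)
    {
        return (uint32_t)(v > UINT32_MAX ? UINT32_MAX : v);
    }

    /* Hypothetical scalar model: unselected lanes are written as zero. */
    static void maskz_cvtusepi64_epi32_ref(uint32_t dst[8], uint8_t k,
                                           const uint64_t a[8])
    {
        assert(dst && a);
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1u) ? saturate_u32_ref(a[j]) : 0u;
    }

An input of ``0x100000000`` is one past the 32-bit range and therefore yields ``0xFFFFFFFF`` when its lane is selected.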
        	

_mm512_cvtusepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm512_cvtusepi64_epi16(__m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 16*j
        	dst[k+15:k] := SaturateU16(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
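
Unsigned saturation is easy to confuse with plain truncation, so a small scalar comparison may help. This is a sketch under the assumption that a vector can be modelled as an array of lanes; ``saturate_u16``, ``truncate_u16`` and ``ref_cvtusepi64_epi16`` are hypothetical names, not part of ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Unsigned saturation: values above 65535 clamp to 65535. */
    uint16_t saturate_u16(uint64_t x)
    {
        return x > UINT16_MAX ? UINT16_MAX : (uint16_t)x;
    }

    /* Plain truncation keeps only the low 16 bits (shown for contrast). */
    uint16_t truncate_u16(uint64_t x)
    {
        return (uint16_t)(x & 0xFFFFu);
    }

    /* Hypothetical scalar model of the unmasked conversion. */
    void ref_cvtusepi64_epi16(uint16_t dst[8], const uint64_t a[8])
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = saturate_u16(a[j]);
        }
    }

For the input ``0x10001`` the saturating path yields 65535, while truncation would yield 1.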
        	

_mm512_mask_cvtusepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm512_mask_cvtusepi64_epi16(__m128i src,
                                         __mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := SaturateU16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_cvtusepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm512_maskz_cvtusepi64_epi16(__mmask8 k,
                                          __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := SaturateU16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_cvtepu8_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m512i _mm512_cvtepu8_epi32(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	k := 8*j
        	dst[i+31:i] := ZeroExtend32(a[k+7:k])
        ENDFOR
        dst[MAX:512] := 0
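
Zero extension means a byte such as ``0xFF`` widens to 255 rather than to -1. A minimal scalar sketch of the loop above follows; ``ref_cvtepu8_epi32`` is a hypothetical helper name and the array layout is an assumption made for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: 16 input bytes become 16 32-bit lanes. */
    void ref_cvtepu8_epi32(uint32_t dst[16], const uint8_t a[16])
    {
        for (int j = 0; j < 16; j++) {
            /* Unsigned widening: the upper 24 bits of each lane are 0. */
            dst[j] = (uint32_t)a[j];
        }
    }

Because the source type is unsigned, no lane can end up with its high bits set.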
        	

_mm512_mask_cvtepu8_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_mask_cvtepu8_epi32(__m512i src, __mmask16 k,
                                      __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[i+31:i] := ZeroExtend32(a[l+7:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepu8_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_maskz_cvtepu8_epi32(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[i+31:i] := ZeroExtend32(a[l+7:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepu8_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m512i _mm512_cvtepu8_epi64(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 8*j
        	dst[i+63:i] := ZeroExtend64(a[k+7:k])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepu8_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_mask_cvtepu8_epi64(__m512i src, __mmask8 k,
                                      __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+7:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
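
Only the low 8 bytes of the 128-bit input take part in this conversion. The scalar sketch below illustrates that, assuming an array model of the vectors; ``ref_mask_cvtepu8_epi64`` is a hypothetical name used for illustration only.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: a[] holds all 16 bytes of the input
       vector, but only a[0]..a[7] are ever read. */
    void ref_mask_cvtepu8_epi64(uint64_t dst[8], const uint64_t src[8],
                                uint8_t k, const uint8_t a[16])
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = ((k >> j) & 1) ? (uint64_t)a[j] : src[j];
        }
    }

Bytes 8 through 15 of the input have no effect on the result, whatever the mask.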
        	

_mm512_maskz_cvtepu8_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_maskz_cvtepu8_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+7:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepu32_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m512i _mm512_cvtepu32_epi64(__m256i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 32*j
        	dst[i+63:i] := ZeroExtend64(a[k+31:k])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepu32_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_cvtepu32_epi64(__m512i src, __mmask8 k,
                                       __m256i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepu32_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_cvtepu32_epi64(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+31:l])
        	ELSE 
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
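
The zeromask form differs from the writemask form only in what an unselected lane receives. A rough scalar sketch, assuming an array model of the vectors and using the hypothetical name ``ref_maskz_cvtepu32_epi64``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: lane j is zero-extended from a[j] when
       bit j of k is set, otherwise it is cleared to 0. */
    void ref_maskz_cvtepu32_epi64(uint64_t dst[8], uint8_t k,
                                  const uint32_t a[8])
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = ((k >> j) & 1) ? (uint64_t)a[j] : 0;
        }
    }

No ``src`` operand is needed here, since unselected lanes never carry over an earlier value.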
        	

_mm512_cvtepu16_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m512i _mm512_cvtepu16_epi32(__m256i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	k := 16*j
        	dst[i+31:i] := ZeroExtend32(a[k+15:k])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepu16_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_mask_cvtepu16_epi32(__m512i src, __mmask16 k,
                                       __m256i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[i+31:i] := ZeroExtend32(a[l+15:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepu16_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_maskz_cvtepu16_epi32(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[i+31:i] := ZeroExtend32(a[l+15:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepu16_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m512i _mm512_cvtepu16_epi64(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	k := 16*j
        	dst[i+63:i] := ZeroExtend64(a[k+15:k])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepu16_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_mask_cvtepu16_epi64(__m512i src, __mmask8 k,
                                       __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+15:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepu16_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_maskz_cvtepu16_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+15:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtss_f32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: float
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    float _mm512_cvtss_f32(__m512 a);

.. admonition:: Intel Description

    Copy the lower single-precision (32-bit) floating-point element of "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        	

_mm512_cvtsd_f64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: double
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    double _mm512_cvtsd_f64(__m512d a);

.. admonition:: Intel Description

    Copy the lower double-precision (64-bit) floating-point element of "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        	

_mm512_cvtsi512_si32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: int
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _mm512_cvtsi512_si32(__m512i a);

.. admonition:: Intel Description

    Copy the lower 32-bit integer in "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
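
The "lower 32-bit integer" is the one occupying bits 31:0 of the vector. As an illustrative sketch, the vector can be modelled as 64 bytes with the least significant byte first; this layout and the name ``ref_cvtsi512_si32`` are assumptions made for the example.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: assemble bits [31:0] from the four
       lowest bytes, then reinterpret them as a signed 32-bit integer. */
    int32_t ref_cvtsi512_si32(const uint8_t a[64])
    {
        uint32_t bits = (uint32_t)a[0]
                      | ((uint32_t)a[1] << 8)
                      | ((uint32_t)a[2] << 16)
                      | ((uint32_t)a[3] << 24);
        int32_t out;
        memcpy(&out, &bits, sizeof out);
        return out;
    }

The remaining 60 bytes of the input are ignored.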
        	

_mm512_cvtpslo_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512 v2
:Param ETypes:
    FP32 v2

.. code-block:: C

    __m512d _mm512_cvtpslo_pd(__m512 v2);

.. admonition:: Intel Description

    Performs element-by-element conversion of the lower half of packed single-precision (32-bit) floating-point elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := j*64
        	dst[n+63:n] := Convert_FP32_To_FP64(v2[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
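
Here "lower half" refers to elements 0 through 7 of the sixteen single-precision inputs. A minimal scalar sketch, assuming an array model of the vectors, with ``ref_cvtpslo_pd`` as a hypothetical name:

.. code-block:: C

    /* Hypothetical scalar model: only v2[0]..v2[7] are read. */
    void ref_cvtpslo_pd(double dst[8], const float v2[16])
    {
        for (int j = 0; j < 8; j++) {
            /* Every FP32 value is exactly representable in FP64. */
            dst[j] = (double)v2[j];
        }
    }

Since the widening is exact, no rounding mode is involved in this direction.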
        	

_mm512_mask_cvtpslo_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512 v2
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP32 v2

.. code-block:: C

    __m512d _mm512_mask_cvtpslo_pd(__m512d src, __mmask8 k,
                                   __m512 v2);

.. admonition:: Intel Description

    Performs element-by-element conversion of the lower half of packed single-precision (32-bit) floating-point elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[l+63:l] := Convert_FP32_To_FP64(v2[i+31:i])
        	ELSE
        		dst[l+63:l] := src[l+63:l]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepi32lo_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512i v2
:Param ETypes:
    SI32 v2

.. code-block:: C

    __m512d _mm512_cvtepi32lo_pd(__m512i v2);

.. admonition:: Intel Description

    Performs element-by-element conversion of the lower half of packed 32-bit integer elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	l := j*64
        	dst[l+63:l] := Convert_Int32_To_FP64(v2[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepi32lo_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512i v2
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI32 v2

.. code-block:: C

    __m512d _mm512_mask_cvtepi32lo_pd(__m512d src, __mmask8 k,
                                      __m512i v2);

.. admonition:: Intel Description

    Performs element-by-element conversion of the lower half of packed 32-bit integer elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := j*64
        	IF k[j]
        		dst[n+63:n] := Convert_Int32_To_FP64(v2[i+31:i])
        	ELSE
        		dst[n+63:n] := src[n+63:n]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepu32lo_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512i v2
:Param ETypes:
    UI32 v2

.. code-block:: C

    __m512d _mm512_cvtepu32lo_pd(__m512i v2);

.. admonition:: Intel Description

    Performs element-by-element conversion of the lower half of packed 32-bit unsigned integer elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := j*64
        	dst[n+63:n] := Convert_Int32_To_FP64(v2[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
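
The description specifies unsigned elements, although the pseudo-code names its helper ``Convert_Int32_To_FP64``. The sketch below assumes the unsigned reading given in the description; ``ref_cvtepu32lo_pd`` is a hypothetical name and the array layout is an illustration only.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: the low eight 32-bit lanes are treated
       as unsigned, so values at or above 2^31 stay positive. */
    void ref_cvtepu32lo_pd(double dst[8], const uint32_t v2[16])
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = (double)v2[j];
        }
    }

Under this reading ``0x80000000`` converts to 2147483648.0, where a signed interpretation would give -2147483648.0.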
        	

_mm512_mask_cvtepu32lo_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512i v2
:Param ETypes:
    FP64 src, 
    MASK k, 
    UI32 v2

.. code-block:: C

    __m512d _mm512_mask_cvtepu32lo_pd(__m512d src, __mmask8 k,
                                      __m512i v2);

.. admonition:: Intel Description

    Performs element-by-element conversion of the lower half of packed 32-bit unsigned integer elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[l+63:l] := Convert_Int32_To_FP64(v2[i+31:i])
        	ELSE
        		dst[l+63:l] := src[l+63:l]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtpd_pslo
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512d v2
:Param ETypes:
    FP64 v2

.. code-block:: C

    __m512 _mm512_cvtpd_pslo(__m512d v2);

.. admonition:: Intel Description

    Performs an element-by-element conversion of packed double-precision (64-bit) floating-point elements in "v2" to single-precision (32-bit) floating-point elements and stores them in "dst". The elements are stored in the lower half of the results vector, while the remaining upper half locations are set to 0.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	k := j*32
        	dst[k+31:k] := Convert_FP64_To_FP32(v2[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
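
The description adds a step that the loop does not spell out: the upper eight single-precision lanes of the result are set to 0. A scalar sketch that makes this explicit follows. ``ref_cvtpd_pslo`` is a hypothetical name, and the C cast rounds according to the C implementation's current rounding mode, which may not match the hardware setting.

.. code-block:: C

    /* Hypothetical scalar model: 8 doubles narrow into lanes 0..7,
       and lanes 8..15 of the 16-lane result are cleared. */
    void ref_cvtpd_pslo(float dst[16], const double v2[8])
    {
        for (int j = 0; j < 8; j++) {
            dst[j] = (float)v2[j];   /* may round: FP32 has less precision */
        }
        for (int j = 8; j < 16; j++) {
            dst[j] = 0.0f;           /* upper half is zeroed */
        }
    }

Values that are exactly representable in single precision, such as 1.5 or -2.25, pass through unchanged.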
        	

_mm512_mask_cvtpd_pslo
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask8 k, 
    __m512d v2
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP64 v2

.. code-block:: C

    __m512 _mm512_mask_cvtpd_pslo(__m512 src, __mmask8 k,
                                  __m512d v2);

.. admonition:: Intel Description

    Performs an element-by-element conversion of packed double-precision (64-bit) floating-point elements in "v2" to single-precision (32-bit) floating-point elements and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The elements are stored in the lower half of the results vector, while the remaining upper half locations are set to 0.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_FP64_To_FP32(v2[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtpbh_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m256bh a
:Param ETypes:
    BF16 a

.. code-block:: C

    __m512 _mm512_cvtpbh_ps(__m256bh a);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". This intrinsic neither raises floating-point exceptions nor turns sNaN into qNaN.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*16
        	dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
        ENDFOR
        dst[MAX:512] := 0
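
A BF16 value holds the sign, the 8 exponent bits and the top 7 mantissa bits of an IEEE-754 binary32 value, so widening amounts to a 16-bit left shift of the raw bits. The sketch below models BF16 lanes as ``uint16_t`` bit patterns; ``bf16_bits_to_float`` and ``ref_cvtpbh_ps`` are hypothetical names, not Intel APIs.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Place the 16 BF16 bits in the upper half of a binary32 pattern. */
    float bf16_bits_to_float(uint16_t h)
    {
        uint32_t bits = (uint32_t)h << 16;
        float f;
        memcpy(&f, &bits, sizeof f);   /* bit-exact reinterpretation */
        return f;
    }

    /* Hypothetical scalar model of the unmasked 16-lane conversion. */
    void ref_cvtpbh_ps(float dst[16], const uint16_t a[16])
    {
        for (int j = 0; j < 16; j++) {
            dst[j] = bf16_bits_to_float(a[j]);
        }
    }

For instance, the bit pattern ``0x3F80`` widens to 1.0 and ``0x4049`` widens to 3.140625.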
        	

_mm512_maskz_cvtpbh_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m256bh a
:Param ETypes:
    MASK k, 
    BF16 a

.. code-block:: C

    __m512 _mm512_maskz_cvtpbh_ps(__mmask16 k, __m256bh a);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic neither raises floating-point exceptions nor turns sNaN into qNaN.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtpbh_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m256bh a
:Param ETypes:
    FP32 src, 
    MASK k, 
    BF16 a

.. code-block:: C

    __m512 _mm512_mask_cvtpbh_ps(__m512 src, __mmask16 k,
                                 __m256bh a);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic neither raises floating-point exceptions nor turns sNaN into qNaN.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtne2ps_pbh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512bh
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512bh _mm512_cvtne2ps_pbh(__m512 a, __m512 b);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF j < 16
        		t := b.fp32[j]
        	ELSE
        		t := a.fp32[j-16]
        	FI
        	dst.word[j] := Convert_FP32_To_BF16(t)
        ENDFOR
        dst[MAX:512] := 0
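
The operand order is easy to get backwards: per the loop above, "b" fills the lower 16 words of the result and "a" fills the upper 16. The following scalar sketch illustrates that ordering. It assumes round-to-nearest-even narrowing and is intended only for normal, finite inputs; NaN and denormal handling are deliberately left out, and ``float_to_bf16_rne`` and ``ref_cvtne2ps_pbh`` are hypothetical names.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Round-to-nearest-even FP32 -> BF16 for normal, finite inputs
       (an assumption of this sketch; special values are not handled). */
    uint16_t float_to_bf16_rne(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        uint32_t lsb = (bits >> 16) & 1u;   /* lowest kept bit, breaks ties */
        bits += 0x7FFFu + lsb;
        return (uint16_t)(bits >> 16);
    }

    /* Hypothetical scalar model: words 0..15 come from b, 16..31 from a. */
    void ref_cvtne2ps_pbh(uint16_t dst[32], const float a[16],
                          const float b[16])
    {
        for (int j = 0; j < 32; j++) {
            float t = (j < 16) ? b[j] : a[j - 16];
            dst[j] = float_to_bf16_rne(t);
        }
    }

Exact ties round toward the BF16 value whose lowest mantissa bit is 0, which is what the ``lsb`` term implements.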
        	

_mm512_mask_cvtne2ps_pbh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512bh
:Param Types:
    __m512bh src, 
    __mmask32 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    BF16 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512bh _mm512_mask_cvtne2ps_pbh(__m512bh src, __mmask32 k,
                                      __m512 a, __m512 b);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF j < 16
        			t := b.fp32[j]
        		ELSE
        			t := a.fp32[j-16]
        		FI
        		dst.word[j] := Convert_FP32_To_BF16(t)
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtne2ps_pbh
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512bh
:Param Types:
    __mmask32 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512bh _mm512_maskz_cvtne2ps_pbh(__mmask32 k, __m512 a,
                                       __m512 b);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		IF j < 16
        			t := b.fp32[j]
        		ELSE
        			t := a.fp32[j-16]
        		FI
        		dst.word[j] := Convert_FP32_To_BF16(t)
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtneps_pbh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256bh
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256bh _mm512_cvtneps_pbh(__m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtneps_pbh
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256bh
:Param Types:
    __m256bh src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    BF16 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256bh _mm512_mask_cvtneps_pbh(__m256bh src, __mmask16 k,
                                     __m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
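
The difference between a writemask and a zeromask is only in what happens to lanes whose mask bit is clear. The following scalar C sketch isolates that step, taking already converted 16-bit lanes as input. The function names are hypothetical, and the sketch assumes at most 32 lanes so that the mask fits in a ``uint32_t``.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Writemask (merge): lanes with a clear mask bit keep the value from src.
       Assumes n <= 32. */
    static void blend_writemask_u16(uint16_t *dst, const uint16_t *src,
                                    const uint16_t *converted, uint32_t k,
                                    size_t n)
    {
        for (size_t j = 0; j < n; j++)
            dst[j] = ((k >> j) & 1u) ? converted[j] : src[j];
    }

    /* Zeromask: lanes with a clear mask bit become zero. Assumes n <= 32. */
    static void blend_zeromask_u16(uint16_t *dst, const uint16_t *converted,
                                   uint32_t k, size_t n)
    {
        for (size_t j = 0; j < n; j++)
            dst[j] = ((k >> j) & 1u) ? converted[j] : 0;
    }

With converted lanes ``{10, 11, 12, 13}``, "src" lanes ``{1, 2, 3, 4}``, and mask ``0x5``, the writemask form produces ``{10, 2, 12, 4}`` and the zeromask form produces ``{10, 0, 12, 0}``.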

_mm512_maskz_cvtneps_pbh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256bh
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256bh _mm512_maskz_cvtneps_pbh(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
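
Because a BF16 lane holds the top half of an FP32 encoding, widening it back to FP32 is a 16-bit left shift, which makes the precision loss of the conversion easy to inspect. The scalar C sketch below shows such a round trip. Both helper names are hypothetical, and round-to-nearest-even is assumed for the narrowing step.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: FP32 -> BF16, round-to-nearest-even (denormals not modeled). */
    static uint16_t fp32_to_bf16_rne(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        if ((bits & 0x7FFFFFFFu) > 0x7F800000u)      /* NaN */
            return (uint16_t)((bits >> 16) | 0x0040u);
        bits += 0x7FFFu + ((bits >> 16) & 1u);
        return (uint16_t)(bits >> 16);
    }

    /* Hypothetical helper: BF16 -> FP32 is exact, the low 16 bits become zero. */
    static float bf16_to_fp32(uint16_t h)
    {
        uint32_t bits = (uint32_t)h << 16;
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

Passing ``3.14159f`` through both helpers returns ``3.140625f``: only 8 significand bits survive the narrowing step, while values such as ``1.0f`` round-trip unchanged.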

_mm512_cvtepi16_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m512h _mm512_cvtepi16_ph(__m512i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        ENDFOR
        dst[MAX:512] := 0
        	
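
Half precision carries 11 significant bits, so not every 16-bit integer with magnitude above 2048 is representable and the conversion has to round. The scalar C sketch below shows one way to model ``Convert_Int16_To_FP16`` for a single lane. The name ``int16_to_fp16_rne`` is hypothetical, and round-to-nearest-even (the default MXCSR rounding mode) is assumed.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: signed 16-bit integer -> FP16 bit pattern,
       assuming round-to-nearest-even. */
    static uint16_t int16_to_fp16_rne(int16_t x)
    {
        if (x == 0) return 0;
        uint16_t sign = (x < 0) ? 0x8000u : 0u;
        uint32_t m = (x < 0) ? (uint32_t)(-(int32_t)x) : (uint32_t)x; /* magnitude */
        int p = 0;                                   /* index of highest set bit */
        while ((m >> (p + 1)) != 0) p++;
        uint32_t q;                                  /* 11-bit significand, leading 1 included */
        if (p <= 10) {
            q = m << (10 - p);                       /* fits exactly */
        } else {
            int shift = p - 10;
            uint32_t rem = m & ((1u << shift) - 1u); /* bits that do not fit */
            uint32_t half = 1u << (shift - 1);
            q = m >> shift;
            if (rem > half || (rem == half && (q & 1u))) q++;
            if (q == 0x800u) { q = 0x400u; p++; }    /* rounding carried into next power of two */
        }
        return (uint16_t)(sign | ((uint32_t)(p + 15) << 10) | (q & 0x3FFu));
    }

Under these assumptions 2049 converts to ``0x6800`` (2048), 2051 converts to ``0x6802`` (2052), and 32767 converts to ``0x7800`` (32768).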

_mm512_cvt_roundepi16_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512i a, 
    int rounding
:Param ETypes:
    SI16 a, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_cvt_roundepi16_ph(__m512i a, int rounding);

.. admonition:: Intel Description

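    Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code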

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        ENDFOR
        dst[MAX:512] := 0
        	
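
The pseudo-code above does not show how the "rounding" argument changes the result. The scalar C sketch below models the four rounding directions for a single lane. The enum ``round_dir`` and the function ``int16_to_fp16_dir`` are hypothetical stand-ins for illustration; the real intrinsic takes the ``_MM_FROUND_*`` constants from immintrin.h, and exception behaviour is not modeled here.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for the direction part of the rounding argument. */
    typedef enum { RD_NEAREST_EVEN, RD_DOWN, RD_UP, RD_TOWARD_ZERO } round_dir;

    /* Hypothetical scalar model: signed 16-bit integer -> FP16 bit pattern. */
    static uint16_t int16_to_fp16_dir(int16_t x, round_dir dir)
    {
        if (x == 0) return 0;
        int neg = x < 0;
        uint32_t m = neg ? (uint32_t)(-(int32_t)x) : (uint32_t)x;  /* magnitude */
        int p = 0;                                   /* index of highest set bit */
        while ((m >> (p + 1)) != 0) p++;
        uint32_t q = m;                              /* already 11 bits when p == 10 */
        if (p < 10) {
            q = m << (10 - p);                       /* exact */
        } else if (p > 10) {
            int shift = p - 10;
            uint32_t rem = m & ((1u << shift) - 1u);
            uint32_t half = 1u << (shift - 1);
            int bump = 0;                            /* 1 if the magnitude grows */
            q = m >> shift;
            switch (dir) {
            case RD_NEAREST_EVEN: bump = rem > half || (rem == half && (q & 1u)); break;
            case RD_DOWN:         bump = neg && rem != 0;  break; /* toward -inf */
            case RD_UP:           bump = !neg && rem != 0; break; /* toward +inf */
            case RD_TOWARD_ZERO:  bump = 0;                break;
            }
            q += (uint32_t)bump;
            if (q == 0x800u) { q = 0x400u; p++; }    /* carry into next power of two */
        }
        return (uint16_t)((neg ? 0x8000u : 0u) | ((uint32_t)(p + 15) << 10) | (q & 0x3FFu));
    }

In this model 2049 becomes ``0x6801`` (2050) when rounding up and ``0x6800`` (2048) in the other three modes, while -2049 becomes ``0xE801`` (-2050) only when rounding down.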

_mm512_mask_cvtepi16_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m512h _mm512_mask_cvtepi16_ph(__m512h src, __mmask32 k,
                                    __m512i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundepi16_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    SI16 a, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_cvt_roundepi16_ph(__m512h src,
                                          __mmask32 k,
                                          __m512i a,
                                          int rounding);

.. admonition:: Intel Description

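    Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code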

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepi16_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m512h _mm512_maskz_cvtepi16_ph(__mmask32 k, __m512i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundepi16_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    MASK k, 
    SI16 a, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_cvt_roundepi16_ph(__mmask32 k,
                                           __m512i a,
                                           int rounding);

.. admonition:: Intel Description

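    Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code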

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepu16_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m512h _mm512_cvtepu16_ph(__m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        ENDFOR
        dst[MAX:512] := 0
        	
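
Unsigned 16-bit inputs range up to 65535, which is beyond 65504, the largest finite half-precision value. The scalar C sketch below illustrates what that implies for the top of the input range. The name ``uint16_to_fp16_rne`` is hypothetical, and the sketch assumes round-to-nearest-even with IEEE 754 overflow behaviour (results past the largest finite value become +Inf).

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: unsigned 16-bit integer -> FP16 bit pattern,
       assuming round-to-nearest-even and IEEE 754 overflow to +Inf. */
    static uint16_t uint16_to_fp16_rne(uint16_t x)
    {
        if (x == 0) return 0;
        uint32_t m = x;
        int p = 0;                                   /* index of highest set bit */
        while ((m >> (p + 1)) != 0) p++;
        uint32_t q;                                  /* 11-bit significand */
        if (p <= 10) {
            q = m << (10 - p);                       /* exact */
        } else {
            int shift = p - 10;
            uint32_t rem = m & ((1u << shift) - 1u);
            uint32_t half = 1u << (shift - 1);
            q = m >> shift;
            if (rem > half || (rem == half && (q & 1u))) q++;
            if (q == 0x800u) { q = 0x400u; p++; }    /* carry into next power of two */
        }
        if (p > 15) return 0x7C00u;                  /* rounded past 65504: +Inf */
        return (uint16_t)(((uint32_t)(p + 15) << 10) | (q & 0x3FFu));
    }

In this model 65504 and 65519 both map to ``0x7BFF`` (65504), whereas every input from 65520 to 65535 maps to ``0x7C00`` (+Inf).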

_mm512_cvt_roundepu16_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512i a, 
    int rounding
:Param ETypes:
    UI16 a, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_cvt_roundepu16_ph(__m512i a, int rounding);

.. admonition:: Intel Description

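    Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code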

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtepu16_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m512h _mm512_mask_cvtepu16_ph(__m512h src, __mmask32 k,
                                    __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundepu16_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    UI16 a, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_cvt_roundepu16_ph(__m512h src,
                                          __mmask32 k,
                                          __m512i a,
                                          int rounding);

.. admonition:: Intel Description

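    Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code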

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtepu16_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m512h _mm512_maskz_cvtepu16_ph(__mmask32 k, __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundepu16_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_cvt_roundepu16_ph(__mmask32 k,
                                           __m512i a,
                                           int rounding);

.. admonition:: Intel Description

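    Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code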

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtepi32_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m512i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m256h _mm512_cvtepi32_ph(__m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        ENDFOR
        dst[MAX:256] := 0
        	
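
Sixteen 32-bit lanes narrow to sixteen 16-bit lanes, which is why the result is a 256-bit ``__m256h`` and the pseudo-code zeroes everything above bit 255. The scalar C sketch below models the whole vector operation. The names ``int32_to_fp16_rne`` and ``cvtepi32_ph_model`` are hypothetical, and the sketch assumes round-to-nearest-even with IEEE 754 overflow to infinity.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: signed 32-bit integer -> FP16 bit pattern,
       assuming round-to-nearest-even and overflow to +/-Inf. */
    static uint16_t int32_to_fp16_rne(int32_t x)
    {
        if (x == 0) return 0;
        uint16_t sign = (x < 0) ? 0x8000u : 0u;
        uint32_t m = (x < 0) ? 0u - (uint32_t)x : (uint32_t)x;  /* safe for INT32_MIN */
        int p = 31;                                  /* index of highest set bit */
        while (((m >> p) & 1u) == 0) p--;
        if (p > 15) return (uint16_t)(sign | 0x7C00u);          /* |x| >= 65536 */
        uint32_t q;
        if (p <= 10) {
            q = m << (10 - p);                       /* exact */
        } else {
            int shift = p - 10;
            uint32_t rem = m & ((1u << shift) - 1u);
            uint32_t half = 1u << (shift - 1);
            q = m >> shift;
            if (rem > half || (rem == half && (q & 1u))) q++;
            if (q == 0x800u) { q = 0x400u; p++; }    /* carry into next power of two */
        }
        if (p > 15) return (uint16_t)(sign | 0x7C00u);          /* rounded past 65504 */
        return (uint16_t)(sign | ((uint32_t)(p + 15) << 10) | (q & 0x3FFu));
    }

    /* 16 input lanes of 32 bits produce 16 output lanes of 16 bits. */
    static void cvtepi32_ph_model(uint16_t dst[16], const int32_t a[16])
    {
        for (int j = 0; j < 16; j++)
            dst[j] = int32_to_fp16_rne(a[j]);
    }

Under these assumptions 4097 becomes ``0x6C00`` (4096), while 100000 and -100000 become ``0x7C00`` and ``0xFC00`` (positive and negative infinity).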

_mm512_cvt_roundepi32_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m512i a, 
    int rounding
:Param ETypes:
    SI32 a, 
    IMM rounding

.. code-block:: C

    __m256h _mm512_cvt_roundepi32_ph(__m512i a, int rounding);

.. admonition:: Intel Description

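    Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code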

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtepi32_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m256h _mm512_mask_cvtepi32_ph(__m256h src, __mmask16 k,
                                    __m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvt_roundepi32_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    SI32 a, 
    IMM rounding

.. code-block:: C

    __m256h _mm512_mask_cvt_roundepi32_ph(__m256h src,
                                          __mmask16 k,
                                          __m512i a,
                                          int rounding);

.. admonition:: Intel Description

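    Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code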

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtepi32_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m256h _mm512_maskz_cvtepi32_ph(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvt_roundepi32_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    MASK k, 
    SI32 a, 
    IMM rounding

.. code-block:: C

    __m256h _mm512_maskz_cvt_roundepi32_ph(__mmask16 k,
                                           __m512i a,
                                           int rounding);

.. admonition:: Intel Description

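    Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code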

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtepu32_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256h _mm512_cvtepu32_ph(__m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvt_roundepu32_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m512i a, 
    int rounding
:Param ETypes:
    UI32 a, 
    IMM rounding

.. code-block:: C

    __m256h _mm512_cvt_roundepu32_ph(__m512i a, int rounding);

.. admonition:: Intel Description

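    Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code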

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtepu32_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m512i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256h _mm512_mask_cvtepu32_ph(__m256h src, __mmask16 k,
                                    __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvt_roundepu32_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    UI32 a, 
    IMM rounding

.. code-block:: C

    __m256h _mm512_mask_cvt_roundepu32_ph(__m256h src,
                                          __mmask16 k,
                                          __m512i a,
                                          int rounding);

.. admonition:: Intel Description

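    Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code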

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtepu32_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256h _mm512_maskz_cvtepu32_ph(__mmask16 k, __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvt_roundepu32_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM rounding

.. code-block:: C

    __m256h _mm512_maskz_cvt_roundepu32_ph(__mmask16 k,
                                           __m512i a,
                                           int rounding);

.. admonition:: Intel Description

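    Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code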

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtepi64_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m512i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128h _mm512_cvtepi64_ph(__m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        ENDFOR
        dst[MAX:128] := 0
        	
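
Only a small fraction of 64-bit integers survive this conversion unchanged, since half precision offers 11 significant bits and a largest finite value of 65504. The C sketch below is one way to test whether a given input would convert exactly. The name ``int64_exact_in_fp16`` is a hypothetical helper for illustration and is not part of immintrin.h.

.. code-block:: C

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helper: true if x is exactly representable in FP16. */
    static bool int64_exact_in_fp16(int64_t x)
    {
        /* magnitude computed in unsigned arithmetic, safe for INT64_MIN */
        uint64_t m = (x < 0) ? 0u - (uint64_t)x : (uint64_t)x;
        if (m == 0) return true;
        if (m > 65504u) return false;          /* beyond largest finite FP16 value */
        while ((m & 1u) == 0) m >>= 1;         /* strip trailing zero bits */
        return m < 2048u;                      /* at most 11 significant bits remain */
    }

For example, 2048, 2050, and 65504 pass this check, while 2049 (needs 12 significant bits) and 65536 (out of range) do not.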

_mm512_cvt_roundepi64_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m512i a, 
    int rounding
:Param ETypes:
    SI64 a, 
    IMM rounding

.. code-block:: C

    __m128h _mm512_cvt_roundepi64_ph(__m512i a, int rounding);

.. admonition:: Intel Description

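    Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code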

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_mask_cvtepi64_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128h _mm512_mask_cvtepi64_ph(__m128h src, __mmask8 k,
                                    __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_mask_cvt_roundepi64_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    SI64 a, 
    IMM rounding

.. code-block:: C

    __m128h _mm512_mask_cvt_roundepi64_ph(__m128h src,
                                          __mmask8 k, __m512i a,
                                          int rounding);

.. admonition:: Intel Description

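    Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code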

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_cvtepi64_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128h _mm512_maskz_cvtepi64_ph(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_cvt_roundepi64_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    MASK k, 
    SI64 a, 
    IMM rounding

.. code-block:: C

    __m128h _mm512_maskz_cvt_roundepi64_ph(__mmask8 k,
                                           __m512i a,
                                           int rounding);

.. admonition:: Intel Description

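    Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code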

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_cvtepu64_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128h _mm512_cvtepu64_ph(__m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_cvt_roundepu64_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m512i a, 
    int rounding
:Param ETypes:
    UI64 a, 
    IMM rounding

.. code-block:: C

    __m128h _mm512_cvt_roundepu64_ph(__m512i a, int rounding);

.. admonition:: Intel Description

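    Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use the current MXCSR rounding mode).

.. admonition:: Intel Implementation Pseudo-Code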

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_mask_cvtepu64_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m512i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128h _mm512_mask_cvtepu64_ph(__m128h src, __mmask8 k,
                                    __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
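
The writemask behaviour in the pseudo-code can be sketched in scalar C as follows. This is an illustrative model only: ``mask_blend_fp16x8`` is a hypothetical helper, and ``conv`` is assumed to already hold the converted lanes as raw binary16 bit patterns.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of writemask blending for 8 FP16 lanes.
     * conv[] holds already-converted results as raw binary16 bit patterns. */
    void mask_blend_fp16x8(uint16_t dst[8], const uint16_t src[8],
                           uint8_t k, const uint16_t conv[8])
    {
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1u)
                dst[j] = conv[j];   /* mask bit set: take the converted value */
            else
                dst[j] = src[j];    /* mask bit clear: keep the lane from src */
        }
    }

With ``k = 0x05`` only lanes 0 and 2 receive converted values; every other lane is copied from ``src``.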
        	

_mm512_mask_cvt_roundepu64_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    UI64 a, 
    IMM rounding

.. code-block:: C

    __m128h _mm512_mask_cvt_roundepu64_ph(__m128h src,
                                          __mmask8 k, __m512i a,
                                          int rounding);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
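    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).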

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_cvtepu64_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128h _mm512_maskz_cvtepu64_ph(__mmask8 k, __m512i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
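
The zeromask variant differs from the writemask form only in what an unselected lane receives. A minimal scalar sketch, assuming ``conv`` already holds the converted lanes as raw binary16 bit patterns (``maskz_select_fp16x8`` is a hypothetical name used for illustration):

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of zeromasking for 8 FP16 lanes.
     * conv[] holds already-converted results as raw binary16 bit patterns. */
    void maskz_select_fp16x8(uint16_t dst[8], uint8_t k,
                             const uint16_t conv[8])
    {
        for (int j = 0; j < 8; j++) {
            /* mask bit set: converted value; mask bit clear: +0.0 (all bits zero) */
            dst[j] = ((k >> j) & 1u) ? conv[j] : 0;
        }
    }

Because no ``src`` operand is involved, the result does not depend on any earlier register contents.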
        	

_mm512_maskz_cvt_roundepu64_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m512i a, 
    int rounding
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM rounding

.. code-block:: C

    __m128h _mm512_maskz_cvt_roundepu64_ph(__mmask8 k,
                                           __m512i a,
                                           int rounding);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
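    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).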

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_cvtpd_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128h _mm512_cvtpd_ph(__m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
        ENDFOR
        dst[MAX:128] := 0
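
As an illustration of what ``Convert_FP64_To_FP16`` has to do for one lane, the following scalar sketch narrows a double to a binary16 bit pattern with a single round-to-nearest-even step. It assumes the default rounding mode and does not model NaN payloads or exception flags. The name ``fp64_to_fp16_bits`` is hypothetical.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: convert one double to an IEEE 754 binary16
     * bit pattern, rounding once to nearest with ties to even. */
    uint16_t fp64_to_fp16_bits(double d)
    {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);          /* reinterpret the double's bits */

        uint16_t sign = (uint16_t)((bits >> 48) & 0x8000u);
        int      bexp = (int)((bits >> 52) & 0x7FF);
        uint64_t frac = bits & 0x000FFFFFFFFFFFFFull;

        if (bexp == 0x7FF)                       /* infinity or NaN (payload not modeled) */
            return (uint16_t)(sign | (frac ? 0x7E00u : 0x7C00u));
        if (bexp == 0)                           /* zero or double subnormal: rounds to zero */
            return sign;

        int e = bexp - 1023;                     /* unbiased exponent */
        if (e > 15)
            return (uint16_t)(sign | 0x7C00u);   /* overflow to infinity */

        uint64_t sig   = frac | (1ull << 52);    /* 53-bit significand */
        int      shift = 42;                     /* keep 11 bits for a normal result */
        uint32_t base  = 0;
        if (e >= -14)
            base = (uint32_t)(e + 14) << 10;
        else
            shift += -14 - e;                    /* subnormal result: drop extra bits */
        if (shift > 63)
            shift = 63;                          /* anything this small rounds to zero */

        uint64_t q    = sig >> shift;
        uint64_t rem  = sig & ((1ull << shift) - 1ull);
        uint64_t half = 1ull << (shift - 1);
        if (rem > half || (rem == half && (q & 1ull)))
            q++;

        /* A rounding carry propagates into the exponent field; 0x7C00 is infinity. */
        return (uint16_t)(sign | (base + (uint32_t)q));
    }

Rounding directly from the 53-bit significand matters: narrowing to single precision first and then to half precision could round twice and give a different result for values close to a tie.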
        	

_mm512_cvt_roundpd_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m512d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m128h _mm512_cvt_roundpd_ph(__m512d a, int rounding);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
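    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).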

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_mask_cvtpd_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128h _mm512_mask_cvtpd_ph(__m128h src, __mmask8 k,
                                 __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_mask_cvt_roundpd_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m128h _mm512_mask_cvt_roundpd_ph(__m128h src, __mmask8 k,
                                       __m512d a, int rounding);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
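    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).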

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_cvtpd_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128h _mm512_maskz_cvtpd_ph(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_cvt_roundpd_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m128h _mm512_maskz_cvt_roundpd_ph(__mmask8 k, __m512d a,
                                        int rounding);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
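    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).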

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_cvtxps_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256h _mm512_cvtxps_ph(__m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtx_roundps_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m512 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m256h _mm512_cvtx_roundps_ph(__m512 a, int rounding);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
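    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).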

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtxps_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256h _mm512_mask_cvtxps_ph(__m256h src, __mmask16 k,
                                  __m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_mask_cvtx_roundps_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m512 a, 
    int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m256h _mm512_mask_cvtx_roundps_ph(__m256h src,
                                        __mmask16 k, __m512 a,
                                        int rounding);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
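    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).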

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtxps_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256h _mm512_maskz_cvtxps_ph(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_cvtx_roundps_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m256h _mm512_maskz_cvtx_roundps_ph(__mmask16 k, __m512 a,
                                         int rounding);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
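    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).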

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_cvtph_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_cvtph_epi32(__m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
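
A scalar sketch of ``Convert_FP16_To_Int32`` for a single lane is given below, using integer arithmetic only. It assumes the default round-to-nearest-even mode, and it assumes that NaN and infinity inputs produce the "integer indefinite" value ``0x80000000``. The name ``fp16_bits_to_i32`` is hypothetical, and the input is a raw binary16 bit pattern.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: convert one binary16 bit pattern to a signed
     * 32-bit integer, rounding to nearest with ties to even. */
    int32_t fp16_bits_to_i32(uint16_t h)
    {
        int      neg  = (h >> 15) & 1;
        int      bexp = (h >> 10) & 0x1F;
        uint32_t frac = h & 0x3FFu;

        if (bexp == 0x1F)
            return INT32_MIN;   /* NaN/infinity: assumed indefinite value 0x80000000 */

        /* value = sig * 2^(ex - 25); subnormals use ex = 1 and no implicit 1 */
        uint32_t sig = bexp ? (frac | 0x400u) : frac;
        int      ex  = bexp ? bexp : 1;

        uint32_t mag;
        if (ex >= 25) {
            mag = sig << (ex - 25);              /* already an integer */
        } else {
            int shift = 25 - ex;                 /* between 1 and 24 */
            uint32_t rem  = sig & ((1u << shift) - 1u);
            uint32_t half = 1u << (shift - 1);
            mag = sig >> shift;
            if (rem > half || (rem == half && (mag & 1u)))
                mag++;
        }
        return neg ? -(int32_t)mag : (int32_t)mag;
    }

Ties go to the even neighbour in this model: 1.5 (``0x3E00``) becomes 2, and 2.5 (``0x4100``) also becomes 2. Every finite half-precision value fits in 32 bits, since the largest is 65504.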
        	

_mm512_cvt_roundph_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256h a, 
    int rounding
:Param ETypes:
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_cvt_roundph_epi32(__m256h a, int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
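    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).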

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
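
To show how an explicit rounding direction changes the result of a half-precision to 32-bit integer conversion, here is a minimal scalar sketch covering four rounding directions. The ``round_mode`` enumerators and both function names are hypothetical and local to this example; they stand in for the rounding directions and are not the ``_MM_FROUND_*`` constants themselves. NaN and infinity are assumed to yield ``0x80000000``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical mode names for this sketch only. */
    enum round_mode { RM_NEAREST_EVEN, RM_DOWN, RM_UP, RM_TOWARD_ZERO };

    /* Round the magnitude sig / 2^shift to an integer under mode m.
     * neg says whether the original value is negative (matters for up/down). */
    static uint32_t round_shift(uint32_t sig, int shift, int neg,
                                enum round_mode m)
    {
        uint32_t q    = sig >> shift;
        uint32_t rem  = sig & ((1u << shift) - 1u);
        uint32_t half = 1u << (shift - 1);

        switch (m) {
        case RM_NEAREST_EVEN:
            if (rem > half || (rem == half && (q & 1u))) q++;
            break;
        case RM_DOWN:          /* toward -infinity: negatives grow in magnitude */
            if (rem && neg) q++;
            break;
        case RM_UP:            /* toward +infinity: positives grow in magnitude */
            if (rem && !neg) q++;
            break;
        case RM_TOWARD_ZERO:   /* truncate: discard the fraction */
            break;
        }
        return q;
    }

    int32_t fp16_bits_to_i32_round(uint16_t h, enum round_mode m)
    {
        int      neg  = (h >> 15) & 1;
        int      bexp = (h >> 10) & 0x1F;
        uint32_t frac = h & 0x3FFu;

        if (bexp == 0x1F)
            return INT32_MIN;  /* NaN/infinity: assumed indefinite value 0x80000000 */

        /* value = sig * 2^(ex - 25); subnormals use ex = 1 and no implicit 1 */
        uint32_t sig = bexp ? (frac | 0x400u) : frac;
        int      ex  = bexp ? bexp : 1;
        uint32_t mag = (ex >= 25) ? sig << (ex - 25)
                                  : round_shift(sig, 25 - ex, neg, m);
        return neg ? -(int32_t)mag : (int32_t)mag;
    }

For the input -1.5 (``0xBE00``) the model returns -2 for nearest-even and round-down, and -1 for round-up and truncation.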
        	

_mm512_mask_cvtph_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_mask_cvtph_epi32(__m512i src, __mmask16 k,
                                    __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundph_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m256h a, 
    int rounding
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_mask_cvt_roundph_epi32(__m512i src,
                                          __mmask16 k,
                                          __m256h a,
                                          int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
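    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).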

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtph_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_maskz_cvtph_epi32(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundph_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m256h a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_maskz_cvt_roundph_epi32(__mmask16 k,
                                           __m256h a,
                                           int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
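    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).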

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvttph_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_cvttph_epi32(__m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
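
Truncation always discards the fractional part, whatever the current rounding mode. A minimal scalar sketch of ``Convert_FP16_To_Int32_Truncate`` for one lane follows. The name ``fp16_bits_to_i32_trunc`` is hypothetical, the input is a raw binary16 bit pattern, and NaN and infinity are assumed to yield ``0x80000000``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: convert one binary16 bit pattern to a signed
     * 32-bit integer by truncating toward zero. */
    int32_t fp16_bits_to_i32_trunc(uint16_t h)
    {
        int      neg  = (h >> 15) & 1;
        int      bexp = (h >> 10) & 0x1F;
        uint32_t frac = h & 0x3FFu;

        if (bexp == 0x1F)
            return INT32_MIN;   /* NaN/infinity: assumed indefinite value 0x80000000 */

        /* value = sig * 2^(ex - 25); subnormals use ex = 1 and no implicit 1 */
        uint32_t sig = bexp ? (frac | 0x400u) : frac;
        int      ex  = bexp ? bexp : 1;

        /* Shifting right drops the fraction bits, which is truncation of the
         * magnitude; applying the sign afterwards rounds toward zero. */
        uint32_t mag = (ex >= 25) ? sig << (ex - 25) : sig >> (25 - ex);
        return neg ? -(int32_t)mag : (int32_t)mag;
    }

In this model 1.5 (``0x3E00``) becomes 1 and -1.5 (``0xBE00``) becomes -1, whereas a round-to-nearest-even conversion would give 2 and -2.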
        	

_mm512_cvtt_roundph_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvtt_roundph_epi32(__m256h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvttph_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_mask_cvttph_epi32(__m512i src, __mmask16 k,
                                     __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtt_roundph_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m256h a, 
    int sae
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvtt_roundph_epi32(__m512i src,
                                           __mmask16 k,
                                           __m256h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvttph_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_maskz_cvttph_epi32(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtt_roundph_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m256h a, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvtt_roundph_epi32(__mmask16 k,
                                            __m256h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtph_epu32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_cvtph_epu32(__m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
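
The unsigned conversion differs from the signed one mainly in how unrepresentable inputs are handled. The scalar sketch below rests on two assumptions that should be checked against the instruction reference before relying on them: rounding is to nearest-even (the default mode), and any input whose rounded value cannot be represented (negative non-zero results, NaN, infinity) yields ``0xFFFFFFFF``. The name ``fp16_bits_to_u32`` is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: convert one binary16 bit pattern to an
     * unsigned 32-bit integer, rounding to nearest with ties to even. */
    uint32_t fp16_bits_to_u32(uint16_t h)
    {
        int      neg  = (h >> 15) & 1;
        int      bexp = (h >> 10) & 0x1F;
        uint32_t frac = h & 0x3FFu;

        if (bexp == 0x1F)
            return UINT32_MAX;  /* NaN/infinity: assumed result 0xFFFFFFFF */

        /* value = sig * 2^(ex - 25); subnormals use ex = 1 and no implicit 1 */
        uint32_t sig = bexp ? (frac | 0x400u) : frac;
        int      ex  = bexp ? bexp : 1;

        uint32_t mag;
        if (ex >= 25) {
            mag = sig << (ex - 25);              /* already an integer */
        } else {
            int shift = 25 - ex;                 /* between 1 and 24 */
            uint32_t rem  = sig & ((1u << shift) - 1u);
            uint32_t half = 1u << (shift - 1);
            mag = sig >> shift;
            if (rem > half || (rem == half && (mag & 1u)))
                mag++;
        }

        /* Assumption: a negative value that rounds to a non-zero integer is
         * unrepresentable; one that rounds to zero (e.g. -0.25) gives 0. */
        if (neg && mag != 0)
            return UINT32_MAX;
        return mag;
    }

Under these assumptions -0.25 (``0xB400``) converts to 0, while -1.0 (``0xBC00``) converts to ``0xFFFFFFFF``.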
        	

_mm512_cvt_roundph_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256h a, 
    int rounding
:Param ETypes:
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_cvt_roundph_epu32(__m256h a, int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
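    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).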

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtph_epu32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_mask_cvtph_epu32(__m512i src, __mmask16 k,
                                    __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundph_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m256h a, 
    int rounding
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_mask_cvt_roundph_epu32(__m512i src,
                                          __mmask16 k,
                                          __m256h a,
                                          int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
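    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).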

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtph_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_maskz_cvtph_epu32(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundph_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m256h a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_maskz_cvt_roundph_epu32(__mmask16 k,
                                           __m256h a,
                                           int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
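    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC`` (round to nearest, suppress exceptions), ``_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`` (round down, suppress exceptions), ``_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC`` (round up, suppress exceptions), ``_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC`` (truncate, suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).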

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvttph_epu32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_cvttph_epu32(__m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	
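Truncation discards the fractional part, so 1.5 becomes 1 and 2.5 becomes 2. A minimal scalar sketch of one lane follows, working directly on the binary16 bit pattern. The name "fp16_to_u32_trunc_model" is hypothetical, and the value returned for NaN, infinity, and inputs of -1 or below is an assumption; consult the instruction reference for the exact out-of-range result.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of Convert_FP16_To_UInt32_Truncate for one
     * lane. h is the raw IEEE 754 binary16 bit pattern. */
    static uint32_t fp16_to_u32_trunc_model(uint16_t h)
    {
        uint32_t exp = (h >> 10) & 0x1Fu;   /* 5-bit biased exponent */
        uint32_t man = h & 0x3FFu;          /* 10-bit fraction */
        int negative = (h >> 15) & 1;

        if (exp == 0x1Fu)
            return UINT32_MAX;              /* Inf/NaN: assumed result */
        if (exp < 15u)
            return 0u;                      /* |x| < 1 truncates to 0 */
        if (negative)
            return UINT32_MAX;              /* x <= -1: assumed result */

        /* Normal number: x = sig * 2^(exp - 25), 11-bit significand. */
        uint32_t sig = man | 0x400u;
        int e = (int)exp - 25;
        /* A right shift drops the fractional bits, i.e. truncates. */
        return (e >= 0) ? (sig << e) : (sig >> -e);
    }

For instance, the bit pattern 0x3E00 encodes 1.5 and converts to 1, while 0x7BFF encodes the largest finite half-precision value and converts to 65504.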

_mm512_cvtt_roundph_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvtt_roundph_epu32(__m256h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvttph_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_mask_cvttph_epu32(__m512i src, __mmask16 k,
                                     __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
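Write-masking differs from zero-masking only in what an unselected lane receives: the corresponding element of "src" rather than zero. The scalar sketch below illustrates this under stated assumptions; "writemask_epu32_model" is a hypothetical name and "converted" stands in for the sixteen per-lane conversion results.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of write-masking over 16 dword lanes.
     * converted[j] stands in for the per-lane conversion result. */
    static void writemask_epu32_model(uint32_t dst[16],
                                      const uint32_t src[16], uint16_t k,
                                      const uint32_t converted[16])
    {
        for (int j = 0; j < 16; j++) {
            /* A set bit takes the new value; a cleared bit keeps src. */
            dst[j] = ((k >> j) & 1u) ? converted[j] : src[j];
        }
    }

This makes the write-masked form convenient for updating selected lanes of an existing vector while leaving the others unchanged.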

_mm512_mask_cvtt_roundph_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m256h a, 
    int sae
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvtt_roundph_epu32(__m512i src,
                                           __mmask16 k,
                                           __m256h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvttph_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_maskz_cvttph_epu32(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtt_roundph_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m256h a, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvtt_roundph_epu32(__mmask16 k,
                                            __m256h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtph_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_cvtph_epi64(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundph_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128h a, 
    int rounding
:Param ETypes:
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_cvt_roundph_epi64(__m128h a, int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
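    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest and suppress exceptions; (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down and suppress exceptions; (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up and suppress exceptions; (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate and suppress exceptions; or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC (see _MM_SET_ROUNDING_MODE).

.. admonition:: Intel Implementation Pseudo-Code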

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	
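How the choice of rounding mode changes the result can be sketched in scalar C for a single lane. The enum "round_model" and the function "fp16_to_i64_model" are hypothetical stand-ins (the real intrinsic takes the _MM_FROUND_* constants), and the value returned for NaN and infinity is an assumption.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-ins for the four rounding directions. */
    enum round_model { ROUND_NEAREST_EVEN, ROUND_DOWN, ROUND_UP, ROUND_TO_ZERO };

    /* Hypothetical scalar model of Convert_FP16_To_Int64 for one lane.
     * h is the raw IEEE 754 binary16 bit pattern. */
    static int64_t fp16_to_i64_model(uint16_t h, enum round_model mode)
    {
        int negative = (h >> 15) & 1;
        int exp = (h >> 10) & 0x1F;
        uint32_t man = h & 0x3FFu;

        if (exp == 0x1F)
            return INT64_MIN;               /* Inf/NaN: assumed result */

        /* |x| = sig * 2^e; subnormals (exp == 0) lack the implicit 1. */
        uint32_t sig = exp ? (man | 0x400u) : man;
        int e = (exp ? exp : 1) - 25;

        int64_t mag;
        if (e >= 0) {
            mag = (int64_t)sig << e;        /* already an integer */
        } else {
            int shift = -e;                 /* 1..24 fractional bits */
            uint32_t q = sig >> shift;      /* integer part of |x| */
            uint32_t rem = sig & ((1u << shift) - 1u);
            uint32_t half = 1u << (shift - 1);
            int inc = 0;                    /* whether |x| rounds away from 0 */
            switch (mode) {
            case ROUND_NEAREST_EVEN:
                inc = rem > half || (rem == half && (q & 1u));
                break;
            case ROUND_DOWN:
                inc = negative && rem != 0;
                break;
            case ROUND_UP:
                inc = !negative && rem != 0;
                break;
            case ROUND_TO_ZERO:
                inc = 0;
                break;
            }
            mag = (int64_t)q + inc;
        }
        return negative ? -mag : mag;
    }

In this model the pattern 0x4100 (2.5) gives 2 under round-to-nearest-even, round-down and truncation, but 3 under round-up; its negation 0xC100 gives -3 only under round-down.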

_mm512_mask_cvtph_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_mask_cvtph_epi64(__m512i src, __mmask8 k,
                                    __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundph_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128h a, 
    int rounding
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_mask_cvt_roundph_epi64(__m512i src,
                                          __mmask8 k, __m128h a,
                                          int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
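    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest and suppress exceptions; (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down and suppress exceptions; (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up and suppress exceptions; (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate and suppress exceptions; or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC (see _MM_SET_ROUNDING_MODE).

.. admonition:: Intel Implementation Pseudo-Code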

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtph_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_maskz_cvtph_epi64(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundph_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128h a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_maskz_cvt_roundph_epi64(__mmask8 k,
                                           __m128h a,
                                           int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
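    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest and suppress exceptions; (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down and suppress exceptions; (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up and suppress exceptions; (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate and suppress exceptions; or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC (see _MM_SET_ROUNDING_MODE).

.. admonition:: Intel Implementation Pseudo-Code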

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvttph_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_cvttph_epi64(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtt_roundph_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvtt_roundph_epi64(__m128h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvttph_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_mask_cvttph_epi64(__m512i src, __mmask8 k,
                                     __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtt_roundph_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128h a, 
    int sae
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvtt_roundph_epi64(__m512i src,
                                           __mmask8 k,
                                           __m128h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvttph_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_maskz_cvttph_epi64(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtt_roundph_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128h a, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvtt_roundph_epi64(__mmask8 k,
                                            __m128h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtph_epu64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_cvtph_epu64(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundph_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128h a, 
    int rounding
:Param ETypes:
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_cvt_roundph_epu64(__m128h a, int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
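    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest and suppress exceptions; (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down and suppress exceptions; (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up and suppress exceptions; (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate and suppress exceptions; or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC (see _MM_SET_ROUNDING_MODE).

.. admonition:: Intel Implementation Pseudo-Code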

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtph_epu64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_mask_cvtph_epu64(__m512i src, __mmask8 k,
                                    __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundph_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128h a, 
    int rounding
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_mask_cvt_roundph_epu64(__m512i src,
                                          __mmask8 k, __m128h a,
                                          int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
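    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest and suppress exceptions; (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down and suppress exceptions; (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up and suppress exceptions; (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate and suppress exceptions; or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC (see _MM_SET_ROUNDING_MODE).

.. admonition:: Intel Implementation Pseudo-Code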

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtph_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_maskz_cvtph_epu64(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundph_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128h a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_maskz_cvt_roundph_epu64(__mmask8 k,
                                           __m128h a,
                                           int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
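    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest and suppress exceptions; (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down and suppress exceptions; (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up and suppress exceptions; (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate and suppress exceptions; or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC (see _MM_SET_ROUNDING_MODE).

.. admonition:: Intel Implementation Pseudo-Code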

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvttph_epu64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_cvttph_epu64(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtt_roundph_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvtt_roundph_epu64(__m128h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvttph_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_mask_cvttph_epu64(__m512i src, __mmask8 k,
                                     __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtt_roundph_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128h a, 
    int sae
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvtt_roundph_epu64(__m512i src,
                                           __mmask8 k,
                                           __m128h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvttph_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_maskz_cvttph_epu64(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtt_roundph_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128h a, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvtt_roundph_epu64(__mmask8 k,
                                            __m128h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtph_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_cvtph_epi16(__m512h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	
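Half-precision values reach 65504, which is beyond the signed 16-bit range, so some finite inputs cannot be represented in the destination. The scalar sketch below models one lane with round-to-nearest-even, which is an assumption about the current rounding mode; the name "fp16_to_i16_model" and the 0x8000 result for unrepresentable inputs are likewise assumptions to be checked against the instruction reference.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of Convert_FP16_To_Int16 for one lane,
     * assuming round-to-nearest-even. h is the raw binary16 bit pattern. */
    static int16_t fp16_to_i16_model(uint16_t h)
    {
        int negative = (h >> 15) & 1;
        int exp = (h >> 10) & 0x1F;
        uint32_t man = h & 0x3FFu;

        if (exp == 0x1F)
            return INT16_MIN;               /* Inf/NaN: assumed result */

        /* |x| = sig * 2^e; subnormals (exp == 0) lack the implicit 1. */
        uint32_t sig = exp ? (man | 0x400u) : man;
        int e = (exp ? exp : 1) - 25;

        int32_t mag;
        if (e >= 0) {
            mag = (int32_t)(sig << e);      /* at most 65504 */
        } else {
            int shift = -e;
            uint32_t q = sig >> shift;
            uint32_t rem = sig & ((1u << shift) - 1u);
            uint32_t half = 1u << (shift - 1);
            /* Round to nearest; ties go to the even integer. */
            mag = (int32_t)q + (rem > half || (rem == half && (q & 1u)));
        }

        int32_t v = negative ? -mag : mag;
        if (v > INT16_MAX || v < INT16_MIN)
            return INT16_MIN;               /* out of range: assumed result */
        return (int16_t)v;
    }

Under these assumptions 0x5640 (100.0) converts to 100 and 0x77FF (32752.0) converts to 32752, whereas 0x7800 (32768.0) no longer fits and yields the assumed out-of-range value.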

_mm512_cvt_roundph_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512h a, 
    int rounding
:Param ETypes:
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_cvt_roundph_epi16(__m512h a, int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst".
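    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest and suppress exceptions; (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down and suppress exceptions; (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up and suppress exceptions; (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate and suppress exceptions; or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC (see _MM_SET_ROUNDING_MODE).

.. admonition:: Intel Implementation Pseudo-Code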

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtph_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_mask_cvtph_epi16(__m512i src, __mmask32 k,
                                    __m512h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundph_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512h a, 
    int rounding
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_mask_cvt_roundph_epi16(__m512i src,
                                          __mmask32 k,
                                          __m512h a,
                                          int rounding);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
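    Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest and suppress exceptions; (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down and suppress exceptions; (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up and suppress exceptions; (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate and suppress exceptions; or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC (see _MM_SET_ROUNDING_MODE).

.. admonition:: Intel Implementation Pseudo-Code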

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtph_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_maskz_cvtph_epi16(__mmask32 k, __m512h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundph_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512h a, 
    int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM rounding

.. code-block:: C

    __m512i _mm512_maskz_cvt_roundph_epi16(__mmask32 k,
                                           __m512h a,
                                           int rounding);

.. admonition:: Intel Description

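    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

    Rounding is done according to the "rounding" parameter, which can be one of:

    - (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC): round to nearest, and suppress exceptions
    - (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC): round down, and suppress exceptions
    - (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC): round up, and suppress exceptions
    - (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC): truncate, and suppress exceptions
    - _MM_FROUND_CUR_DIRECTION: use MXCSR.RC; see _MM_SET_ROUNDING_MODE

.. admonition:: Intel Implementation Pseudo-Code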

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvttph_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_cvttph_epi16(__m512h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
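
Conversion "with truncation" discards the fractional part, rounding toward zero regardless of the current rounding mode. A C cast from a floating-point type to an integer type behaves the same way for in-range values, which the scalar sketch below relies on. It is a model only: ``float`` stands in for FP16, ``model_cvtt`` is a hypothetical name, and out-of-range or NaN inputs are not modeled.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Truncating conversion of 32 lanes: the fraction is dropped. */
    static void model_cvtt(int16_t dst[32], const float a[32])
    {
        for (int j = 0; j < 32; j++)
            dst[j] = (int16_t)a[j];     /* cast rounds toward zero */
    }

    /* Example: positive and negative inputs both move toward zero. */
    static void model_cvtt_example(void)
    {
        float a[32] = { 2.75f, -2.75f, 0.5f, -0.5f };   /* remaining lanes are 0 */
        int16_t dst[32];
        model_cvtt(dst, a);
        assert(dst[0] == 2);
        assert(dst[1] == -2);
        assert(dst[2] == 0);
        assert(dst[3] == 0);
    }

A round-to-nearest conversion of the same inputs would instead give 3 and -3 for the first two lanes.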
        	

_mm512_cvtt_roundph_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvtt_roundph_epi16(__m512h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvttph_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_mask_cvttph_epi16(__m512i src, __mmask32 k,
                                     __m512h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtt_roundph_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512h a, 
    int sae
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvtt_roundph_epi16(__m512i src,
                                           __mmask32 k,
                                           __m512h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvttph_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_maskz_cvttph_epi16(__mmask32 k, __m512h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtt_roundph_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512h a, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvtt_roundph_epi16(__mmask32 k,
                                            __m512h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtph_epu16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_cvtph_epu16(__m512h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvt_roundph_epu16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvt_roundph_epu16(__m512h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtph_epu16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_mask_cvtph_epu16(__m512i src, __mmask32 k,
                                    __m512h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundph_epu16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512h a, 
    int sae
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvt_roundph_epu16(__m512i src,
                                          __mmask32 k,
                                          __m512h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtph_epu16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_maskz_cvtph_epu16(__mmask32 k, __m512h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundph_epu16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512h a, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvt_roundph_epu16(__mmask32 k,
                                           __m512h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvttph_epu16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512i _mm512_cvttph_epu16(__m512h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtt_roundph_epu16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_cvtt_roundph_epu16(__m512h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvttph_epu16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_mask_cvttph_epu16(__m512i src, __mmask32 k,
                                     __m512h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtt_roundph_epu16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512h a, 
    int sae
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_mask_cvtt_roundph_epu16(__m512i src,
                                           __mmask32 k,
                                           __m512h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvttph_epu16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512i _mm512_maskz_cvttph_epu16(__mmask32 k, __m512h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtt_roundph_epu16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512h a, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512i _mm512_maskz_cvtt_roundph_epu16(__mmask32 k,
                                            __m512h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 31
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtph_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512d _mm512_cvtph_pd(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
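
Every half-precision value is exactly representable in double precision, so the widening above loses no information. The sketch below decodes IEEE 754 binary16 bit patterns in software to show what ``Convert_FP16_To_FP64`` computes per lane. It is an illustration under stated assumptions: the inputs are raw 16-bit patterns rather than ``__m128h`` lanes, and the names ``model_fp16_bits_to_fp64`` and ``model_cvtph_pd`` are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <math.h>      /* INFINITY, NAN and isnan macros only */
    #include <stdint.h>

    /* Decode one IEEE 754 binary16 bit pattern into a double. */
    static double model_fp16_bits_to_fp64(uint16_t h)
    {
        int sign = (h >> 15) & 1;
        int exp  = (h >> 10) & 0x1F;    /* 5-bit biased exponent */
        int frac = h & 0x3FF;           /* 10-bit fraction */
        double v;

        if (exp == 0x1F) {
            v = frac ? NAN : INFINITY;
        } else {
            /* Normal: (1024 + frac) * 2^(exp - 25); subnormal: frac * 2^-24. */
            int mant = exp ? (1024 + frac) : frac;
            int e    = exp ? (exp - 25) : -24;
            v = (double)mant;
            for (; e > 0; e--) v *= 2.0;
            for (; e < 0; e++) v *= 0.5;
        }
        return sign ? -v : v;
    }

    /* Widen the 8 half-precision lanes of a 128-bit input to 8 doubles. */
    static void model_cvtph_pd(double dst[8], const uint16_t a_bits[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = model_fp16_bits_to_fp64(a_bits[j]);
    }

    /* Example: 1.0, -2.0, the largest finite value, and the smallest subnormal. */
    static void model_cvtph_pd_example(void)
    {
        const uint16_t a_bits[8] = { 0x3C00, 0xC000, 0x7BFF, 0x0001 };
        double dst[8];
        model_cvtph_pd(dst, a_bits);
        assert(dst[0] == 1.0);
        assert(dst[1] == -2.0);
        assert(dst[2] == 65504.0);
        assert(dst[3] == 5.9604644775390625e-08);   /* 2^-24 */
        assert(dst[4] == 0.0);
    }

The scaling loops multiply only by powers of two, so each step is exact in double precision.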
        	

_mm512_cvt_roundph_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m128h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512d _mm512_cvt_roundph_pd(__m128h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtph_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512d _mm512_mask_cvtph_pd(__m512d src, __mmask8 k,
                                 __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
        	ELSE
        		dst.fp64[j] := src.fp64[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvt_roundph_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m128h a, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512d _mm512_mask_cvt_roundph_pd(__m512d src, __mmask8 k,
                                       __m128h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
        	ELSE
        		dst.fp64[j] := src.fp64[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtph_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512d _mm512_maskz_cvtph_pd(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
        	ELSE
        		dst.fp64[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvt_roundph_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m128h a, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512d _mm512_maskz_cvt_roundph_pd(__mmask8 k, __m128h a,
                                        int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
        	ELSE
        		dst.fp64[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtxph_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512 _mm512_cvtxph_ps(__m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtx_roundph_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m256h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512 _mm512_cvtx_roundph_ps(__m256h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtxph_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512 _mm512_mask_cvtxph_ps(__m512 src, __mmask16 k,
                                 __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
        	ELSE
        		dst.fp32[j] := src.fp32[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_cvtx_roundph_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m256h a, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512 _mm512_mask_cvtx_roundph_ps(__m512 src, __mmask16 k,
                                       __m256h a, int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
        	ELSE
        		dst.fp32[j] := src.fp32[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtxph_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512 _mm512_maskz_cvtxph_ps(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
        	ELSE
        		dst.fp32[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_cvtx_roundph_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m256h a, 
    int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512 _mm512_maskz_cvtx_roundph_ps(__mmask16 k, __m256h a,
                                        int sae);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
        	ELSE
        		dst.fp32[j] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_cvtsh_h
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-ZMM
:Register: ZMM 512 bit
:Return Type: _Float16
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm512_cvtsh_h(__m512h a);

.. admonition:: Intel Description

    Copy the lower half-precision (16-bit) floating-point element of "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := a.fp16[0]
        	

YMM
~~~
_mm256_cvtsepi16_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m128i _mm256_cvtsepi16_epi8(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 8*j
        	dst[l+7:l] := Saturate8(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
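
Signed saturation clamps each 16-bit input to the range of a signed 8-bit integer instead of discarding the high bits. A scalar sketch of the ``Saturate8`` step in the pseudo-code above follows. The names ``model_saturate8`` and ``model_cvtsepi16_epi8`` are hypothetical, and plain arrays stand in for the ``__m256i`` input and ``__m128i`` result.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Clamp a signed 16-bit value into the signed 8-bit range [-128, 127]. */
    static int8_t model_saturate8(int16_t x)
    {
        if (x > INT8_MAX) return INT8_MAX;
        if (x < INT8_MIN) return INT8_MIN;
        return (int8_t)x;
    }

    /* Narrow 16 signed 16-bit lanes to 16 signed 8-bit lanes. */
    static void model_cvtsepi16_epi8(int8_t dst[16], const int16_t a[16])
    {
        for (int j = 0; j < 16; j++)
            dst[j] = model_saturate8(a[j]);
    }

    /* Example: out-of-range inputs clamp, in-range inputs pass through. */
    static void model_cvtsepi16_epi8_example(void)
    {
        const int16_t a[16] = { 300, -300, 5, -5 };   /* remaining lanes are 0 */
        int8_t dst[16];
        model_cvtsepi16_epi8(dst, a);
        assert(dst[0] == 127);
        assert(dst[1] == -128);
        assert(dst[2] == 5);
        assert(dst[3] == -5);
    }

A plain truncating narrow of 300 (``0x012C``) would instead keep only the low byte ``0x2C``, which is 44.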
        	

_mm256_mask_cvtsepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    SI8 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m128i _mm256_mask_cvtsepi16_epi8(__m128i src, __mmask16 k,
                                       __m256i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtsepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m128i _mm256_maskz_cvtsepi16_epi8(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtepi8_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    SI16 src, 
    MASK k, 
    SI8 a

.. code-block:: C

    __m256i _mm256_mask_cvtepi8_epi16(__m256i src, __mmask16 k,
                                      __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	l := j*16
        	IF k[j]
        		dst[l+15:l] := SignExtend16(a[i+7:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepi8_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    __m256i _mm256_maskz_cvtepi8_epi16(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	l := j*16
        	IF k[j]
        		dst[l+15:l] := SignExtend16(a[i+7:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
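
A scalar sketch of the zero-masked sign extension may help when checking expected lane values. ``maskz_cvtepi8_epi16_model`` is a hypothetical name that does not come from any header, and arrays are assumed in place of the vector operands.

```c
#include <stdint.h>

/* Scalar model: widen each selected 8-bit lane to 16 bits with its sign
   preserved; lanes whose mask bit is clear become zero. */
static void maskz_cvtepi8_epi16_model(int16_t dst[16], uint16_t k,
                                      const int8_t a[16])
{
    for (int j = 0; j < 16; ++j) {
        /* Converting int8_t to int16_t sign-extends, as SignExtend16 does. */
        dst[j] = ((k >> j) & 1u) ? (int16_t)a[j] : 0;
    }
}
```

An input byte of ``0x80`` (-128) therefore widens to the bit pattern ``0xFF80`` when its mask bit is set.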
        	

_mm256_cvtusepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128i _mm256_cvtusepi16_epi8(__m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 8*j
        	dst[l+7:l] := SaturateU8(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
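
Unsigned saturation treats every input as a non-negative 16-bit value, so anything above 255 clamps to 255. The following scalar sketch models that rule; ``saturate_u16_to_u8`` and ``cvtusepi16_epi8_model`` are hypothetical names, and arrays stand in for the vector operands.

```c
#include <stdint.h>

/* Hypothetical helper mirroring SaturateU8: clamp to the uint8_t range. */
static uint8_t saturate_u16_to_u8(uint16_t v)
{
    return (v > UINT8_MAX) ? (uint8_t)UINT8_MAX : (uint8_t)v;
}

/* Scalar model of the unmasked conversion over all 16 lanes. */
static void cvtusepi16_epi8_model(uint8_t dst[16], const uint16_t a[16])
{
    for (int j = 0; j < 16; ++j) {
        dst[j] = saturate_u16_to_u8(a[j]);
    }
}
```

Note that ``0xFFFF`` is read as 65535 here and saturates to 255, whereas the signed-saturating variants would read the same bits as -1.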
        	

_mm256_mask_cvtusepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm256_mask_cvtusepi16_epi8(__m128i src,
                                        __mmask16 k, __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtusepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm256_maskz_cvtusepi16_epi8(__mmask16 k,
                                         __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtepi16_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128i _mm256_cvtepi16_epi8(__m256i a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 8*j
        	dst[l+7:l] := Truncate8(a[i+15:i])
        ENDFOR
        dst[MAX:128] := 0
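
Truncation simply keeps the low 8 bits of each 16-bit lane and discards the rest, with no clamping. A minimal scalar sketch, assuming arrays in place of the vector operands (``cvtepi16_epi8_model`` is a hypothetical name):

```c
#include <stdint.h>

/* Scalar model of Truncate8: keep only the low byte of each lane. */
static void cvtepi16_epi8_model(uint8_t dst[16], const uint16_t a[16])
{
    for (int j = 0; j < 16; ++j) {
        dst[j] = (uint8_t)(a[j] & 0xFFu);
    }
}
```

For example, ``0x1234`` becomes ``0x34`` and ``0x0100`` becomes ``0x00``, results that a saturating conversion would not produce.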
        	

_mm256_mask_cvtepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm256_mask_cvtepi16_epi8(__m128i src, __mmask16 k,
                                      __m256i a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm256_maskz_cvtepi16_epi8(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtepu8_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_mask_cvtepu8_epi16(__m256i src, __mmask16 k,
                                      __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	l := j*16
        	IF k[j]
        		dst[l+15:l] := ZeroExtend16(a[i+7:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:256] := 0
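
Zero extension fills the upper byte of each widened lane with zeros regardless of the input's top bit. The scalar sketch below models the write-masked form; ``mask_cvtepu8_epi16_model`` is a hypothetical name and arrays are assumed in place of the vector operands.

```c
#include <stdint.h>

/* Scalar model: selected lanes take the zero-extended byte from a,
   unselected lanes are copied unchanged from src. */
static void mask_cvtepu8_epi16_model(uint16_t dst[16], const uint16_t src[16],
                                     uint16_t k, const uint8_t a[16])
{
    for (int j = 0; j < 16; ++j) {
        /* Converting uint8_t to uint16_t zero-extends, as ZeroExtend16 does. */
        dst[j] = ((k >> j) & 1u) ? (uint16_t)a[j] : src[j];
    }
}
```

An input byte of ``0x80`` widens to ``0x0080`` here, in contrast to the ``0xFF80`` that a sign-extending conversion yields.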
        	

_mm256_maskz_cvtepu8_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_maskz_cvtepu8_epi16(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	l := j*16
        	IF k[j]
        		dst[l+15:l] := ZeroExtend16(a[i+7:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtpd_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256i _mm256_cvtpd_epi64(__m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
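
The pseudo-code does not spell out how ``Convert_FP64_To_Int64`` rounds. Assuming the default MXCSR rounding mode (round to nearest, ties to even), a scalar model might look like the sketch below. ``round_ties_even_i64`` and ``cvtpd_epi64_model`` are hypothetical names, and the model only covers finite inputs whose rounded value fits in ``int64_t``; out-of-range and NaN inputs are not modelled.

```c
#include <stdint.h>

/* Hypothetical helper: round to nearest, ties to even.
   Assumes x is finite and its rounded value fits in int64_t. */
static int64_t round_ties_even_i64(double x)
{
    int64_t t = (int64_t)x;        /* a C cast truncates toward zero */
    double frac = x - (double)t;   /* signed fractional remainder */

    if (frac > 0.5 || (frac == 0.5 && (t & 1))) return t + 1;
    if (frac < -0.5 || (frac == -0.5 && (t & 1))) return t - 1;
    return t;
}

/* Scalar model of the four-lane conversion under the assumed rounding mode. */
static void cvtpd_epi64_model(int64_t dst[4], const double a[4])
{
    for (int j = 0; j < 4; ++j) {
        dst[j] = round_ties_even_i64(a[j]);
    }
}
```

Under this assumption 2.5 rounds to 2 while 3.5 rounds to 4, which differs from the truncating ``cvtt`` variants.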
        	

_mm256_mask_cvtpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm256_mask_cvtpd_epi64(__m256i src, __mmask8 k,
                                    __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
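
The loop above visits only four lanes, so although ``k`` is an ``__mmask8``, the pseudo-code consults just its low four bits. The scalar sketch below isolates that merge step. ``mask_merge4_i64_model`` is a hypothetical name, and ``converted`` is assumed to already hold the per-lane conversion results.

```c
#include <stdint.h>

/* Scalar model of the write-mask merge for four 64-bit lanes.
   Bits 4..7 of k never influence the result. */
static void mask_merge4_i64_model(int64_t dst[4], const int64_t src[4],
                                  uint8_t k, const int64_t converted[4])
{
    for (int j = 0; j < 4; ++j) {
        dst[j] = ((k >> j) & 1u) ? converted[j] : src[j];
    }
}
```

In this model ``k = 0xF5`` and ``k = 0x05`` give the same result: lanes 0 and 2 take the converted values, lanes 1 and 3 keep ``src``.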
        	

_mm256_maskz_cvtpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm256_maskz_cvtpd_epi64(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtpd_epu64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256i _mm256_cvtpd_epu64(__m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm256_mask_cvtpd_epu64(__m256i src, __mmask8 k,
                                    __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm256_maskz_cvtpd_epu64(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtps_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256i _mm256_cvtps_epi64(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtps_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_mask_cvtps_epi64(__m256i src, __mmask8 k,
                                    __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtps_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_maskz_cvtps_epi64(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtps_epu64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256i _mm256_cvtps_epu64(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtps_epu64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_mask_cvtps_epu64(__m256i src, __mmask8 k,
                                    __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtps_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_maskz_cvtps_epu64(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepi64_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m256d _mm256_cvtepi64_pd(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtepi64_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m256d _mm256_mask_cvtepi64_pd(__m256d src, __mmask8 k,
                                    __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepi64_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m256d _mm256_maskz_cvtepi64_pd(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepi64_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __m256i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128 _mm256_cvtepi64_ps(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
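
Four 64-bit integers narrow to four 32-bit floats here, so the result occupies only 128 bits, and integers that need more than 24 significant bits cannot be represented exactly. The scalar sketch below models this with a plain C cast, assuming the default round-to-nearest mode; ``cvtepi64_ps_model`` is a hypothetical name.

```c
#include <stdint.h>

/* Scalar model: convert each signed 64-bit lane to single precision.
   The cast rounds when the value does not fit in a float's 24-bit
   significand (default rounding mode assumed). */
static void cvtepi64_ps_model(float dst[4], const int64_t a[4])
{
    for (int j = 0; j < 4; ++j) {
        dst[j] = (float)a[j];
    }
}
```

Small values such as 1 and -5 convert exactly, while 16777217 (2^24 + 1) rounds to 16777216.0f under the assumed rounding mode.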
        	

_mm256_mask_cvtepi64_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128 _mm256_mask_cvtepi64_ps(__m128 src, __mmask8 k,
                                   __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtepi64_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128 _mm256_maskz_cvtepi64_ps(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvttpd_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256i _mm256_cvttpd_epi64(__m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
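
Truncation discards the fractional part, which is also what a C cast from ``double`` to an integer type does. That makes a scalar model straightforward, as sketched below. ``cvttpd_epi64_model`` is a hypothetical name, and inputs are assumed to be finite and within the ``int64_t`` range, since the cast is undefined in C otherwise.

```c
#include <stdint.h>

/* Scalar model of the truncating conversion: round toward zero. */
static void cvttpd_epi64_model(int64_t dst[4], const double a[4])
{
    for (int j = 0; j < 4; ++j) {
        dst[j] = (int64_t)a[j];   /* fractional part is dropped */
    }
}
```

Both 2.7 and -2.7 move toward zero, giving 2 and -2 respectively.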
        	

_mm256_mask_cvttpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm256_mask_cvttpd_epi64(__m256i src, __mmask8 k,
                                     __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvttpd_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm256_maskz_cvttpd_epi64(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvttpd_epu64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256i _mm256_cvttpd_epu64(__m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
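
The unsigned destination type extends the representable range up to 2^64 - 1, so inputs at or above 2^63 that would not fit a signed 64-bit lane still convert. A scalar sketch under the assumption that every input is finite, non-negative and below 2^64 (``cvttpd_epu64_model`` is a hypothetical name):

```c
#include <stdint.h>

/* Scalar model of the truncating unsigned conversion. Negative or
   out-of-range inputs are not modelled; the cast is undefined for them. */
static void cvttpd_epu64_model(uint64_t dst[4], const double a[4])
{
    for (int j = 0; j < 4; ++j) {
        dst[j] = (uint64_t)a[j];  /* fractional part is dropped */
    }
}
```

For instance, the input 9223372036854775808.0 (2^63) converts to the same integer value, which exceeds ``INT64_MAX``.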
        	

_mm256_mask_cvttpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm256_mask_cvttpd_epu64(__m256i src, __mmask8 k,
                                     __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvttpd_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256i _mm256_maskz_cvttpd_epu64(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvttps_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256i _mm256_cvttps_epi64(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvttps_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_mask_cvttps_epi64(__m256i src, __mmask8 k,
                                     __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
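
The write-mask merge above reads only bits 0 to 3 of "k", one per 64-bit lane. A minimal scalar sketch, assuming in-range inputs and using the hypothetical names ``model_mask_cvttps_epi64`` and ``demo_mask_cvttps_epi64``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Write-mask model: lane j is converted when bit j of k is set,
       otherwise it keeps the value from src. Inputs are assumed in range. */
    static void model_mask_cvttps_epi64(const int64_t src[4], uint8_t k,
                                        const float a[4], int64_t dst[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1u)
                dst[j] = (int64_t)a[j]; /* truncation toward zero */
            else
                dst[j] = src[j];        /* pass-through from src */
        }
    }

    static void demo_mask_cvttps_epi64(void)
    {
        const int64_t src[4] = { -1, -2, -3, -4 };
        const float a[4] = { 10.7f, 20.7f, 30.7f, 40.7f };
        int64_t dst[4];

        model_mask_cvttps_epi64(src, 0x05, a, dst); /* bits 0 and 2 set */
        assert(dst[0] == 10 && dst[1] == -2 && dst[2] == 30 && dst[3] == -4);

        model_mask_cvttps_epi64(src, 0xF0, a, dst); /* only unused upper bits set */
        assert(dst[0] == -1 && dst[1] == -2 && dst[2] == -3 && dst[3] == -4);
    }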
        	

_mm256_maskz_cvttps_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_maskz_cvttps_epi64(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
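
Comparing the two loops, the zero-mask form behaves like the write-mask form with an all-zero "src". The scalar sketch below illustrates that reading; ``model_mask``, ``model_maskz`` and ``demo_maskz`` are hypothetical names and inputs are assumed to be in range.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Write-mask form: unselected lanes keep src. Inputs are assumed in range. */
    static void model_mask(const int64_t src[4], uint8_t k,
                           const float a[4], int64_t dst[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1u) ? (int64_t)a[j] : src[j];
    }

    /* Zero-mask form: the same merge with an all-zero src. */
    static void model_maskz(uint8_t k, const float a[4], int64_t dst[4])
    {
        const int64_t zero[4] = { 0, 0, 0, 0 };
        model_mask(zero, k, a, dst);
    }

    static void demo_maskz(void)
    {
        const float a[4] = { 10.7f, 20.7f, 30.7f, 40.7f };
        int64_t dst[4];
        model_maskz(0x0A, a, dst); /* bits 1 and 3 set */
        assert(dst[0] == 0 && dst[1] == 20 && dst[2] == 0 && dst[3] == 40);
    }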
        	

_mm256_cvttps_epu64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256i _mm256_cvttps_epu64(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
        ENDFOR
        dst[MAX:256] := 0
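
The unsigned variant can be sketched the same way. In this scalar model, ``trunc_f32_to_u64``, ``model_cvttps_epu64`` and ``demo_cvttps_epu64`` are hypothetical names; returning ``UINT64_MAX`` for NaN, for values of -1.0 or below, and for values of 2^64 or above is an assumption, since the pseudo-code does not describe unrepresentable inputs.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Truncate one float to uint64_t. Unrepresentable inputs return
       UINT64_MAX (an assumed fallback, see the text above). */
    static uint64_t trunc_f32_to_u64(float x)
    {
        /* 18446744073709551616.0f is exactly 2^64. */
        if (!(x > -1.0f && x < 18446744073709551616.0f))
            return UINT64_MAX;
        if (x < 0.0f)
            return 0;        /* values in (-1, 0) truncate to zero */
        return (uint64_t)x;
    }

    static void model_cvttps_epu64(const float a[4], uint64_t dst[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = trunc_f32_to_u64(a[j]);
    }

    static void demo_cvttps_epu64(void)
    {
        /* 9223372036854775808.0f is 2^63: too large for int64_t, fine for uint64_t. */
        const float a[4] = { 3.99f, -0.5f, 9223372036854775808.0f, -2.0f };
        uint64_t dst[4];
        model_cvttps_epu64(a, dst);
        assert(dst[0] == 3);
        assert(dst[1] == 0);
        assert(dst[2] == ((uint64_t)1 << 63));
        assert(dst[3] == UINT64_MAX);
    }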
        	

_mm256_mask_cvttps_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_mask_cvttps_epu64(__m256i src, __mmask8 k,
                                     __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvttps_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_maskz_cvttps_epu64(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepu64_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m256d _mm256_cvtepu64_pd(__m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
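
The pseudo-code helper is named ``Convert_Int64_To_FP64``, but the description and the ``UI64`` element type both say the input is unsigned. The scalar sketch below assumes the unsigned reading and the default round-to-nearest-even mode; ``model_cvtepu64_pd`` and ``demo_cvtepu64_pd`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model: each unsigned 64-bit lane becomes a double.
       Values above 2^53 may be rounded because a double has a 53-bit significand. */
    static void model_cvtepu64_pd(const uint64_t a[4], double dst[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = (double)a[j];
    }

    static void demo_cvtepu64_pd(void)
    {
        const uint64_t a[4] = { 0, 1, ((uint64_t)1 << 53) + 1, UINT64_MAX };
        double dst[4];
        model_cvtepu64_pd(a, dst);
        assert(dst[0] == 0.0);
        assert(dst[1] == 1.0);
        assert(dst[2] == 9007199254740992.0);    /* 2^53 + 1 ties to 2^53 */
        assert(dst[3] == 18446744073709551616.0); /* 2^64, not -1.0 */
    }

The last lane shows why the signedness matters: a signed interpretation of the all-ones bit pattern would give -1.0.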
        	

_mm256_mask_cvtepu64_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    FP64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256d _mm256_mask_cvtepu64_pd(__m256d src, __mmask8 k,
                                    __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepu64_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256d _mm256_maskz_cvtepu64_pd(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepu64_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128 _mm256_cvtepu64_ps(__m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
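
Here four 64-bit lanes narrow to four 32-bit lanes, so a 256-bit input fills a 128-bit result. A scalar sketch of the loop, assuming unsigned inputs and round-to-nearest-even, with the hypothetical names ``model_cvtepu64_ps`` and ``demo_cvtepu64_ps``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model: each unsigned 64-bit lane becomes a float.
       A float has a 24-bit significand, so integers above 2^24 may be rounded. */
    static void model_cvtepu64_ps(const uint64_t a[4], float dst[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = (float)a[j];
    }

    static void demo_cvtepu64_ps(void)
    {
        const uint64_t base = (uint64_t)1 << 24; /* 16777216 */
        const uint64_t a[4] = { 7, base, base + 1, base + 3 };
        float dst[4];
        model_cvtepu64_ps(a, dst);
        assert(dst[0] == 7.0f);
        assert(dst[1] == 16777216.0f);
        assert(dst[2] == 16777216.0f); /* tie, rounds to the even neighbour */
        assert(dst[3] == 16777220.0f); /* tie, rounds to the even neighbour */
    }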
        	

_mm256_mask_cvtepu64_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128 _mm256_mask_cvtepu64_ps(__m128 src, __mmask8 k,
                                   __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtepu64_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128 _mm256_maskz_cvtepu64_ps(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtepi32_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m256d _mm256_mask_cvtepi32_pd(__m256d src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	IF k[j]
        		dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
        	ELSE
        		dst[m+63:m] := src[m+63:m]
        	FI
        ENDFOR
        dst[MAX:256] := 0
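
Widening a signed 32-bit integer to a double is always exact, so this loop is the write-mask merge and nothing else. A scalar sketch with the hypothetical names ``model_mask_cvtepi32_pd`` and ``demo_mask_cvtepi32_pd``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Write-mask model: selected lanes get the exact double value of a[j],
       the others keep src[j]. */
    static void model_mask_cvtepi32_pd(const double src[4], uint8_t k,
                                       const int32_t a[4], double dst[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1u) ? (double)a[j] : src[j];
    }

    static void demo_mask_cvtepi32_pd(void)
    {
        const double src[4] = { 0.5, 0.5, 0.5, 0.5 };
        const int32_t a[4] = { INT32_MIN, -1, 0, INT32_MAX };
        double dst[4];
        model_mask_cvtepi32_pd(src, 0x09, a, dst); /* bits 0 and 3 set */
        assert(dst[0] == -2147483648.0);
        assert(dst[1] == 0.5);
        assert(dst[2] == 0.5);
        assert(dst[3] == 2147483647.0);
    }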
        	

_mm256_maskz_cvtepi32_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m256d _mm256_maskz_cvtepi32_pd(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	IF k[j]
        		dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
        	ELSE
        		dst[m+63:m] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtepi32_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m256 _mm256_mask_cvtepi32_ps(__m256 src, __mmask8 k,
                                   __m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepi32_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m256 _mm256_maskz_cvtepi32_ps(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm256_mask_cvtpd_epi32(__m128i src, __mmask8 k,
                                    __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
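
Unlike the ``cvtt`` forms, this conversion rounds instead of truncating. The scalar sketch below assumes the default rounding mode (round to nearest, ties to even) and inputs well inside the ``int32_t`` range; ``round_even_f64_to_i32``, ``model_mask_cvtpd_epi32`` and ``demo_mask_cvtpd_epi32`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Round to nearest, ties to even. Inputs are assumed to lie well
       inside the int32_t range. */
    static int32_t round_even_f64_to_i32(double x)
    {
        int32_t t = (int32_t)x;      /* truncate toward zero */
        double frac = x - (double)t; /* exact, strictly between -1 and 1 */
        if (frac > 0.5 || (frac == 0.5 && (t & 1)))
            return t + 1;
        if (frac < -0.5 || (frac == -0.5 && (t & 1)))
            return t - 1;
        return t;
    }

    /* Write-mask model over four double lanes. */
    static void model_mask_cvtpd_epi32(const int32_t src[4], uint8_t k,
                                       const double a[4], int32_t dst[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1u) ? round_even_f64_to_i32(a[j]) : src[j];
    }

    static void demo_mask_cvtpd_epi32(void)
    {
        const int32_t src[4] = { -1, -1, -1, -1 };
        const double a[4] = { 2.5, 3.5, -2.5, 2.6 };
        int32_t dst[4];
        model_mask_cvtpd_epi32(src, 0x0B, a, dst); /* bits 0, 1 and 3 set */
        assert(dst[0] == 2);  /* tie goes to the even value */
        assert(dst[1] == 4);
        assert(dst[2] == -1); /* masked off, copied from src */
        assert(dst[3] == 3);
    }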
        	

_mm256_maskz_cvtpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm256_maskz_cvtpd_epi32(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtpd_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128 _mm256_mask_cvtpd_ps(__m128 src, __mmask8 k,
                                __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
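
Narrowing from double to single precision drops significand bits, so some lanes change value. A scalar sketch of the loop, assuming round-to-nearest-even and inputs within the finite ``float`` range; ``model_mask_cvtpd_ps`` and ``demo_mask_cvtpd_ps`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Write-mask model: selected lanes are narrowed to float,
       the others keep src[j]. Inputs are assumed to fit in a float. */
    static void model_mask_cvtpd_ps(const float src[4], uint8_t k,
                                    const double a[4], float dst[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1u) ? (float)a[j] : src[j];
    }

    static void demo_mask_cvtpd_ps(void)
    {
        const float src[4] = { 9.0f, 9.0f, 9.0f, 9.0f };
        /* 1 + 2^-30 and 2^24 + 1 are exact doubles but not exact floats. */
        const double a[4] = { 0.5, 1.0 + 1.0 / 1073741824.0, 16777217.0, 123.0 };
        float dst[4];
        model_mask_cvtpd_ps(src, 0x07, a, dst); /* bits 0, 1 and 2 set */
        assert(dst[0] == 0.5f);
        assert(dst[1] == 1.0f);        /* the 2^-30 term is rounded away */
        assert(dst[2] == 16777216.0f); /* tie, rounds to the even neighbour */
        assert(dst[3] == 9.0f);        /* masked off, copied from src */
    }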
        	

_mm256_maskz_cvtpd_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128 _mm256_maskz_cvtpd_ps(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtpd_epu32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm256_cvtpd_epu32(__m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_UInt32(a[k+63:k])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm256_mask_cvtpd_epu32(__m128i src, __mmask8 k,
                                    __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm256_maskz_cvtpd_epu32(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtph_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256 _mm256_mask_cvtph_ps(__m256 src, __mmask8 k,
                                __m128i a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
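
``Convert_FP16_To_FP32`` is not expanded in the pseudo-code. The sketch below assumes IEEE 754 binary16 inputs (1 sign bit, 5 exponent bits, 10 fraction bits) held as raw ``uint16_t`` values; ``half_to_float``, ``model_mask_cvtph_ps`` and ``demo_mask_cvtph_ps`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Decode one IEEE 754 binary16 bit pattern into a float. */
    static float half_to_float(uint16_t h)
    {
        uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t man  = h & 0x03FFu;
        uint32_t bits;
        float f;

        if (exp == 0x1Fu) {            /* infinity or NaN */
            bits = sign | 0x7F800000u | (man << 13);
        } else if (exp != 0) {         /* normal: rebias from 15 to 127 */
            bits = sign | ((exp + 112u) << 23) | (man << 13);
        } else if (man == 0) {         /* signed zero */
            bits = sign;
        } else {                       /* subnormal: normalize first */
            uint32_t e = 113u;
            while ((man & 0x0400u) == 0) {
                man <<= 1;
                e--;
            }
            bits = sign | (e << 23) | ((man & 0x03FFu) << 13);
        }
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    /* Write-mask model over eight half-precision lanes. */
    static void model_mask_cvtph_ps(const float src[8], uint8_t k,
                                    const uint16_t a[8], float dst[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1u) ? half_to_float(a[j]) : src[j];
    }

    static void demo_mask_cvtph_ps(void)
    {
        const float src[8] = { -1, -1, -1, -1, -1, -1, -1, -1 };
        const uint16_t a[8] = { 0x3C00, 0xC000, 0x7BFF, 0x0001, 0, 0, 0, 0 };
        float dst[8];
        model_mask_cvtph_ps(src, 0x0F, a, dst); /* low four lanes selected */
        assert(dst[0] == 1.0f);
        assert(dst[1] == -2.0f);
        assert(dst[2] == 65504.0f);             /* largest finite half */
        assert(dst[3] == 1.0f / 16777216.0f);   /* smallest subnormal, 2^-24 */
        assert(dst[4] == -1.0f && dst[7] == -1.0f);
    }

Every binary16 value is exactly representable in single precision, so no rounding is involved in this direction.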
        	

_mm256_maskz_cvtph_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256 _mm256_maskz_cvtph_ps(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtps_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_mask_cvtps_epi32(__m256i src, __mmask8 k,
                                    __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtps_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_maskz_cvtps_epi32(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvt_roundps_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256 a, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_mask_cvt_roundps_ph(__m128i src, __mmask8 k,
                                       __m256 a, int imm8);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "imm8[2:0]" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 32*j
        	IF k[j]
        		dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
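
``Convert_FP32_To_FP16`` hides the rounding step. The sketch below models only one rounding choice, round to nearest with ties to even (the behaviour assumed for ``imm8`` equal to ``_MM_FROUND_TO_NEAREST_INT``), and produces raw IEEE 754 binary16 bit patterns; ``float_to_half_rne``, ``model_mask_cvtps_ph_rne`` and ``demo_mask_cvtps_ph_rne`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Convert one float to binary16 bits, rounding to nearest, ties to even. */
    static uint16_t float_to_half_rne(float f)
    {
        uint32_t x, half, rem, halfway;
        memcpy(&x, &f, sizeof x);
        uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
        uint32_t exp = (x >> 23) & 0xFFu;
        uint32_t man = x & 0x007FFFFFu;

        if (exp == 255)       /* infinity, or NaN forced quiet */
            return (uint16_t)(sign | 0x7C00u | (man ? 0x0200u | (man >> 13) : 0u));
        if (exp > 142)        /* 2^16 or larger: overflow to infinity */
            return (uint16_t)(sign | 0x7C00u);
        if (exp < 102)        /* below 2^-25: rounds to zero */
            return sign;

        if (exp >= 113) {     /* result is a normal half */
            half = ((exp - 112u) << 10) | (man >> 13);
            rem = man & 0x1FFFu;
            halfway = 0x1000u;
        } else {              /* result is a subnormal half */
            uint32_t shift = 126u - exp; /* 14..24 */
            man |= 0x00800000u;          /* restore the implicit leading 1 */
            half = man >> shift;
            rem = man & ((1u << shift) - 1u);
            halfway = 1u << (shift - 1u);
        }
        if (rem > halfway || (rem == halfway && (half & 1u)))
            half++;           /* a carry may reach the exponent, up to infinity */
        return (uint16_t)(sign | half);
    }

    /* Write-mask model over eight float lanes. */
    static void model_mask_cvtps_ph_rne(const uint16_t src[8], uint8_t k,
                                        const float a[8], uint16_t dst[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1u) ? float_to_half_rne(a[j]) : src[j];
    }

    static void demo_mask_cvtps_ph_rne(void)
    {
        const uint16_t src[8] = { 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF,
                                  0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF };
        const float a[8] = { 1.0f, 65504.0f, 65520.0f,
                             1.0f + 1.0f / 2048.0f, 1.0f + 3.0f / 2048.0f,
                             0.0f, 0.0f, 0.0f };
        uint16_t dst[8];
        model_mask_cvtps_ph_rne(src, 0x1F, a, dst); /* low five lanes selected */
        assert(dst[0] == 0x3C00); /* 1.0 */
        assert(dst[1] == 0x7BFF); /* largest finite half */
        assert(dst[2] == 0x7C00); /* tie at the top rounds to infinity */
        assert(dst[3] == 0x3C00); /* tie, keeps the even significand */
        assert(dst[4] == 0x3C02); /* tie, moves up to the even significand */
        assert(dst[5] == 0xFFFF && dst[7] == 0xFFFF); /* copied from src */
    }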
        	

_mm256_mask_cvtps_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256 a, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_mask_cvtps_ph(__m128i src, __mmask8 k,
                                 __m256 a, int imm8);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "imm8[2:0]" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 32*j
        	IF k[j]
        		dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvt_roundps_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_maskz_cvt_roundps_ph(__mmask8 k, __m256 a,
                                        int imm8);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "imm8[2:0]" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 32*j
        	IF k[j]
        		dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtps_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_maskz_cvtps_ph(__mmask8 k, __m256 a,
                                  int imm8);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "imm8[2:0]" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 32*j
        	IF k[j]
        		dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtps_epu32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256i _mm256_cvtps_epu32(__m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtps_epu32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_mask_cvtps_epu32(__m256i src, __mmask8 k,
                                    __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtps_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_maskz_cvtps_epu32(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvttpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm256_mask_cvttpd_epi32(__m128i src, __mmask8 k,
                                     __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
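
The truncating form discards the fraction instead of rounding it, so 3.5 becomes 3 rather than 4. A scalar sketch with the hypothetical names ``trunc_f64_to_i32``, ``model_mask_cvttpd_epi32`` and ``demo_mask_cvttpd_epi32``; returning ``INT32_MIN`` for NaN and out-of-range inputs is an assumption, since the pseudo-code does not cover them.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Truncate one double toward zero. NaN and values whose integer part
       does not fit in int32_t yield INT32_MIN (assumed fallback). */
    static int32_t trunc_f64_to_i32(double x)
    {
        if (!(x > -2147483649.0 && x < 2147483648.0))
            return INT32_MIN;
        return (int32_t)x;
    }

    /* Write-mask model over four double lanes. */
    static void model_mask_cvttpd_epi32(const int32_t src[4], uint8_t k,
                                        const double a[4], int32_t dst[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1u) ? trunc_f64_to_i32(a[j]) : src[j];
    }

    static void demo_mask_cvttpd_epi32(void)
    {
        const int32_t src[4] = { 7, 7, 7, 7 };
        const double a[4] = { 1.5, 3.5, -3.5, 3.0e9 };
        int32_t dst[4];
        model_mask_cvttpd_epi32(src, 0x0E, a, dst); /* bits 1, 2 and 3 set */
        assert(dst[0] == 7);         /* masked off, copied from src */
        assert(dst[1] == 3);
        assert(dst[2] == -3);
        assert(dst[3] == INT32_MIN); /* 3.0e9 does not fit in int32_t */
    }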
        	
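The writemask behaviour above can be sketched in portable C, so it runs without AVX-512 hardware. ``model_mask_cvttpd_epi32`` is a hypothetical helper, not an Intel API. It assumes every selected input is finite and fits in a signed 32-bit integer; the out-of-range case is not modelled.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_mask_cvttpd_epi32 (not an Intel API).
     * Assumption: every selected a[j] is finite and within the int32_t range. */
    static void model_mask_cvttpd_epi32(int32_t dst[4], const int32_t src[4],
                                        uint8_t k, const double a[4])
    {
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1) {
                dst[j] = (int32_t)a[j];  /* a C cast truncates toward zero */
            } else {
                dst[j] = src[j];         /* mask bit clear: copy the src lane */
            }
        }
    }

With ``k = 0x5`` and ``a = {1.9, -2.7, 3.5, -4.5}``, lanes 0 and 2 become 1 and 3, while lanes 1 and 3 keep their ``src`` values.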

_mm256_maskz_cvttpd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm256_maskz_cvttpd_epi32(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvttpd_epu32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm256_cvttpd_epu32(__m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[k+63:k])
        ENDFOR
        dst[MAX:128] := 0
        	
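A minimal scalar sketch of the unsigned truncating conversion, written in plain C rather than with the intrinsic itself. ``model_cvttpd_epu32`` is a hypothetical name, and the sketch assumes each input is finite and lies in the range 0 to 4294967295; behaviour outside that range is not modelled.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_cvttpd_epu32 (not an Intel API).
     * Assumption: each a[j] is finite and within [0, 4294967295]. */
    static void model_cvttpd_epu32(uint32_t dst[4], const double a[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = (uint32_t)a[j];  /* the fractional part is discarded */
        }
    }

Because the destination lanes are unsigned, an input such as ``2147483648.75`` is representable and truncates to 2147483648, a value that would not fit in a signed 32-bit lane.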

_mm256_mask_cvttpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm256_mask_cvttpd_epu32(__m128i src, __mmask8 k,
                                     __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvttpd_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm256_maskz_cvttpd_epu32(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvttps_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_mask_cvttps_epi32(__m256i src, __mmask8 k,
                                     __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvttps_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_maskz_cvttps_epi32(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
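The zeromask variant differs from the writemask form only in what happens to unselected lanes. The scalar sketch below illustrates this for eight single-precision lanes; ``model_maskz_cvttps_epi32`` is a hypothetical helper, and it assumes every selected input is finite and fits in a signed 32-bit integer.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_maskz_cvttps_epi32 (not an Intel API).
     * Assumption: every selected a[j] is finite and within the int32_t range. */
    static void model_maskz_cvttps_epi32(int32_t dst[8], uint8_t k,
                                         const float a[8])
    {
        for (int j = 0; j < 8; j++) {
            /* Selected lanes truncate toward zero; all others are zeroed. */
            dst[j] = ((k >> j) & 1) ? (int32_t)a[j] : 0;
        }
    }

With ``k = 0xF0`` the low four lanes are zero regardless of ``a``, and inputs ``7.9f`` and ``-7.9f`` in the upper lanes become 7 and -7.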

_mm256_cvttps_epu32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256i _mm256_cvttps_epu32(__m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvttps_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_mask_cvttps_epu32(__m256i src, __mmask8 k,
                                     __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvttps_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256i _mm256_maskz_cvttps_epu32(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepu32_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256d _mm256_cvtepu32_pd(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_Int32_To_FP64(a[l+31:l])
        ENDFOR
        dst[MAX:256] := 0
        	
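Since the description specifies unsigned source elements, lanes with the top bit set convert to large positive doubles rather than negative ones. A scalar sketch in plain C, using the hypothetical helper ``model_cvtepu32_pd``, illustrates this:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_cvtepu32_pd (not an Intel API). */
    static void model_cvtepu32_pd(double dst[4], const uint32_t a[4])
    {
        for (int j = 0; j < 4; j++) {
            /* Exact for every input: a double has a 53-bit significand. */
            dst[j] = (double)a[j];
        }
    }

For example, the lane ``0xFFFFFFFF`` becomes ``4294967295.0``, not ``-1.0``.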

_mm256_mask_cvtepu32_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP64 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256d _mm256_mask_cvtepu32_pd(__m256d src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_Int32_To_FP64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepu32_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256d _mm256_maskz_cvtepu32_pd(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_Int32_To_FP64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepi32_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm256_cvtepi32_epi8(__m256i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 8*j
        	dst[k+7:k] := Truncate8(a[i+31:i])
        ENDFOR
        dst[MAX:64] := 0
        	
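The ``Truncate8`` step keeps only the low byte of each 32-bit lane. The sketch below models the 128-bit result as 16 bytes, with the eight converted bytes at the bottom and the remaining bytes cleared; ``model_cvtepi32_epi8`` is a hypothetical helper, not an Intel API.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm256_cvtepi32_epi8 (not an Intel API).
     * dst models the 128-bit result as 16 bytes, lowest byte first. */
    static void model_cvtepi32_epi8(uint8_t dst[16], const uint32_t a[8])
    {
        memset(dst, 0, 16);                  /* bits above 63 are zeroed */
        for (int j = 0; j < 8; j++) {
            dst[j] = (uint8_t)(a[j] & 0xFF); /* Truncate8: keep the low byte */
        }
    }

Truncation discards the upper bits without clamping: 300 (``0x12C``) becomes 44 (``0x2C``), and ``0x12345678`` becomes ``0x78``.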

_mm256_mask_cvtepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm256_mask_cvtepi32_epi8(__m128i src, __mmask8 k,
                                      __m256i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_maskz_cvtepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm256_maskz_cvtepi32_epi8(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_cvtepi32_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm256_cvtepi32_epi16(__m256i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 16*j
        	dst[k+15:k] := Truncate16(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm256_mask_cvtepi32_epi16(__m128i src, __mmask8 k,
                                       __m256i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Truncate16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm256_maskz_cvtepi32_epi16(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Truncate16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtepi64_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm256_cvtepi64_epi8(__m256i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	k := 8*j
        	dst[k+7:k] := Truncate8(a[i+63:i])
        ENDFOR
        dst[MAX:32] := 0
        	

_mm256_mask_cvtepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm256_mask_cvtepi64_epi8(__m128i src, __mmask8 k,
                                      __m256i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm256_maskz_cvtepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm256_maskz_cvtepi64_epi8(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm256_cvtepi64_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm256_cvtepi64_epi32(__m256i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	k := 32*j
        	dst[k+31:k] := Truncate32(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	
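A scalar sketch of the ``Truncate32`` narrowing, assuming the lanes are viewed as raw 64-bit patterns. ``model_cvtepi64_epi32`` is a hypothetical helper, not an Intel API.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_cvtepi64_epi32 (not an Intel API). */
    static void model_cvtepi64_epi32(uint32_t dst[4], const uint64_t a[4])
    {
        for (int j = 0; j < 4; j++) {
            /* Truncate32: keep bits 31:0, drop bits 63:32. */
            dst[j] = (uint32_t)(a[j] & 0xFFFFFFFFu);
        }
    }

Only the low half of each lane survives, so ``0x0000000100000005`` narrows to 5 and ``0x0000000180000000`` narrows to ``0x80000000``.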

_mm256_mask_cvtepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm256_mask_cvtepi64_epi32(__m128i src, __mmask8 k,
                                       __m256i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := Truncate32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm256_maskz_cvtepi64_epi32(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := Truncate32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtepi64_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm256_cvtepi64_epi16(__m256i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	k := 16*j
        	dst[k+15:k] := Truncate16(a[i+63:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_mask_cvtepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm256_mask_cvtepi64_epi16(__m128i src, __mmask8 k,
                                       __m256i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Truncate16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_maskz_cvtepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm256_maskz_cvtepi64_epi16(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Truncate16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_cvtsepi32_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m128i _mm256_cvtsepi32_epi8(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 8*j
        	dst[k+7:k] := Saturate8(a[i+31:i])
        ENDFOR
        dst[MAX:64] := 0
        	
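Signed saturation clamps each lane to the destination range instead of discarding high bits. The sketch below is a plain-C model under that reading of ``Saturate8``; ``saturate8`` and ``model_cvtsepi32_epi8`` are hypothetical names, not Intel APIs.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Clamp a signed 32-bit value to the int8_t range [-128, 127]. */
    static int8_t saturate8(int32_t x)
    {
        if (x > INT8_MAX) return INT8_MAX;
        if (x < INT8_MIN) return INT8_MIN;
        return (int8_t)x;
    }

    /* Hypothetical scalar model of _mm256_cvtsepi32_epi8 (not an Intel API).
     * dst models the 128-bit result as 16 bytes, lowest byte first. */
    static void model_cvtsepi32_epi8(int8_t dst[16], const int32_t a[8])
    {
        memset(dst, 0, 16);              /* bits above 63 are zeroed */
        for (int j = 0; j < 8; j++) {
            dst[j] = saturate8(a[j]);
        }
    }

Inputs of 300 and -300 therefore produce 127 and -128, while in-range values such as 42 pass through unchanged.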

_mm256_mask_cvtsepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm256_mask_cvtsepi32_epi8(__m128i src, __mmask8 k,
                                       __m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_maskz_cvtsepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm256_maskz_cvtsepi32_epi8(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_cvtsepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m128i _mm256_cvtsepi32_epi16(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 16*j
        	dst[k+15:k] := Saturate16(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtsepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm256_mask_cvtsepi32_epi16(__m128i src, __mmask8 k,
                                        __m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Saturate16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtsepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm256_maskz_cvtsepi32_epi16(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Saturate16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtsepi64_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128i _mm256_cvtsepi64_epi8(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	k := 8*j
        	dst[k+7:k] := Saturate8(a[i+63:i])
        ENDFOR
        dst[MAX:32] := 0
        	

_mm256_mask_cvtsepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm256_mask_cvtsepi64_epi8(__m128i src, __mmask8 k,
                                       __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	
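Two details of the pseudo-code above are easy to miss: only mask bits 0 to 3 take part, and everything above the four result bytes is zeroed rather than copied from ``src``. A scalar sketch in plain C, with hypothetical helper names, makes both visible:

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Clamp a signed 64-bit value to the int8_t range [-128, 127]. */
    static int8_t saturate8_from_i64(int64_t x)
    {
        if (x > INT8_MAX) return INT8_MAX;
        if (x < INT8_MIN) return INT8_MIN;
        return (int8_t)x;
    }

    /* Hypothetical scalar model of _mm256_mask_cvtsepi64_epi8 (not an Intel
     * API). dst and src model 128-bit values as 16 bytes, lowest byte first. */
    static void model_mask_cvtsepi64_epi8(int8_t dst[16], const int8_t src[16],
                                          uint8_t k, const int64_t a[4])
    {
        memset(dst, 0, 16);              /* dst[MAX:32] := 0 */
        for (int j = 0; j < 4; j++) {    /* mask bits 4..7 are never read */
            dst[j] = ((k >> j) & 1) ? saturate8_from_i64(a[j]) : src[j];
        }
    }

With ``k = 0xF6`` only bits 1 and 2 select a conversion; bytes 0 and 3 come from ``src``, and bytes 4 to 15 are zero even when ``src`` holds non-zero data there.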

_mm256_maskz_cvtsepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm256_maskz_cvtsepi64_epi8(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm256_cvtsepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128i _mm256_cvtsepi64_epi32(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	k := 32*j
        	dst[k+31:k] := Saturate32(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtsepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm256_mask_cvtsepi64_epi32(__m128i src, __mmask8 k,
                                        __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := Saturate32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtsepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm256_maskz_cvtsepi64_epi32(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := Saturate32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
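
As a rough illustration of the zeromask variant, the pseudo-code above might be modelled in scalar C as follows. The helper names ``saturate32`` and ``model_maskz_cvtsepi64_epi32`` are assumptions made for this sketch, and arrays stand in for the vector lanes (element 0 is the least-significant lane).

.. code-block:: C

    #include <stdint.h>

    /* Clamp a signed 64-bit value to the int32_t range
       (the role Saturate32 plays in the pseudo-code). */
    static int32_t saturate32(int64_t v)
    {
        if (v > INT32_MAX) return INT32_MAX;
        if (v < INT32_MIN) return INT32_MIN;
        return (int32_t)v;
    }

    /* Hypothetical scalar model: the four 32-bit results fill the whole
       128-bit destination, so no extra upper bits remain to clear. */
    static void model_maskz_cvtsepi64_epi32(int32_t dst[4], uint8_t k,
                                            const int64_t a[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? saturate32(a[j]) : 0;
    }
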
        	

_mm256_cvtsepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128i _mm256_cvtsepi64_epi16(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	k := 16*j
        	dst[k+15:k] := Saturate16(a[i+63:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_mask_cvtsepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    SI16 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm256_mask_cvtsepi64_epi16(__m128i src, __mmask8 k,
                                        __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Saturate16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_maskz_cvtsepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm256_maskz_cvtsepi64_epi16(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Saturate16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_mask_cvtepi8_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI8 a

.. code-block:: C

    __m256i _mm256_mask_cvtepi8_epi32(__m256i src, __mmask8 k,
                                      __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[l+7:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
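
The masked sign extension described above can be approximated by a short scalar C sketch. The function name ``model_mask_cvtepi8_epi32`` is hypothetical, and arrays (element 0 least significant) are used in place of the ``__m128i`` and ``__m256i`` operands.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: only the low 8 bytes of a are read.
       Converting int8_t to int32_t in C sign-extends, which matches
       SignExtend32 in the pseudo-code. */
    static void model_mask_cvtepi8_epi32(int32_t dst[8], const int32_t src[8],
                                         uint8_t k, const int8_t a[16])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? (int32_t)a[j] : src[j];
    }

Bytes 8 to 15 of ``a`` never influence the result, which is why the description refers to the low 8 bytes only.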
        	

_mm256_maskz_cvtepi8_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    __m256i _mm256_maskz_cvtepi8_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[l+7:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtepi8_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI64 src, 
    MASK k, 
    SI8 a

.. code-block:: C

    __m256i _mm256_mask_cvtepi8_epi64(__m256i src, __mmask8 k,
                                      __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+7:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepi8_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    __m256i _mm256_maskz_cvtepi8_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+7:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtepi32_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m256i _mm256_mask_cvtepi32_epi64(__m256i src, __mmask8 k,
                                       __m128i a);

.. admonition:: Intel Description

    Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepi32_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m256i _mm256_maskz_cvtepi32_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
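
A minimal scalar sketch of the zeromasked 32-bit to 64-bit sign extension above, assuming arrays in place of vector registers (element 0 least significant). The name ``model_maskz_cvtepi32_epi64`` is made up for this illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: lanes whose mask bit is clear become 0;
       the int32_t to int64_t conversion performs the sign extension. */
    static void model_maskz_cvtepi32_epi64(int64_t dst[4], uint8_t k,
                                           const int32_t a[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? (int64_t)a[j] : 0;
    }
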
        	

_mm256_mask_cvtepi16_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m256i _mm256_mask_cvtepi16_epi32(__m256i src, __mmask8 k,
                                       __m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	l := j*16
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[l+15:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepi16_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m256i _mm256_maskz_cvtepi16_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[l+15:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtepi16_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI64 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m256i _mm256_mask_cvtepi16_epi64(__m256i src, __mmask8 k,
                                       __m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+15:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepi16_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m256i _mm256_maskz_cvtepi16_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+15:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtusepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm256_cvtusepi32_epi8(__m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 8*j
        	dst[k+7:k] := SaturateU8(a[i+31:i])
        ENDFOR
        dst[MAX:64] := 0
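
To make the unsigned saturation concrete, the loop above could be modelled in scalar C roughly as shown here. ``saturate_u8`` and ``model_cvtusepi32_epi8`` are hypothetical names, and arrays (element 0 least significant) stand in for the vector operands.

.. code-block:: C

    #include <stdint.h>

    /* Clamp an unsigned 32-bit value to 0..255
       (the role SaturateU8 plays in the pseudo-code). */
    static uint8_t saturate_u8(uint32_t v)
    {
        return v > UINT8_MAX ? UINT8_MAX : (uint8_t)v;
    }

    /* Hypothetical scalar model: eight results land in the low 8 bytes,
       the remaining bytes of the 128-bit destination are cleared. */
    static void model_cvtusepi32_epi8(uint8_t dst[16], const uint32_t a[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = saturate_u8(a[j]);
        for (int j = 8; j < 16; j++)
            dst[j] = 0; /* dst[MAX:64] := 0 */
    }

Because the input is treated as unsigned, a lane holding ``0xFFFFFFFF`` saturates to 255 in this model rather than being read as -1.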
        	

_mm256_mask_cvtusepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm256_mask_cvtusepi32_epi8(__m128i src, __mmask8 k,
                                        __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_maskz_cvtusepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm256_maskz_cvtusepi32_epi8(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_cvtusepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm256_cvtusepi32_epi16(__m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 16*j
        	dst[k+15:k] := SaturateU16(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtusepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm256_mask_cvtusepi32_epi16(__m128i src,
                                         __mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := SaturateU16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtusepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm256_maskz_cvtusepi32_epi16(__mmask8 k,
                                          __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := SaturateU16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtusepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm256_cvtusepi64_epi8(__m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	k := 8*j
        	dst[k+7:k] := SaturateU8(a[i+63:i])
        ENDFOR
        dst[MAX:32] := 0
        	

_mm256_mask_cvtusepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm256_mask_cvtusepi64_epi8(__m128i src, __mmask8 k,
                                        __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm256_maskz_cvtusepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm256_maskz_cvtusepi64_epi8(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm256_cvtusepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm256_cvtusepi64_epi32(__m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	k := 32*j
        	dst[k+31:k] := SaturateU32(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtusepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm256_mask_cvtusepi64_epi32(__m128i src,
                                         __mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := SaturateU32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtusepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm256_maskz_cvtusepi64_epi32(__mmask8 k,
                                          __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := SaturateU32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtusepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm256_cvtusepi64_epi16(__m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	k := 16*j
        	dst[k+15:k] := SaturateU16(a[i+63:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_mask_cvtusepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm256_mask_cvtusepi64_epi16(__m128i src,
                                         __mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := SaturateU16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_maskz_cvtusepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm256_maskz_cvtusepi64_epi16(__mmask8 k,
                                          __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := SaturateU16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
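
The combination of unsigned saturation and zeromasking above might be sketched in scalar C like this. The names ``saturate_u16`` and ``model_maskz_cvtusepi64_epi16`` are assumptions of this example, with arrays (element 0 least significant) replacing the vector operands.

.. code-block:: C

    #include <stdint.h>

    /* Clamp an unsigned 64-bit value to 0..65535
       (the role SaturateU16 plays in the pseudo-code). */
    static uint16_t saturate_u16(uint64_t v)
    {
        return v > UINT16_MAX ? UINT16_MAX : (uint16_t)v;
    }

    /* Hypothetical scalar model: dst is the eight 16-bit lanes of the
       128-bit result; only the low four can be non-zero. */
    static void model_maskz_cvtusepi64_epi16(uint16_t dst[8], uint8_t k,
                                             const uint64_t a[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? saturate_u16(a[j]) : 0;
        for (int j = 4; j < 8; j++)
            dst[j] = 0; /* dst[MAX:64] := 0 */
    }
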
        	

_mm256_mask_cvtepu8_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_mask_cvtepu8_epi32(__m256i src, __mmask8 k,
                                      __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[i+31:i] := ZeroExtend32(a[l+7:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepu8_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_maskz_cvtepu8_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[i+31:i] := ZeroExtend32(a[l+7:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtepu8_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_mask_cvtepu8_epi64(__m256i src, __mmask8 k,
                                      __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+7:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
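
For contrast with the sign-extending conversions, the zero extension above can be modelled in scalar C along these lines. ``model_mask_cvtepu8_epi64`` is a hypothetical name, and arrays (element 0 least significant) stand in for the vector operands.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: only the low 4 bytes of a are read.
       Converting uint8_t to uint64_t in C zero-extends, which matches
       ZeroExtend64 in the pseudo-code. */
    static void model_mask_cvtepu8_epi64(uint64_t dst[4], const uint64_t src[4],
                                         uint8_t k, const uint8_t a[16])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? (uint64_t)a[j] : src[j];
    }

In this model a byte of ``0xFF`` widens to 255, whereas the sign-extending counterpart would produce -1 for the same bit pattern.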
        	

_mm256_maskz_cvtepu8_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_maskz_cvtepu8_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+7:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtepu32_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_mask_cvtepu32_epi64(__m256i src, __mmask8 k,
                                       __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepu32_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_maskz_cvtepu32_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+31:l])
        	ELSE 
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
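One possible way to call this intrinsic from C is sketched below. The helper name ``widen_u32_to_u64_maskz`` is hypothetical. The intrinsic path is compiled only when the compiler defines ``__AVX512F__`` and ``__AVX512VL__`` (for example with ``-mavx512f -mavx512vl`` on GCC or Clang); otherwise a scalar loop that follows the pseudo-code is used, so the snippet builds on any target.

.. code-block:: C

    #include <stdint.h>
    #if defined(__AVX512F__) && defined(__AVX512VL__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: widens four uint32_t values to uint64_t and
       zeroes every element whose mask bit is clear. */
    void widen_u32_to_u64_maskz(uint64_t out[4], uint8_t k,
                                const uint32_t in[4])
    {
    #if defined(__AVX512F__) && defined(__AVX512VL__)
        /* Unaligned load/store, so the arrays need no special alignment. */
        __m128i a = _mm_loadu_si128((const __m128i *)in);
        __m256i r = _mm256_maskz_cvtepu32_epi64((__mmask8)k, a);
        _mm256_storeu_si256((__m256i *)out, r);
    #else
        /* Scalar fallback mirroring the pseudo-code. */
        for (int j = 0; j < 4; j++)
            out[j] = ((k >> j) & 1u) ? (uint64_t)in[j] : 0u;
    #endif
    }

Because the conversion is a zero extension, an input of ``0xFFFFFFFF`` becomes ``0x00000000FFFFFFFF`` rather than ``-1``. A binary built with the AVX-512 flags should only be run on a CPU that supports those extensions.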

_mm256_mask_cvtepu16_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_mask_cvtepu16_epi32(__m256i src, __mmask8 k,
                                       __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[i+31:i] := ZeroExtend32(a[l+15:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepu16_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_maskz_cvtepu16_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[i+31:i] := ZeroExtend32(a[l+15:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtepu16_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_mask_cvtepu16_epi64(__m256i src, __mmask8 k,
                                       __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+15:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepu16_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_maskz_cvtepu16_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+15:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtpbh_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m128bh a
:Param ETypes:
    BF16 a

.. code-block:: C

    __m256 _mm256_cvtpbh_ps(__m128bh a);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". This intrinsic neither raises floating-point exceptions nor converts a signaling NaN (SNaN) into a quiet NaN (QNaN).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*16
        	dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
        ENDFOR
        dst[MAX:256] := 0
        	
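A BF16 value has the same layout as the upper 16 bits of an FP32 value, so the element conversion can be modelled as a 16-bit left shift of the bit pattern. The sketch below assumes that model; the helper names ``bf16_to_fp32_bits``, ``bf16_to_float`` and ``cvtpbh_ps_model`` are hypothetical, and BF16 elements are represented as raw ``uint16_t`` bit patterns.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* FP32 bit pattern for a BF16 bit pattern: the BF16 bits become the
       upper half and the lower 16 significand bits are zero. */
    uint32_t bf16_to_fp32_bits(uint16_t h)
    {
        return (uint32_t)h << 16;
    }

    /* Reinterpret the widened bit pattern as a float without aliasing. */
    float bf16_to_float(uint16_t h)
    {
        uint32_t bits = bf16_to_fp32_bits(h);
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    /* Scalar model of the unmasked pseudo-code above. */
    void cvtpbh_ps_model(float dst[8], const uint16_t a[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = bf16_to_float(a[j]);
    }

Since the model only moves bits, a signaling NaN pattern such as ``0x7F81`` maps to ``0x7F810000`` with its quiet bit still clear, which is consistent with the statement that the conversion does not quiet NaNs.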

_mm256_maskz_cvtpbh_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m128bh a
:Param ETypes:
    MASK k, 
    BF16 a

.. code-block:: C

    __m256 _mm256_maskz_cvtpbh_ps(__mmask8 k, __m128bh a);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic neither raises floating-point exceptions nor converts a signaling NaN (SNaN) into a quiet NaN (QNaN).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtpbh_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m128bh a
:Param ETypes:
    FP32 src, 
    MASK k, 
    BF16 a

.. code-block:: C

    __m256 _mm256_mask_cvtpbh_ps(__m256 src, __mmask8 k,
                                 __m128bh a);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic neither raises floating-point exceptions nor converts a signaling NaN (SNaN) into a quiet NaN (QNaN).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtne2ps_pbh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256bh
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256bh _mm256_cvtne2ps_pbh(__m256 a, __m256 b);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF j < 8
        		t := b.fp32[j]
        	ELSE
        		t := a.fp32[j-8]
        	FI
        	dst.word[j] := Convert_FP32_To_BF16(t)
        ENDFOR
        dst[MAX:256] := 0
        	
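Two details of the pseudo-code are easy to miss: elements of "b" fill the lower half of the result and elements of "a" the upper half, and each element is rounded rather than truncated. The scalar sketch below illustrates both under stated assumptions: the names ``fp32_to_bf16_rne`` and ``cvtne2ps_pbh_model`` are hypothetical, rounding is round-to-nearest-even, and NaN and denormal handling of the real instruction is not modelled.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* FP32 -> BF16 bit pattern with round-to-nearest-even.
       Assumption: input is a finite, normal value (NaN and denormal
       behaviour of the hardware conversion is not reproduced here). */
    uint16_t fp32_to_bf16_rne(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        /* Add 0x7FFF plus the lowest kept bit: ties go to the even value. */
        uint32_t lsb = (bits >> 16) & 1u;
        bits += 0x7FFFu + lsb;
        return (uint16_t)(bits >> 16);
    }

    /* Scalar model of the unmasked pseudo-code above. */
    void cvtne2ps_pbh_model(uint16_t dst[16], const float a[8],
                            const float b[8])
    {
        for (int j = 0; j < 16; j++) {
            /* Words 0..7 come from b, words 8..15 come from a. */
            float t = (j < 8) ? b[j] : a[j - 8];
            dst[j] = fp32_to_bf16_rne(t);
        }
    }

In this model ``1.00390625f`` (halfway between two BF16 values) rounds down to ``0x3F80``, while ``1.01171875f`` (also a tie) rounds up to ``0x3F82``, because in both cases the even significand wins.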

_mm256_mask_cvtne2ps_pbh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256bh
:Param Types:
    __m256bh src, 
    __mmask16 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    BF16 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256bh _mm256_mask_cvtne2ps_pbh(__m256bh src, __mmask16 k,
                                      __m256 a, __m256 b);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		IF j < 8
        			t := b.fp32[j]
        		ELSE
        			t := a.fp32[j-8]
        		FI
        		dst.word[j] := Convert_FP32_To_BF16(t)
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtne2ps_pbh
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256bh
:Param Types:
    __mmask16 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256bh _mm256_maskz_cvtne2ps_pbh(__mmask16 k, __m256 a,
                                       __m256 b);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		IF j < 8
        			t := b.fp32[j]
        		ELSE
        			t := a.fp32[j-8]
        		FI
        		dst.word[j] := Convert_FP32_To_BF16(t)
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtneps_pbh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128bh
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128bh _mm256_cvtneps_pbh(__m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtneps_pbh
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128bh
:Param Types:
    __m128bh src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    BF16 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128bh _mm256_mask_cvtneps_pbh(__m128bh src, __mmask8 k,
                                     __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtneps_pbh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128bh
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128bh _mm256_maskz_cvtneps_pbh(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtepi16_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m256h _mm256_cvtepi16_ph(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        ENDFOR
        dst[MAX:256] := 0
        	
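Half precision carries an 11-bit significand (10 stored bits plus the implicit one), so 16-bit integers with a magnitude above 2048 are not all representable and the conversion has to round. The sketch below models the element conversion on raw FP16 bit patterns. It assumes round-to-nearest-even, and the names ``int16_to_fp16_bits`` and ``cvtepi16_ph_model`` are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* int16 -> FP16 bit pattern, assuming round-to-nearest-even. */
    uint16_t int16_to_fp16_bits(int16_t v)
    {
        uint32_t sign = (v < 0) ? 0x8000u : 0u;
        uint32_t mag = (v < 0) ? (uint32_t)(-(int32_t)v) : (uint32_t)v;
        if (mag == 0)
            return 0;

        /* Position of the highest set bit (at most 15 for int16). */
        int msb = 15;
        while (!((mag >> msb) & 1u))
            msb--;

        uint32_t sig;                       /* 11-bit significand */
        if (msb <= 10) {
            sig = mag << (10 - msb);        /* exact, no rounding needed */
        } else {
            int shift = msb - 10;
            uint32_t rem = mag & ((1u << shift) - 1u);
            uint32_t half = 1u << (shift - 1);
            sig = mag >> shift;
            /* Round up above the midpoint, or at the midpoint if odd. */
            if (rem > half || (rem == half && (sig & 1u)))
                sig++;
            /* Rounding may carry into the next power of two. */
            if (sig == 0x800u) {
                sig = 0x400u;
                msb++;
            }
        }
        /* Biased exponent is msb + 15; drop the implicit leading bit. */
        return (uint16_t)(sign | ((uint32_t)(msb + 15) << 10) | (sig & 0x3FFu));
    }

    /* Scalar model of the unmasked pseudo-code above. */
    void cvtepi16_ph_model(uint16_t dst[16], const int16_t a[16])
    {
        for (int j = 0; j < 16; j++)
            dst[j] = int16_to_fp16_bits(a[j]);
    }

In this model 2049 becomes 2048 (``0x6800``) and 2051 becomes 2052 (``0x6802``), since only even integers are representable between 2048 and 4096.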

_mm256_mask_cvtepi16_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m256h _mm256_mask_cvtepi16_ph(__m256h src, __mmask16 k,
                                    __m256i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepi16_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m256h _mm256_maskz_cvtepi16_ph(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepu16_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m256h _mm256_cvtepu16_ph(__m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtepu16_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m256h _mm256_mask_cvtepu16_ph(__m256h src, __mmask16 k,
                                    __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtepu16_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m256h _mm256_maskz_cvtepu16_ph(__mmask16 k, __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepi32_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m256i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m128h _mm256_cvtepi32_ph(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtepi32_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m128h _mm256_mask_cvtepi32_ph(__m128h src, __mmask8 k,
                                    __m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtepi32_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m128h _mm256_maskz_cvtepi32_ph(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtepu32_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128h _mm256_cvtepu32_ph(__m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtepu32_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128h _mm256_mask_cvtepu32_ph(__m128h src, __mmask8 k,
                                    __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtepu32_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128h _mm256_maskz_cvtepu32_ph(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtepi64_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m256i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128h _mm256_cvtepi64_ph(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_mask_cvtepi64_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128h _mm256_mask_cvtepi64_ph(__m128h src, __mmask8 k,
                                    __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	
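Only four half-precision results are produced, so they occupy the lower 64 bits of the 128-bit return value and the upper 64 bits are cleared even in the writemask form. The sketch below models just that lane layout. The name ``mask_cvtepi64_ph_layout`` is hypothetical, and ``converted[j]`` is assumed to already hold the FP16 bit pattern that ``Convert_Int64_To_FP16(a.qword[j])`` would yield, since the numeric conversion itself is not modelled here.

.. code-block:: C

    #include <stdint.h>

    /* Lane-layout model of the writemask pseudo-code above.
       dst and src are eight FP16 bit patterns (128 bits each). */
    void mask_cvtepi64_ph_layout(uint16_t dst[8], const uint16_t src[8],
                                 uint8_t k, const uint16_t converted[4])
    {
        /* Lower 64 bits: converted element or the matching src element. */
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1u) ? converted[j] : src[j];

        /* dst[MAX:64] := 0, so the upper half is never taken from src. */
        for (int j = 4; j < 8; j++)
            dst[j] = 0;
    }

Passing a mask with bits 4 to 7 set does not change the outcome in this model: elements 4 to 7 of the result stay zero and the corresponding ``src`` elements are discarded.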

_mm256_maskz_cvtepi64_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128h _mm256_maskz_cvtepi64_ph(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_cvtepu64_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128h _mm256_cvtepu64_ph(__m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_mask_cvtepu64_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128h _mm256_mask_cvtepu64_ph(__m128h src, __mmask8 k,
                                    __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_maskz_cvtepu64_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128h _mm256_maskz_cvtepu64_ph(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_cvtpd_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128h _mm256_cvtpd_ph(__m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_mask_cvtpd_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128h _mm256_mask_cvtpd_ph(__m128h src, __mmask8 k,
                                 __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_maskz_cvtpd_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128h _mm256_maskz_cvtpd_ph(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm256_cvtxps_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128h _mm256_cvtxps_ph(__m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_mask_cvtxps_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128h _mm256_mask_cvtxps_ph(__m128h src, __mmask8 k,
                                  __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_cvtxps_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128h _mm256_maskz_cvtxps_ph(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtph_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_cvtph_epi32(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtph_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_mask_cvtph_epi32(__m256i src, __mmask8 k,
                                    __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtph_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_maskz_cvtph_epi32(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
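        
The difference between the writemask and zeromask forms of this conversion can be sketched in portable scalar C. The helper names below (``half_to_float``, ``round_half_to_i32``, ``model_mask_cvtph_epi32``, ``model_maskz_cvtph_epi32``) are hypothetical and only mirror the pseudo-code. Two assumptions are made: rounding is nearest-even (the default MXCSR mode), and Inf/NaN inputs produce ``0x80000000``.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Decode finite IEEE 754 binary16 bits to float (exact).
       Inf/NaN (exponent field 31) must be filtered by the caller. */
    static float half_to_float(uint16_t h)
    {
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t mant = h & 0x3FFu;
        float v;
        if (exp == 0) {
            v = (float)mant / 16777216.0f;          /* subnormal: mant * 2^-24 */
        } else {
            v = (float)(mant | 0x400u) / 1024.0f;   /* 1.mant */
            for (int e = (int)exp - 15; e > 0; --e) v *= 2.0f;
            for (int e = (int)exp - 15; e < 0; ++e) v *= 0.5f;
        }
        return (h & 0x8000u) ? -v : v;
    }

    /* Convert one half to int32, rounding to nearest with ties to even. */
    static int32_t round_half_to_i32(uint16_t h)
    {
        if (((h >> 10) & 0x1Fu) == 0x1Fu)
            return INT32_MIN;                       /* assumed result for Inf/NaN */
        float   v    = half_to_float(h);
        int32_t t    = (int32_t)v;                  /* truncates toward zero */
        float   frac = v - (float)t;
        if (frac > 0.5f || (frac == 0.5f && (t & 1)))   return t + 1;
        if (frac < -0.5f || (frac == -0.5f && (t & 1))) return t - 1;
        return t;
    }

    /* Writemask form: lanes with a clear mask bit are copied from src. */
    static void model_mask_cvtph_epi32(int32_t dst[8], const int32_t src[8],
                                       uint8_t k, const uint16_t a[8])
    {
        assert(dst && src && a);
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1) ? round_half_to_i32(a[j]) : src[j];
    }

    /* Zeromask form: lanes with a clear mask bit are set to zero. */
    static void model_maskz_cvtph_epi32(int32_t dst[8], uint8_t k,
                                        const uint16_t a[8])
    {
        assert(dst && a);
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1) ? round_half_to_i32(a[j]) : 0;
    }

With ``k = 0xA5`` only lanes 0, 2, 5 and 7 are converted. The writemask model leaves the other lanes equal to ``src``, while the zeromask model clears them.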
        	

_mm256_cvttph_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_cvttph_epi32(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvttph_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_mask_cvttph_epi32(__m256i src, __mmask8 k,
                                     __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvttph_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_maskz_cvttph_epi32(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
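        
The ``cvtph`` and ``cvttph`` families differ only in how the fractional part is handled. The scalar sketch below contrasts the two using hypothetical helper names that do not exist in ``immintrin.h``. It assumes that the non-truncating form rounds to nearest-even (the default MXCSR mode) and that Inf/NaN inputs produce ``0x80000000``.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Decode finite IEEE 754 binary16 bits to float (exact).
       Inf/NaN (exponent field 31) must be filtered by the caller. */
    static float half_to_float(uint16_t h)
    {
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t mant = h & 0x3FFu;
        float v;
        if (exp == 0) {
            v = (float)mant / 16777216.0f;          /* subnormal: mant * 2^-24 */
        } else {
            v = (float)(mant | 0x400u) / 1024.0f;   /* 1.mant */
            for (int e = (int)exp - 15; e > 0; --e) v *= 2.0f;
            for (int e = (int)exp - 15; e < 0; ++e) v *= 0.5f;
        }
        return (h & 0x8000u) ? -v : v;
    }

    static int is_inf_or_nan(uint16_t h)
    {
        return ((h >> 10) & 0x1Fu) == 0x1Fu;
    }

    /* Model of _mm256_cvttph_epi32: discard the fraction (round toward zero). */
    static void model_cvttph_epi32(int32_t dst[8], const uint16_t a[8])
    {
        assert(dst && a);
        for (int j = 0; j < 8; ++j)
            dst[j] = is_inf_or_nan(a[j]) ? INT32_MIN
                                         : (int32_t)half_to_float(a[j]);
    }

    /* Model of _mm256_cvtph_epi32 under the nearest-even assumption. */
    static void model_cvtph_epi32(int32_t dst[8], const uint16_t a[8])
    {
        assert(dst && a);
        for (int j = 0; j < 8; ++j) {
            if (is_inf_or_nan(a[j])) { dst[j] = INT32_MIN; continue; }
            float   v    = half_to_float(a[j]);
            int32_t t    = (int32_t)v;
            float   frac = v - (float)t;
            if (frac > 0.5f || (frac == 0.5f && (t & 1)))        ++t;
            else if (frac < -0.5f || (frac == -0.5f && (t & 1))) --t;
            dst[j] = t;
        }
    }

For the half values 2.5, 3.5, -3.5 and 1.5 the rounding model gives 2, 4, -4 and 2, while the truncating model gives 2, 3, -3 and 1.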
        	

_mm256_cvtph_epu32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_cvtph_epu32(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtph_epu32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_mask_cvtph_epu32(__m256i src, __mmask8 k,
                                    __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtph_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_maskz_cvtph_epu32(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvttph_epu32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_cvttph_epu32(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvttph_epu32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_mask_cvttph_epu32(__m256i src, __mmask8 k,
                                     __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvttph_epu32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_maskz_cvttph_epu32(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
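        
The unsigned conversions raise the question of what happens to negative or non-finite inputs, which the descriptions above do not spell out. The scalar sketch below models truncation to unsigned 32-bit. It assumes that any input whose truncated value is not representable (a value of -1.0 or lower, an infinity, or a NaN) produces ``0xFFFFFFFF``; that assumption should be checked against the instruction reference. ``half_to_float`` and ``model_cvttph_epu32`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Decode finite IEEE 754 binary16 bits to float (exact).
       Inf/NaN (exponent field 31) must be filtered by the caller. */
    static float half_to_float(uint16_t h)
    {
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t mant = h & 0x3FFu;
        float v;
        if (exp == 0) {
            v = (float)mant / 16777216.0f;          /* subnormal: mant * 2^-24 */
        } else {
            v = (float)(mant | 0x400u) / 1024.0f;   /* 1.mant */
            for (int e = (int)exp - 15; e > 0; --e) v *= 2.0f;
            for (int e = (int)exp - 15; e < 0; ++e) v *= 0.5f;
        }
        return (h & 0x8000u) ? -v : v;
    }

    /* Scalar model of _mm256_cvttph_epu32 (truncate toward zero). */
    static void model_cvttph_epu32(uint32_t dst[8], const uint16_t a[8])
    {
        assert(dst && a);
        for (int j = 0; j < 8; ++j) {
            if (((a[j] >> 10) & 0x1Fu) == 0x1Fu) {
                dst[j] = UINT32_MAX;        /* Inf/NaN: assumed 2^32 - 1 */
                continue;
            }
            float v = half_to_float(a[j]);
            if (v < 0.0f)
                /* (-1, 0) truncates to 0; -1.0 and below is assumed invalid. */
                dst[j] = (v > -1.0f) ? 0u : UINT32_MAX;
            else
                dst[j] = (uint32_t)v;       /* at most 65504, always in range */
        }
    }

In this model -0.5 converts to 0 because truncation discards the fraction before the sign matters, while -1.0 maps to the assumed all-ones result.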
        	

_mm256_cvtph_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_cvtph_epi64(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtph_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_mask_cvtph_epi64(__m256i src, __mmask8 k,
                                    __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtph_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_maskz_cvtph_epi64(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
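        
The 64-bit conversions take a ``__m128h`` holding eight half-precision elements, but the pseudo-code loops only over ``j = 0..3``, so the upper four source elements are never read. The scalar sketch below illustrates that with hypothetical helper names. Nearest-even rounding and a ``0x8000000000000000`` result for Inf/NaN are assumptions.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Decode finite IEEE 754 binary16 bits to float (exact).
       Inf/NaN (exponent field 31) must be filtered by the caller. */
    static float half_to_float(uint16_t h)
    {
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t mant = h & 0x3FFu;
        float v;
        if (exp == 0) {
            v = (float)mant / 16777216.0f;          /* subnormal: mant * 2^-24 */
        } else {
            v = (float)(mant | 0x400u) / 1024.0f;   /* 1.mant */
            for (int e = (int)exp - 15; e > 0; --e) v *= 2.0f;
            for (int e = (int)exp - 15; e < 0; ++e) v *= 0.5f;
        }
        return (h & 0x8000u) ? -v : v;
    }

    /* Convert one half to int64, rounding to nearest with ties to even. */
    static int64_t round_half_to_i64(uint16_t h)
    {
        if (((h >> 10) & 0x1Fu) == 0x1Fu)
            return INT64_MIN;                       /* assumed result for Inf/NaN */
        float   v    = half_to_float(h);
        int64_t t    = (int64_t)v;                  /* truncates toward zero */
        float   frac = v - (float)t;
        if (frac > 0.5f || (frac == 0.5f && (t & 1)))   return t + 1;
        if (frac < -0.5f || (frac == -0.5f && (t & 1))) return t - 1;
        return t;
    }

    /* Scalar model of _mm256_maskz_cvtph_epi64: a has eight elements,
       but only a[0..3] are read; only mask bits 0..3 matter. */
    static void model_maskz_cvtph_epi64(int64_t dst[4], uint8_t k,
                                        const uint16_t a[8])
    {
        assert(dst && a);
        for (int j = 0; j < 4; ++j)
            dst[j] = ((k >> j) & 1) ? round_half_to_i64(a[j]) : 0;
    }

Changing ``a[4]`` through ``a[7]``, or mask bits 4 through 7, has no effect on the result of this model.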
        	

_mm256_cvttph_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_cvttph_epi64(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvttph_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_mask_cvttph_epi64(__m256i src, __mmask8 k,
                                     __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvttph_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_maskz_cvttph_epi64(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtph_epu64
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_cvtph_epu64(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtph_epu64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_mask_cvtph_epu64(__m256i src, __mmask8 k,
                                    __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtph_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_maskz_cvtph_epu64(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvttph_epu64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_cvttph_epu64(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvttph_epu64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_mask_cvttph_epu64(__m256i src, __mmask8 k,
                                     __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvttph_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_maskz_cvttph_epu64(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtph_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_cvtph_epi16(__m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtph_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_mask_cvtph_epi16(__m256i src, __mmask16 k,
                                    __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtph_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_maskz_cvtph_epi16(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
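        
Unlike the 32-bit and 64-bit forms, the 16-bit conversions can overflow on finite inputs, because half precision reaches 65504 while a signed 16-bit integer stops at 32767. The scalar sketch below models the sixteen-lane zeromask form with hypothetical helper names. It assumes nearest-even rounding and assumes that out-of-range, Inf and NaN inputs all produce ``0x8000``; both points should be confirmed against the instruction reference.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Decode finite IEEE 754 binary16 bits to float (exact).
       Inf/NaN (exponent field 31) must be filtered by the caller. */
    static float half_to_float(uint16_t h)
    {
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t mant = h & 0x3FFu;
        float v;
        if (exp == 0) {
            v = (float)mant / 16777216.0f;          /* subnormal: mant * 2^-24 */
        } else {
            v = (float)(mant | 0x400u) / 1024.0f;   /* 1.mant */
            for (int e = (int)exp - 15; e > 0; --e) v *= 2.0f;
            for (int e = (int)exp - 15; e < 0; ++e) v *= 0.5f;
        }
        return (h & 0x8000u) ? -v : v;
    }

    /* Convert one half to int16 with nearest-even rounding. */
    static int16_t round_half_to_i16(uint16_t h)
    {
        if (((h >> 10) & 0x1Fu) == 0x1Fu)
            return INT16_MIN;                       /* assumed result for Inf/NaN */
        float   v    = half_to_float(h);
        int32_t t    = (int32_t)v;                  /* truncates toward zero */
        float   frac = v - (float)t;
        if (frac > 0.5f || (frac == 0.5f && (t & 1)))        ++t;
        else if (frac < -0.5f || (frac == -0.5f && (t & 1))) --t;
        if (t > INT16_MAX || t < INT16_MIN)
            return INT16_MIN;                       /* assumed out-of-range result */
        return (int16_t)t;
    }

    /* Scalar model of _mm256_maskz_cvtph_epi16: one mask bit per lane. */
    static void model_maskz_cvtph_epi16(int16_t dst[16], uint16_t k,
                                        const uint16_t a[16])
    {
        assert(dst && a);
        for (int j = 0; j < 16; ++j)
            dst[j] = ((k >> j) & 1) ? round_half_to_i16(a[j]) : 0;
    }

In this model 32752.0 converts normally, while 32768.0 and 65504.0 both fall outside the signed 16-bit range and yield the assumed ``0x8000`` pattern, which is indistinguishable from a genuine -32768 result.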
        	

_mm256_cvttph_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_cvttph_epi16(__m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvttph_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_mask_cvttph_epi16(__m256i src, __mmask16 k,
                                     __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvttph_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_maskz_cvttph_epi16(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtph_epu16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_cvtph_epu16(__m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtph_epu16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_mask_cvtph_epu16(__m256i src, __mmask16 k,
                                    __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtph_epu16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_maskz_cvtph_epu16(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvttph_epu16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256i _mm256_cvttph_epu16(__m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	
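``_mm256_cvtph_epu16`` and ``_mm256_cvttph_epu16`` differ only in how the fractional part of each element is handled. The following is a minimal scalar sketch of that difference for a single lane, not Intel's implementation. It assumes a ``float`` as a stand-in for one FP16 element, models only the default round-to-nearest-even mode for the non-truncating form (the instruction itself follows the current MXCSR rounding mode), and covers only inputs that already lie inside the range shown in the assertions. Both helper names are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical lane model of the truncating form: a C cast from
       float to an integer type always rounds toward zero. */
    uint16_t model_cvttph_epu16_lane(float x)
    {
        /* 65504 is the largest finite FP16 value; other inputs are not modeled. */
        assert(x >= 0.0f && x <= 65504.0f);
        return (uint16_t)x;
    }

    /* Hypothetical lane model of the non-truncating form under the
       default round-to-nearest-even mode. */
    uint16_t model_cvtph_epu16_lane(float x)
    {
        assert(x >= 0.0f && x <= 65504.0f);
        uint32_t whole = (uint32_t)x;    /* integer part */
        float frac = x - (float)whole;   /* fractional part in [0, 1) */
        if (frac > 0.5f || (frac == 0.5f && (whole & 1u)))
            whole += 1;                  /* round up; ties go to the even neighbor */
        return (uint16_t)whole;
    }

Under these assumptions the rounding model maps 2.5 to 2 and 3.5 to 4, while the truncating model maps 2.75 to 2 and 3.5 to 3.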

_mm256_mask_cvttph_epu16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_mask_cvttph_epu16(__m256i src, __mmask16 k,
                                     __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvttph_epu16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256i _mm256_maskz_cvttph_epu16(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 15
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtph_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256d _mm256_cvtph_pd(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	
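The Convert_FP16_To_FP64 step in the pseudo-code above is a widening conversion, so every finite FP16 value maps to an exactly equal FP64 value. As an illustration only, the sketch below decodes one IEEE 754 binary16 bit pattern into a ``double`` in portable C. The function name is hypothetical, and a ``uint16_t`` holding the raw bits stands in for one FP16 lane.

.. code-block:: C

    #include <assert.h>
    #include <math.h>    /* NAN and INFINITY macros only */
    #include <stdint.h>

    /* Hypothetical scalar model of Convert_FP16_To_FP64 for one lane. */
    double model_fp16_bits_to_fp64(uint16_t h)
    {
        int sign = (h >> 15) & 0x1;       /* 1 sign bit */
        int exponent = (h >> 10) & 0x1F;  /* 5 exponent bits, bias 15 */
        int fraction = h & 0x3FF;         /* 10 fraction bits */
        double magnitude;

        if (exponent == 0x1F) {
            magnitude = fraction ? NAN : INFINITY;
        } else {
            /* Normal numbers carry an implicit leading 1; zero and
               subnormals use the minimum exponent without it. */
            int significand = exponent ? (fraction | 0x400) : fraction;
            int power = (exponent ? exponent : 1) - 25;  /* value = significand * 2^power */
            magnitude = (double)significand;
            for (; power > 0; power--) magnitude *= 2.0;
            for (; power < 0; power++) magnitude *= 0.5;
        }
        return sign ? -magnitude : magnitude;
    }

For example, the bit pattern 0x3C00 decodes to 1.0 and 0x7BFF to 65504.0, the largest finite FP16 value. As the loop bound in the pseudo-code shows, ``_mm256_cvtph_pd`` consumes only the lower four FP16 elements of "a".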

_mm256_mask_cvtph_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256d _mm256_mask_cvtph_pd(__m256d src, __mmask8 k,
                                 __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
        	ELSE
        		dst.fp64[j] := src.fp64[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtph_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256d _mm256_maskz_cvtph_pd(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
        	ELSE
        		dst.fp64[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtxph_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256 _mm256_cvtxph_ps(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_cvtxph_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256 _mm256_mask_cvtxph_ps(__m256 src, __mmask8 k,
                                 __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
        	ELSE
        		dst.fp32[j] := src.fp32[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_cvtxph_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256 _mm256_maskz_cvtxph_ps(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
        	ELSE
        		dst.fp32[j] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtsh_h
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-YMM
:Register: YMM 256 bit
:Return Type: _Float16
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm256_cvtsh_h(__m256h a);

.. admonition:: Intel Description

    Copy the lower half-precision (16-bit) floating-point element of "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := a.fp16[0]
        	

XMM
~~~
_mm_cvtsepi16_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m128i _mm_cvtsepi16_epi8(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 8*j
        	dst[l+7:l] := Saturate8(a[i+15:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtsepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI8 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m128i _mm_mask_cvtsepi16_epi8(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtsepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m128i _mm_maskz_cvtsepi16_epi8(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	
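The writemask and zeromask forms of the signed-saturating conversion differ only in what an element receives when its mask bit is clear. A scalar sketch of both forms, written to mirror the pseudo-code rather than to stand in for the intrinsics, might look as follows. The ``model_`` function names and the array-based signatures are hypothetical; 16-byte arrays stand in for the ``__m128i`` result.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Saturate8: clamp a signed 16-bit value to the int8_t range. */
    static int8_t saturate8(int16_t v)
    {
        if (v > INT8_MAX) return INT8_MAX;
        if (v < INT8_MIN) return INT8_MIN;
        return (int8_t)v;
    }

    /* Hypothetical model of the writemask form: unselected lanes keep src. */
    void model_mask_cvtsepi16_epi8(const int8_t src[16], uint8_t k,
                                   const int16_t a[8], int8_t dst[16])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? saturate8(a[j]) : src[j];
        for (int j = 8; j < 16; j++)
            dst[j] = 0;  /* dst[MAX:64] := 0 */
    }

    /* Hypothetical model of the zeromask form: unselected lanes become 0. */
    void model_maskz_cvtsepi16_epi8(uint8_t k, const int16_t a[8],
                                    int8_t dst[16])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? saturate8(a[j]) : 0;
        for (int j = 8; j < 16; j++)
            dst[j] = 0;  /* dst[MAX:64] := 0 */
    }

In both models the upper eight bytes of the result are zero regardless of the mask, because eight 8-bit results fill only the low 64 bits.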

_mm_mask_cvtepi8_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI16 src, 
    MASK k, 
    SI8 a

.. code-block:: C

    __m128i _mm_mask_cvtepi8_epi16(__m128i src, __mmask8 k,
                                   __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	l := j*16
        	IF k[j]
        		dst[l+15:l] := SignExtend16(a[i+7:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepi8_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    __m128i _mm_maskz_cvtepi8_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	l := j*16
        	IF k[j]
        		dst[l+15:l] := SignExtend16(a[i+7:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtusepi16_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128i _mm_cvtusepi16_epi8(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 8*j
        	dst[l+7:l] := SaturateU8(a[i+15:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtusepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_mask_cvtusepi16_epi8(__m128i src, __mmask8 k,
                                     __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	
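A hedged usage sketch for ``_mm_mask_cvtusepi16_epi8`` follows. The wrapper name and its array-based signature are hypothetical. The intrinsic path is compiled only when the compiler defines ``__AVX512BW__`` and ``__AVX512VL__``, which is assumed here to be the feature set this intrinsic needs; confirm it for your toolchain and target before relying on it. On any other build the wrapper uses a scalar fallback that follows the pseudo-code above.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>
    #if defined(__AVX512BW__) && defined(__AVX512VL__)
    #include <immintrin.h>
    #endif

    /* Hypothetical wrapper: narrows eight unsigned 16-bit lanes into the
       low eight bytes of a 16-byte result, under writemask k. */
    void narrow_u16_to_u8_masked(const uint8_t src[16], uint8_t k,
                                 const uint16_t a[8], uint8_t dst[16])
    {
    #if defined(__AVX512BW__) && defined(__AVX512VL__)
        __m128i vsrc = _mm_loadu_si128((const __m128i *)src);
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i r = _mm_mask_cvtusepi16_epi8(vsrc, (__mmask8)k, va);
        _mm_storeu_si128((__m128i *)dst, r);
    #else
        /* Scalar fallback mirroring the pseudo-code. */
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1)
                dst[j] = (uint8_t)(a[j] > 255 ? 255 : a[j]);  /* SaturateU8 */
            else
                dst[j] = src[j];
        }
        memset(dst + 8, 0, 8);  /* dst[MAX:64] := 0 */
    #endif
    }

Keeping the fallback next to the intrinsic call makes it possible to compare both paths on the same inputs when testing on hardware that supports the instruction.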

_mm_maskz_cvtusepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_maskz_cvtusepi16_epi8(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_cvtepi16_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128i _mm_cvtepi16_epi8(__m128i a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 8*j
        	dst[l+7:l] := Truncate8(a[i+15:i])
        ENDFOR
        dst[MAX:64] := 0
        	
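This section documents three ways of narrowing a 16-bit element to 8 bits: signed saturation (Saturate8), unsigned saturation (SaturateU8), and truncation (Truncate8). To make the difference concrete, the sketch below applies all three to the same 16-bit pattern for a single lane. The helper names are hypothetical, and each returns the raw result byte so that the outputs can be compared directly.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Saturate8: read the bits as signed, clamp to [-128, 127]. */
    uint8_t narrow_signed_sat(uint16_t bits)
    {
        int v = bits >= 0x8000 ? (int)bits - 0x10000 : (int)bits;
        if (v > 127) v = 127;
        if (v < -128) v = -128;
        return (uint8_t)v;  /* two's-complement byte of the clamped value */
    }

    /* SaturateU8: read the bits as unsigned, clamp to [0, 255]. */
    uint8_t narrow_unsigned_sat(uint16_t bits)
    {
        return bits > 255 ? 255 : (uint8_t)bits;
    }

    /* Truncate8: keep the low eight bits, discard the rest. */
    uint8_t narrow_truncate(uint16_t bits)
    {
        return (uint8_t)(bits & 0xFF);
    }

For the pattern 0x0180 (384) these helpers return 0x7F, 0xFF and 0x80 respectively. For 0xFF80, which is -128 when read as signed and 65408 when read as unsigned, they return 0x80, 0xFF and 0x80.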

_mm_mask_cvtepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_mask_cvtepi16_epi8(__m128i src, __mmask8 k,
                                   __m128i a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtepi16_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_maskz_cvtepi16_epi8(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+15:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtepu8_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_mask_cvtepu8_epi16(__m128i src, __mmask8 k,
                                   __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	l := j*16
        	IF k[j]
        		dst[l+15:l] := ZeroExtend16(a[i+7:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepu8_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_maskz_cvtepu8_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*8
        	l := j*16
        	IF k[j]
        		dst[l+15:l] := ZeroExtend16(a[i+7:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
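ZeroExtend16 and SignExtend16 give different results only for bytes whose top bit is set. The sketch below contrasts the two for a single byte and then models the zeromask form of the zero-extending conversion over a whole vector. All names are hypothetical, plain arrays stand in for ``__m128i``, and the code follows the pseudo-code rather than any particular hardware implementation.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* SignExtend16: copy bit 7 of the byte into bits 8..15. */
    uint16_t sign_extend16_bits(uint8_t b)
    {
        return (b & 0x80) ? (uint16_t)(0xFF00u | b) : (uint16_t)b;
    }

    /* ZeroExtend16: bits 8..15 are always zero. */
    uint16_t zero_extend16_bits(uint8_t b)
    {
        return (uint16_t)b;
    }

    /* Hypothetical model of _mm_maskz_cvtepu8_epi16. Only a[0..7] are
       read, because eight 16-bit results already fill 128 bits. */
    void model_maskz_cvtepu8_epi16(uint8_t k, const uint8_t a[16],
                                   uint16_t dst[8])
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? zero_extend16_bits(a[j]) : 0;
    }

The byte 0x80 becomes 0x0080 under zero extension and 0xFF80 under sign extension, while 0x7F becomes 0x007F under both.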

_mm_cvtpd_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm_cvtpd_epi64(__m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtpd_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_mask_cvtpd_epi64(__m128i src, __mmask8 k,
                                 __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtpd_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_maskz_cvtpd_epi64(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtpd_epu64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm_cvtpd_epu64(__m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtpd_epu64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_mask_cvtpd_epu64(__m128i src, __mmask8 k,
                                 __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtpd_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_maskz_cvtpd_epu64(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtps_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128i _mm_cvtps_epi64(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
        ENDFOR
        dst[MAX:128] := 0
        	
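The pseudo-code above uses two different strides: the destination index advances by 64 bits per element while the source index advances by 32 bits. In effect, only the two lowest single-precision elements of "a" are read, and each is widened into its own 64-bit slot. The sketch below illustrates that lane layout only. The function name is hypothetical, and it deliberately accepts only whole-number inputs, because a C cast truncates while the instruction rounds according to the current rounding mode; the two agree when there is no fractional part.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical lane-layout model of _mm_cvtps_epi64.
       a[2] and a[3] are never read. */
    void model_cvtps_epi64_whole(const float a[4], int64_t dst[2])
    {
        for (int j = 0; j < 2; j++) {
            int64_t whole = (int64_t)a[j];
            /* Assumption of this sketch: inputs are whole numbers that
               fit in int64_t, so no rounding decision is needed. */
            assert((float)whole == a[j]);
            dst[j] = whole;
        }
    }

With the input {3.0, -7.0, 99.0, 100.0}, the model produces {3, -7}; the two upper input elements do not influence the result.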

_mm_mask_cvtps_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_mask_cvtps_epi64(__m128i src, __mmask8 k,
                                 __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtps_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_maskz_cvtps_epi64(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtps_epu64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128i _mm_cvtps_epu64(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtps_epu64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_mask_cvtps_epu64(__m128i src, __mmask8 k,
                                 __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtps_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_maskz_cvtps_epu64(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtepi64_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128d _mm_cvtepi64_pd(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
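
A minimal usage sketch, assuming a compiler that defines ``__AVX512DQ__`` and ``__AVX512VL__`` when the required extensions are enabled (for example GCC or Clang with ``-mavx512dq -mavx512vl``). The helper name ``convert_two_i64_to_f64`` is hypothetical. When the macros are absent, the function falls back to a plain C loop that mirrors the pseudo-code above.

.. code-block:: C

    #include <stdint.h>
    #if defined(__AVX512DQ__) && defined(__AVX512VL__)
    #include <immintrin.h>
    #endif

    /* Hypothetical helper: converts two signed 64-bit integers to doubles. */
    static void convert_two_i64_to_f64(const int64_t in[2], double out[2])
    {
    #if defined(__AVX512DQ__) && defined(__AVX512VL__)
        /* Unaligned load of both 64-bit lanes, one packed conversion, store. */
        __m128i a = _mm_loadu_si128((const __m128i *)in);
        _mm_storeu_pd(out, _mm_cvtepi64_pd(a));
    #else
        /* Portable fallback: one scalar conversion per lane. */
        for (int j = 0; j < 2; ++j)
            out[j] = (double)in[j];
    #endif
    }

Both paths should agree for every input, because each 64-bit integer is converted independently under the current rounding mode.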
        	

_mm_mask_cvtepi64_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128d _mm_mask_cvtepi64_pd(__m128d src, __mmask8 k,
                                 __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
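
The writemask behaviour above can be sketched in portable C. This is a scalar model of the pseudo-code, not the intrinsic itself; the name ``mask_cvtepi64_pd_model`` and the array-based signature are assumptions made for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model: lane j is converted when bit j of k is set,
     * otherwise it is copied unchanged from src. */
    static void mask_cvtepi64_pd_model(double dst[2], const double src[2],
                                       uint8_t k, const int64_t a[2])
    {
        for (int j = 0; j < 2; ++j) {
            if ((k >> j) & 1)
                dst[j] = (double)a[j];
            else
                dst[j] = src[j];
        }
    }

With ``k = 0x2`` only lane 1 is converted and lane 0 keeps the value from ``src``.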
        	

_mm_maskz_cvtepi64_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128d _mm_maskz_cvtepi64_pd(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtepi64_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128 _mm_cvtepi64_ps(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtepi64_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128 _mm_mask_cvtepi64_ps(__m128 src, __mmask8 k,
                                __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtepi64_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128 _mm_maskz_cvtepi64_ps(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
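
Because two 64-bit inputs produce only two 32-bit results, the upper half of the 128-bit destination is cleared. The following scalar model of the pseudo-code illustrates this, assuming the destination is viewed as four ``float`` lanes; ``maskz_cvtepi64_ps_model`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model: lanes 0..1 are converted or zeroed according to k,
     * lanes 2..3 are always zero (dst[MAX:64] := 0 in the pseudo-code). */
    static void maskz_cvtepi64_ps_model(float dst[4], uint8_t k,
                                        const int64_t a[2])
    {
        for (int j = 0; j < 2; ++j)
            dst[j] = ((k >> j) & 1) ? (float)a[j] : 0.0f;
        dst[2] = 0.0f;
        dst[3] = 0.0f;
    }

Values whose magnitude exceeds 2^24 may not be exactly representable as ``float``, so the conversion can round.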
        	

_mm_cvttpd_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm_cvttpd_epi64(__m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
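
Truncation here means rounding toward zero, which is also what a C cast from ``double`` to an integer type does. The sketch below is a scalar model under that assumption; ``cvttpd_epi64_model`` is a hypothetical name. It is only meaningful for finite inputs that fit in ``int64_t``, since out-of-range and NaN casts are undefined behaviour in C and the sketch does not try to reproduce the hardware result for them.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model: each double is truncated toward zero. */
    static void cvttpd_epi64_model(int64_t dst[2], const double a[2])
    {
        for (int j = 0; j < 2; ++j)
            dst[j] = (int64_t)a[j];   /* C casts discard the fraction */
    }

For example, ``2.9`` becomes ``2`` and ``-2.9`` becomes ``-2``, regardless of the current rounding mode.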
        	

_mm_mask_cvttpd_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_mask_cvttpd_epi64(__m128i src, __mmask8 k,
                                  __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvttpd_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_maskz_cvttpd_epi64(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvttpd_epu64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm_cvttpd_epu64(__m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvttpd_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_mask_cvttpd_epu64(__m128i src, __mmask8 k,
                                  __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvttpd_epu64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_maskz_cvttpd_epu64(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvttps_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128i _mm_cvttps_epi64(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvttps_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_mask_cvttps_epi64(__m128i src, __mmask8 k,
                                  __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvttps_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_maskz_cvttps_epi64(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvttps_epu64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128i _mm_cvttps_epu64(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
        ENDFOR
        dst[MAX:128] := 0
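
The pseudo-code reads only the two lowest 32-bit elements of "a", even though a 128-bit source holds four. A scalar model of that indexing, with the hypothetical name ``cvttps_epu64_model``, could look like the sketch below. It assumes non-negative, in-range inputs, because casting a negative or oversized ``float`` to ``uint64_t`` is undefined behaviour in C.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model: a[0] and a[1] are truncated and widened to 64 bits;
     * a[2] and a[3] are never read. */
    static void cvttps_epu64_model(uint64_t dst[2], const float a[4])
    {
        for (int j = 0; j < 2; ++j)
            dst[j] = (uint64_t)a[j];
    }

A value such as 2^32, which does not fit in 32 bits, survives the conversion because each result lane is 64 bits wide.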
        	

_mm_mask_cvttps_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_mask_cvttps_epu64(__m128i src, __mmask8 k,
                                  __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvttps_epu64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_maskz_cvttps_epu64(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtepu64_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128d _mm_cvtepu64_pd(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
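
The description treats each lane as unsigned, while the helper named in the pseudo-code reads like a signed conversion. The sketch below follows the description and shows why the distinction matters for bit patterns with the top bit set. All function names are hypothetical, and ``memcpy`` is used only to reinterpret the bits without relying on implementation-defined integer conversion.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Interpret the 64-bit pattern as unsigned, then convert. */
    static double u64_lane_to_f64(uint64_t bits)
    {
        return (double)bits;
    }

    /* Interpret the same 64-bit pattern as signed, then convert. */
    static double i64_lane_to_f64(uint64_t bits)
    {
        int64_t s;
        memcpy(&s, &bits, sizeof s);
        return (double)s;
    }

    /* Scalar model of the unsigned reading for both lanes. */
    static void cvtepu64_pd_model(double dst[2], const uint64_t a[2])
    {
        for (int j = 0; j < 2; ++j)
            dst[j] = u64_lane_to_f64(a[j]);
    }

For the pattern ``0x8000000000000000`` the unsigned reading gives 2^63, whereas the signed reading gives -2^63. Small values convert identically either way.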
        	

_mm_mask_cvtepu64_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128d _mm_mask_cvtepu64_pd(__m128d src, __mmask8 k,
                                 __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepu64_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128d _mm_maskz_cvtepu64_pd(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtepu64_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128 _mm_cvtepu64_ps(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtepu64_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128 _mm_mask_cvtepu64_ps(__m128 src, __mmask8 k,
                                __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtepu64_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128 _mm_maskz_cvtepu64_ps(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtepi32_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP64 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m128d _mm_mask_cvtepi32_pd(__m128d src, __mmask8 k,
                                 __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	IF k[j]
        		dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
        	ELSE
        		dst[m+63:m] := src[m+63:m]
        	FI
        ENDFOR
        dst[MAX:128] := 0
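
This is a widening conversion: the source index ``i`` advances by 32 bits while the destination index ``m`` advances by 64, so only the two lowest integers of "a" are used. A scalar model of the pseudo-code, with the hypothetical name ``mask_cvtepi32_pd_model``, might be written as follows.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model: a[0] and a[1] are widened to double when the matching
     * mask bit is set; a[2] and a[3] are ignored. */
    static void mask_cvtepi32_pd_model(double dst[2], const double src[2],
                                       uint8_t k, const int32_t a[4])
    {
        for (int j = 0; j < 2; ++j)
            dst[j] = ((k >> j) & 1) ? (double)a[j] : src[j];
    }

Every 32-bit integer is exactly representable as a ``double``, so this conversion never rounds.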
        	

_mm_maskz_cvtepi32_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m128d _mm_maskz_cvtepi32_pd(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	IF k[j]
        		dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
        	ELSE
        		dst[m+63:m] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtepi32_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m128 _mm_mask_cvtepi32_ps(__m128 src, __mmask8 k,
                                __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepi32_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m128 _mm_maskz_cvtepi32_ps(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
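
Although ``__mmask8`` carries eight bits, the loop above consults only ``k[0]`` through ``k[3]``, one per 32-bit lane. The scalar model below illustrates that; ``maskz_cvtepi32_ps_model`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model: four lanes, each converted or zeroed by its mask bit.
     * Bits 4..7 of k have no effect. */
    static void maskz_cvtepi32_ps_model(float dst[4], uint8_t k,
                                        const int32_t a[4])
    {
        for (int j = 0; j < 4; ++j)
            dst[j] = ((k >> j) & 1) ? (float)a[j] : 0.0f;
    }

Passing ``k = 0xF5`` therefore behaves the same as ``k = 0x05``: lanes 0 and 2 are converted, lanes 1 and 3 are zeroed.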
        	

_mm_mask_cvtpd_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_mask_cvtpd_epi32(__m128i src, __mmask8 k,
                                 __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtpd_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_maskz_cvtpd_epi32(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtpd_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128 _mm_mask_cvtpd_ps(__m128 src, __mmask8 k, __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:64] := 0
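
In this narrowing conversion the writemask governs only lanes 0 and 1. According to the pseudo-code, bits above 64 of the destination are zeroed rather than copied from "src". A scalar model, assuming the 128-bit values are viewed as ``float[4]`` and using the hypothetical name ``mask_cvtpd_ps_model``:

.. code-block:: C

    #include <stdint.h>

    /* Scalar model: lanes 0..1 are converted or merged from src,
     * lanes 2..3 are zero even though src has values there. */
    static void mask_cvtpd_ps_model(float dst[4], const float src[4],
                                    uint8_t k, const double a[2])
    {
        for (int j = 0; j < 2; ++j)
            dst[j] = ((k >> j) & 1) ? (float)a[j] : src[j];
        dst[2] = 0.0f;
        dst[3] = 0.0f;
    }

With ``src = {1, 2, 3, 4}`` and ``k = 0x1``, the result keeps ``2`` in lane 1 but not ``3`` or ``4`` in the upper lanes.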
        	

_mm_maskz_cvtpd_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128 _mm_maskz_cvtpd_ps(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_cvtpd_epu32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm_cvtpd_epu32(__m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_UInt32(a[k+63:k])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtpd_epu32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_mask_cvtpd_epu32(__m128i src, __mmask8 k,
                                 __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	l := j*64
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtpd_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_maskz_cvtpd_epu32(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtph_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128 _mm_mask_cvtph_ps(__m128 src, __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtph_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128 _mm_maskz_cvtph_ps(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtps_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_mask_cvtps_epi32(__m128i src, __mmask8 k,
                                 __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtps_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_maskz_cvtps_epi32(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvt_roundps_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128 a, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_cvt_roundps_ph(__m128i src, __mmask8 k,
                                    __m128 a, int imm8);

.. admonition:: Intel Description

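    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "imm8[2:0]" parameter, which can be one of: _MM_FROUND_TO_NEAREST_INT (round to nearest), _MM_FROUND_TO_NEG_INF (round down), _MM_FROUND_TO_POS_INF (round up), _MM_FROUND_TO_ZERO (truncate), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).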

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 16*j
        	l := 32*j
        	IF k[j]
        		dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtps_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128 a, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_cvtps_ph(__m128i src, __mmask8 k, __m128 a,
                              int imm8);

.. admonition:: Intel Description

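    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Rounding is done according to the "imm8[2:0]" parameter, which can be one of: _MM_FROUND_TO_NEAREST_INT (round to nearest), _MM_FROUND_TO_NEG_INF (round down), _MM_FROUND_TO_POS_INF (round up), _MM_FROUND_TO_ZERO (truncate), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).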

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 16*j
        	l := 32*j
        	IF k[j]
        		dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvt_roundps_ph
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_cvt_roundps_ph(__mmask8 k, __m128 a,
                                     int imm8);

.. admonition:: Intel Description

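    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "imm8[2:0]" parameter, which can be one of: _MM_FROUND_TO_NEAREST_INT (round to nearest), _MM_FROUND_TO_NEG_INF (round down), _MM_FROUND_TO_POS_INF (round up), _MM_FROUND_TO_ZERO (truncate), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).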

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 16*j
        	l := 32*j
        	IF k[j]
        		dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtps_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_cvtps_ph(__mmask8 k, __m128 a, int imm8);

.. admonition:: Intel Description

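    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the "imm8[2:0]" parameter, which can be one of: _MM_FROUND_TO_NEAREST_INT (round to nearest), _MM_FROUND_TO_NEG_INF (round down), _MM_FROUND_TO_POS_INF (round up), _MM_FROUND_TO_ZERO (truncate), or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).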

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 16*j
        	l := 32*j
        	IF k[j]
        		dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_cvtps_epu32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128i _mm_cvtps_epu32(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtps_epu32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_mask_cvtps_epu32(__m128i src, __mmask8 k,
                                 __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtps_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_maskz_cvtps_epu32(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvttpd_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_mask_cvttpd_epi32(__m128i src, __mmask8 k,
                                  __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvttpd_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_maskz_cvttpd_epi32(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_cvttpd_epu32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm_cvttpd_epu32(__m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[k+63:k])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvttpd_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_mask_cvttpd_epu32(__m128i src, __mmask8 k,
                                  __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvttpd_epu32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128i _mm_maskz_cvttpd_epu32(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 32*j
        	l := 64*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvttps_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_mask_cvttps_epi32(__m128i src, __mmask8 k,
                                  __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvttps_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_maskz_cvttps_epi32(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvttps_epu32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128i _mm_cvttps_epu32(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvttps_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_mask_cvttps_epu32(__m128i src, __mmask8 k,
                                  __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvttps_epu32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128i _mm_maskz_cvttps_epu32(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	IF k[j]
        		dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtepu32_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128d _mm_cvtepu32_pd(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtepu32_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP64 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128d _mm_mask_cvtepu32_pd(__m128d src, __mmask8 k,
                                 __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepu32_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128d _mm_maskz_cvtepu32_pd(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	l := j*32
        	IF k[j]
        		dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI	
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtepi32_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm_cvtepi32_epi8(__m128i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 8*j
        	dst[k+7:k] := Truncate8(a[i+31:i])
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_mask_cvtepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_cvtepi32_epi8(__m128i src, __mmask8 k,
                                   __m128i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_maskz_cvtepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_cvtepi32_epi8(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_cvtepi32_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm_cvtepi32_epi16(__m128i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 16*j
        	dst[k+15:k] := Truncate16(a[i+31:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_cvtepi32_epi16(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Truncate16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_cvtepi32_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Truncate16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_cvtepi64_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_cvtepi64_epi8(__m128i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 8*j
        	dst[k+7:k] := Truncate8(a[i+63:i])
        ENDFOR
        dst[MAX:16] := 0
        	

_mm_mask_cvtepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_cvtepi64_epi8(__m128i src, __mmask8 k,
                                   __m128i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:16] := 0
        	

_mm_maskz_cvtepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_cvtepi64_epi8(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Truncate8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:16] := 0
        	

_mm_cvtepi64_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_cvtepi64_epi32(__m128i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 32*j
        	dst[k+31:k] := Truncate32(a[i+63:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_cvtepi64_epi32(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := Truncate32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_cvtepi64_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := Truncate32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_cvtepi64_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_cvtepi64_epi16(__m128i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 16*j
        	dst[k+15:k] := Truncate16(a[i+63:i])
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_mask_cvtepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_cvtepi64_epi16(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Truncate16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_maskz_cvtepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_cvtepi64_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Truncate16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_cvtsepi32_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m128i _mm_cvtsepi32_epi8(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 8*j
        	dst[k+7:k] := Saturate8(a[i+31:i])
        ENDFOR
        dst[MAX:32] := 0
        	
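Signed saturation clamps each value to the destination range instead of discarding high bits. As a rough illustration, the scalar sketch below assumes two hypothetical helpers, ``saturate8`` and ``model_cvtsepi32_epi8``, that follow the pseudo-code above; neither is part of ``immintrin.h``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: clamp a signed 32-bit value to [-128, 127]. */
    static int8_t saturate8(int32_t x)
    {
        if (x > INT8_MAX) return INT8_MAX;
        if (x < INT8_MIN) return INT8_MIN;
        return (int8_t)x;
    }

    /* Hypothetical scalar model of _mm_cvtsepi32_epi8 (not a real API).
       The four results land in out[0..3]; the other twelve bytes are
       zeroed, matching dst[MAX:32] := 0. */
    static void model_cvtsepi32_epi8(const int32_t a[4], int8_t out[16])
    {
        for (int j = 0; j < 16; j++)
            out[j] = (j < 4) ? saturate8(a[j]) : 0;
    }

With inputs such as 300 and -300, this model yields 127 and -128 rather than the wrapped values that truncation would give.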

_mm_mask_cvtsepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm_mask_cvtsepi32_epi8(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_maskz_cvtsepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm_maskz_cvtsepi32_epi8(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_cvtsepi32_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m128i _mm_cvtsepi32_epi16(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 16*j
        	dst[k+15:k] := Saturate16(a[i+31:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtsepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm_mask_cvtsepi32_epi16(__m128i src, __mmask8 k,
                                     __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Saturate16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtsepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm_maskz_cvtsepi32_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Saturate16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	
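With a zeromask, lanes whose mask bit is clear become zero rather than being copied from a source operand. The following scalar sketch of the pseudo-code above uses hypothetical names (``saturate16``, ``model_maskz_cvtsepi32_epi16``) chosen for this example only.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: clamp a signed 32-bit value to [-32768, 32767]. */
    static int16_t saturate16(int32_t x)
    {
        if (x > INT16_MAX) return INT16_MAX;
        if (x < INT16_MIN) return INT16_MIN;
        return (int16_t)x;
    }

    /* Hypothetical scalar model of _mm_maskz_cvtsepi32_epi16 (not a real API).
       out[0..3] hold the masked results; out[4..7] model dst[MAX:64] := 0. */
    static void model_maskz_cvtsepi32_epi16(uint8_t k, const int32_t a[4],
                                            int16_t out[8])
    {
        for (int j = 0; j < 8; j++) {
            if (j < 4 && ((k >> j) & 1))
                out[j] = saturate16(a[j]);   /* mask bit set: saturate */
            else
                out[j] = 0;                  /* mask bit clear or upper half */
        }
    }

A mask of ``0x5`` keeps lanes 0 and 2 and zeroes lanes 1 and 3 in this model.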

_mm_cvtsepi64_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128i _mm_cvtsepi64_epi8(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 8*j
        	dst[k+7:k] := Saturate8(a[i+63:i])
        ENDFOR
        dst[MAX:16] := 0
        	

_mm_mask_cvtsepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm_mask_cvtsepi64_epi8(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:16] := 0
        	

_mm_maskz_cvtsepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm_maskz_cvtsepi64_epi8(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := Saturate8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:16] := 0
        	

_mm_cvtsepi64_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128i _mm_cvtsepi64_epi32(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 32*j
        	dst[k+31:k] := Saturate32(a[i+63:i])
        ENDFOR
        dst[MAX:64] := 0
        	
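``Saturate32`` here and ``Truncate32`` in ``_mm_cvtepi64_epi32`` differ only for inputs outside the signed 32-bit range. A small per-lane comparison, assuming hypothetical helpers ``model_truncate32`` and ``model_saturate32`` written for this example, might look like this:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical per-lane model of Truncate32: keep the low 32 bits. */
    static uint32_t model_truncate32(int64_t x)
    {
        return (uint32_t)x;   /* conversion to unsigned wraps modulo 2^32 */
    }

    /* Hypothetical per-lane model of Saturate32: clamp to the int32_t range. */
    static int32_t model_saturate32(int64_t x)
    {
        if (x > INT32_MAX) return INT32_MAX;
        if (x < INT32_MIN) return INT32_MIN;
        return (int32_t)x;
    }

For a lane holding 4294967297 (2^32 + 1), the truncating model returns 1 while the saturating model returns ``INT32_MAX``; for values already in range, such as -1, both keep the same bit pattern.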

_mm_mask_cvtsepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm_mask_cvtsepi64_epi32(__m128i src, __mmask8 k,
                                     __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := Saturate32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtsepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm_maskz_cvtsepi64_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := Saturate32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_cvtsepi64_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128i _mm_cvtsepi64_epi16(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 16*j
        	dst[k+15:k] := Saturate16(a[i+63:i])
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_mask_cvtsepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI16 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm_mask_cvtsepi64_epi16(__m128i src, __mmask8 k,
                                     __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Saturate16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_maskz_cvtsepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128i _mm_maskz_cvtsepi64_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := Saturate16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_mask_cvtepi8_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI8 a

.. code-block:: C

    __m128i _mm_mask_cvtepi8_epi32(__m128i src, __mmask8 k,
                                   __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[l+7:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
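The widening conversions read only the low bytes of "a" but fill the whole 128-bit result. The sketch below models the pseudo-code above in scalar C; ``model_mask_cvtepi8_epi32`` is a hypothetical name, and arrays stand in for the vector operands.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_mask_cvtepi8_epi32 (not a real API).
       Only a[0..3] (the low 4 bytes) are read; each is sign-extended to
       32 bits when its mask bit is set, otherwise lane j of src is kept. */
    static void model_mask_cvtepi8_epi32(const int32_t src[4], uint8_t k,
                                         const int8_t a[16], int32_t out[4])
    {
        for (int j = 0; j < 4; j++)
            out[j] = ((k >> j) & 1) ? (int32_t)a[j] : src[j];
    }

In C, converting ``int8_t`` to ``int32_t`` preserves the value, which is what ``SignExtend32`` describes at the bit level.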

_mm_maskz_cvtepi8_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    __m128i _mm_maskz_cvtepi8_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[l+7:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtepi8_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI64 src, 
    MASK k, 
    SI8 a

.. code-block:: C

    __m128i _mm_mask_cvtepi8_epi64(__m128i src, __mmask8 k,
                                   __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+7:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepi8_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI8 a

.. code-block:: C

    __m128i _mm_maskz_cvtepi8_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+7:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtepi32_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI64 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm_mask_cvtepi32_epi64(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepi32_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m128i _mm_maskz_cvtepi32_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
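In the pseudo-code above only the two low 32-bit lanes of "a" and the two low bits of "k" take part. A minimal scalar sketch, assuming a hypothetical helper ``model_maskz_cvtepi32_epi64``, makes that explicit:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm_maskz_cvtepi32_epi64 (not a real API).
       a[2] and a[3] are never read; mask bits above bit 1 are ignored. */
    static void model_maskz_cvtepi32_epi64(uint8_t k, const int32_t a[4],
                                           int64_t out[2])
    {
        for (int j = 0; j < 2; j++)
            out[j] = ((k >> j) & 1) ? (int64_t)a[j] : 0;   /* SignExtend64 or 0 */
    }

Passing ``0xFF`` as the mask gives the same result as ``0x3`` in this model, because the loop covers only two lanes.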

_mm_mask_cvtepi16_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI32 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m128i _mm_mask_cvtepi16_epi32(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	l := j*16
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[l+15:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepi16_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m128i _mm_maskz_cvtepi16_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[i+31:i] := SignExtend32(a[l+15:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtepi16_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    SI64 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m128i _mm_mask_cvtepi16_epi64(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+15:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepi16_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m128i _mm_maskz_cvtepi16_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[i+63:i] := SignExtend64(a[l+15:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtusepi32_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm_cvtusepi32_epi8(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 8*j
        	dst[k+7:k] := SaturateU8(a[i+31:i])
        ENDFOR
        dst[MAX:32] := 0
        	
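Unsigned saturation treats each input lane as an unsigned value, so an all-ones lane clamps to 255 instead of being read as -1. The scalar sketch below follows the pseudo-code above under assumed names (``saturate_u8``, ``model_cvtusepi32_epi8``) that are not part of any Intel header.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: clamp an unsigned 32-bit value to [0, 255]. */
    static uint8_t saturate_u8(uint32_t x)
    {
        return (x > UINT8_MAX) ? UINT8_MAX : (uint8_t)x;
    }

    /* Hypothetical scalar model of _mm_cvtusepi32_epi8 (not a real API).
       Results occupy out[0..3]; out[4..15] model dst[MAX:32] := 0. */
    static void model_cvtusepi32_epi8(const uint32_t a[4], uint8_t out[16])
    {
        for (int j = 0; j < 16; j++)
            out[j] = (j < 4) ? saturate_u8(a[j]) : 0;
    }

Because the input is unsigned, there is no lower clamp: every value above 255 maps to 255.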

_mm_mask_cvtusepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_cvtusepi32_epi8(__m128i src, __mmask8 k,
                                     __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_maskz_cvtusepi32_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_cvtusepi32_epi8(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+31:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_cvtusepi32_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm_cvtusepi32_epi16(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 16*j
        	dst[k+15:k] := SaturateU16(a[i+31:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtusepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_cvtusepi32_epi16(__m128i src, __mmask8 k,
                                      __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := SaturateU16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	
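Note that in the pseudo-code above "src" is indexed in 16-bit lanes, the same width as the result, and that the upper 64 bits of "dst" are zeroed regardless of "src". A scalar sketch under those readings, with hypothetical names ``saturate_u16`` and ``model_mask_cvtusepi32_epi16``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: clamp an unsigned 32-bit value to [0, 65535]. */
    static uint16_t saturate_u16(uint32_t x)
    {
        return (x > UINT16_MAX) ? UINT16_MAX : (uint16_t)x;
    }

    /* Hypothetical scalar model of _mm_mask_cvtusepi32_epi16 (not a real API).
       src and out are viewed as eight 16-bit lanes; only lanes 0..3 can come
       from a or src, and lanes 4..7 model dst[MAX:64] := 0. */
    static void model_mask_cvtusepi32_epi16(const uint16_t src[8], uint8_t k,
                                            const uint32_t a[4], uint16_t out[8])
    {
        for (int j = 0; j < 4; j++)
            out[j] = ((k >> j) & 1) ? saturate_u16(a[j]) : src[j];
        for (int j = 4; j < 8; j++)
            out[j] = 0;
    }

With mask ``0x6``, lanes 1 and 2 take saturated values from "a" while lanes 0 and 3 keep the corresponding 16-bit lanes of "src".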

_mm_maskz_cvtusepi32_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_cvtusepi32_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := SaturateU16(a[i+31:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_cvtusepi64_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_cvtusepi64_epi8(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 8*j
        	dst[k+7:k] := SaturateU8(a[i+63:i])
        ENDFOR
        dst[MAX:16] := 0
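
A scalar sketch of the unmasked 64-bit to 8-bit narrowing may help when validating results. The helper name ``ref_cvtusepi64_epi8`` and the byte-array view of the register are assumptions for illustration only.

```c
#include <stdint.h>

/* Scalar model of the pseudo-code above (hypothetical helper).
 * dst holds the sixteen bytes of a 128-bit register. */
void ref_cvtusepi64_epi8(uint8_t dst[16], const uint64_t a[2])
{
    for (int j = 0; j < 2; j++) {
        /* SaturateU8: values above 0xFF clamp to 0xFF */
        dst[j] = (uint8_t)(a[j] > 0xFFu ? 0xFFu : a[j]);
    }
    for (int j = 2; j < 16; j++)
        dst[j] = 0;                   /* dst[MAX:16] := 0 */
}
```

Only the two lowest bytes of the result carry data; the remaining fourteen are zero.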
        	

_mm_mask_cvtusepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_cvtusepi64_epi8(__m128i src, __mmask8 k,
                                     __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := src[l+7:l]
        	FI
        ENDFOR
        dst[MAX:16] := 0
        	

_mm_maskz_cvtusepi64_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_cvtusepi64_epi8(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[l+7:l] := SaturateU8(a[i+63:i])
        	ELSE
        		dst[l+7:l] := 0
        	FI
        ENDFOR
        dst[MAX:16] := 0
        	

_mm_cvtusepi64_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_cvtusepi64_epi32(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 32*j
        	dst[k+31:k] := SaturateU32(a[i+63:i])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtusepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_cvtusepi64_epi32(__m128i src, __mmask8 k,
                                      __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := SaturateU32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := src[l+31:l]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtusepi64_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_cvtusepi64_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[l+31:l] := SaturateU32(a[i+63:i])
        	ELSE
        		dst[l+31:l] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
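
The zeromask form differs from the writemask form only in what an unselected lane receives. A minimal scalar sketch, assuming a hypothetical helper ``ref_maskz_cvtusepi64_epi32`` and an array view of the lanes:

```c
#include <stdint.h>

/* Scalar model of the pseudo-code above (hypothetical helper).
 * dst holds the four 32-bit lanes of a 128-bit register. */
void ref_maskz_cvtusepi64_epi32(uint32_t dst[4], uint8_t k,
                                const uint64_t a[2])
{
    for (int j = 0; j < 2; j++) {
        if ((k >> j) & 1u) {
            /* SaturateU32: values above 0xFFFFFFFF clamp to 0xFFFFFFFF */
            dst[j] = (a[j] > UINT32_MAX) ? UINT32_MAX : (uint32_t)a[j];
        } else {
            dst[j] = 0;               /* lane zeroed by the zeromask */
        }
    }
    dst[2] = 0;                       /* dst[MAX:64] := 0 */
    dst[3] = 0;
}
```

With ``k = 0x01`` only lane 0 is converted; lane 1 is zeroed even if ``a[1]`` is non-zero.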
        	

_mm_cvtusepi64_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_cvtusepi64_epi16(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	k := 16*j
        	dst[k+15:k] := SaturateU16(a[i+63:i])
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_mask_cvtusepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_cvtusepi64_epi16(__m128i src, __mmask8 k,
                                      __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := SaturateU16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := src[l+15:l]
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_maskz_cvtusepi64_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_cvtusepi64_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[l+15:l] := SaturateU16(a[i+63:i])
        	ELSE
        		dst[l+15:l] := 0
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_mask_cvtepu8_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_mask_cvtepu8_epi32(__m128i src, __mmask8 k,
                                   __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[i+31:i] := ZeroExtend32(a[l+7:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
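
The widening direction can be modelled the same way. This is a rough scalar sketch under the assumption that the register is viewed as arrays of lanes; ``ref_mask_cvtepu8_epi32`` is a hypothetical name, not part of any header.

```c
#include <stdint.h>

/* Scalar model of the pseudo-code above (hypothetical helper).
 * Only the low 4 bytes of a are read; each becomes one 32-bit lane. */
void ref_mask_cvtepu8_epi32(uint32_t dst[4], const uint32_t src[4],
                            uint8_t k, const uint8_t a[16])
{
    for (int j = 0; j < 4; j++) {
        if ((k >> j) & 1u)
            dst[j] = (uint32_t)a[j];  /* ZeroExtend32: upper 24 bits are 0 */
        else
            dst[j] = src[j];          /* lane copied from src */
    }
}
```

Because the source bytes are unsigned, a byte such as 200 widens to 200 rather than to a negative value.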
        	

_mm_maskz_cvtepu8_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_maskz_cvtepu8_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 8*j
        	IF k[j]
        		dst[i+31:i] := ZeroExtend32(a[l+7:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtepu8_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_mask_cvtepu8_epi64(__m128i src, __mmask8 k,
                                   __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+7:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepu8_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_maskz_cvtepu8_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 8*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+7:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtepu32_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_cvtepu32_epi64(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+31:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepu32_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_cvtepu32_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 32*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+31:l])
        	ELSE 
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtepu16_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_mask_cvtepu16_epi32(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[i+31:i] := ZeroExtend32(a[l+15:l])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepu16_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_maskz_cvtepu16_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	l := 16*j
        	IF k[j]
        		dst[i+31:i] := ZeroExtend32(a[l+15:l])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtepu16_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_mask_cvtepu16_epi64(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+15:l])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepu16_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_maskz_cvtepu16_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := 64*j
        	l := 16*j
        	IF k[j]
        		dst[i+63:i] := ZeroExtend64(a[l+15:l])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
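
Since only two lanes are produced, only mask bits 0 and 1 have any effect. The sketch below illustrates this with a hypothetical scalar helper, ``ref_maskz_cvtepu16_epi64``, which reads the low 4 bytes of "a" as two 16-bit values.

```c
#include <stdint.h>

/* Scalar model of the pseudo-code above (hypothetical helper).
 * a holds the eight 16-bit lanes of the source; only a[0] and a[1] are read. */
void ref_maskz_cvtepu16_epi64(uint64_t dst[2], uint8_t k,
                              const uint16_t a[8])
{
    for (int j = 0; j < 2; j++) {
        if ((k >> j) & 1u)
            dst[j] = (uint64_t)a[j];  /* ZeroExtend64 */
        else
            dst[j] = 0;               /* lane zeroed by the zeromask */
    }
}
```

Passing ``k = 0xFE`` gives the same result as ``k = 0x02``: lane 0 is zeroed and lane 1 is converted.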
        	

_mm_cvt_roundsd_i32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    int _mm_cvt_roundsd_i32(__m128d a, int rounding);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_Int32(a[63:0])
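
The pseudo-code does not show how the rounding mode changes the result. The following scalar sketch illustrates the four directed modes. The enum ``ref_round`` and the function name are hypothetical; with the real intrinsic the corresponding ``_MM_FROUND_*`` constants would be passed instead. Returning ``0x80000000`` for NaN and out-of-range inputs is an assumption based on the usual "integer indefinite" result when the invalid exception is masked.

```c
#include <stdint.h>

/* Hypothetical mode names for this sketch only. */
enum ref_round { REF_NEAREST, REF_DOWN, REF_UP, REF_ZERO };

int32_t ref_cvt_roundsd_i32(double x, enum ref_round mode)
{
    /* NaN and far out-of-range inputs: assumed "integer indefinite" result */
    if (!(x > -2147483650.0 && x < 2147483649.0))
        return INT32_MIN;

    int64_t t = (int64_t)x;           /* truncates toward zero */
    double frac = x - (double)t;      /* exact; has the sign of x */
    int64_t r = t;

    switch (mode) {
    case REF_DOWN: if (frac < 0.0) r = t - 1; break;
    case REF_UP:   if (frac > 0.0) r = t + 1; break;
    case REF_ZERO: break;
    case REF_NEAREST: {
        double m = (frac < 0.0) ? -frac : frac;
        /* ties go to the even integer */
        if (m > 0.5 || (m == 0.5 && (t & 1)))
            r = (x < 0.0) ? t - 1 : t + 1;
        break;
    }
    }
    /* results that do not fit in 32 bits are also treated as indefinite */
    return (r < INT32_MIN || r > INT32_MAX) ? INT32_MIN : (int32_t)r;
}
```

For example, 2.5 converts to 2 under round-to-nearest (ties to even), while -2.5 converts to -3 when rounding down and to -2 when rounding up.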
        	

_mm_cvt_roundsd_i64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    __int64 _mm_cvt_roundsd_i64(__m128d a, int rounding);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_Int64(a[63:0])
        	

_mm_cvt_roundsd_si32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    int _mm_cvt_roundsd_si32(__m128d a, int rounding);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_Int32(a[63:0])
        	

_mm_cvt_roundsd_si64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    __int64 _mm_cvt_roundsd_si64(__m128d a, int rounding);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_Int64(a[63:0])
        	

_mm_cvtsd_i32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    int _mm_cvtsd_i32(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_Int32(a[63:0])
        	

_mm_cvtsd_i64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __int64 _mm_cvtsd_i64(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_Int64(a[63:0])
        	

_mm_cvt_roundsd_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_cvt_roundsd_ss(__m128 a, __m128d b,
                              int rounding);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_FP32(b[63:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_cvt_roundsd_ss
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask_cvt_roundsd_ss(__m128 src, __mmask8 k,
                                   __m128 a, __m128d b,
                                   int rounding);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := Convert_FP64_To_FP32(b[63:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_cvtsd_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128d b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP64 b

.. code-block:: C

    __m128 _mm_mask_cvtsd_ss(__m128 src, __mmask8 k, __m128 a,
                             __m128d b);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := Convert_FP64_To_FP32(b[63:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
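
The scalar masked conversions take three data sources: "src" for the masked-off lower element, "b" for the converted value, and "a" for the upper elements. A minimal scalar sketch, assuming a hypothetical helper ``ref_mask_cvtsd_ss`` and an array view of the registers:

```c
#include <stdint.h>

/* Scalar model of the pseudo-code above (hypothetical helper).
 * The C cast rounds with the current floating-point rounding mode,
 * which is round-to-nearest by default. */
void ref_mask_cvtsd_ss(float dst[4], const float src[4], uint8_t k,
                       const float a[4], const double b[2])
{
    /* Only mask bit 0 is consulted. */
    dst[0] = (k & 1u) ? (float)b[0] : src[0];
    /* dst[127:32] := a[127:32] */
    dst[1] = a[1];
    dst[2] = a[2];
    dst[3] = a[3];
}
```

Note that ``a[0]`` and ``b[1]`` never reach the result, and that a mask such as ``0xFE`` leaves the lower element taken from "src".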
        	

_mm_maskz_cvt_roundsd_ss
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128d b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_maskz_cvt_roundsd_ss(__mmask8 k, __m128 a,
                                    __m128d b, int rounding);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := Convert_FP64_To_FP32(b[63:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_cvtsd_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP64 b

.. code-block:: C

    __m128 _mm_maskz_cvtsd_ss(__mmask8 k, __m128 a, __m128d b);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[31:0] := Convert_FP64_To_FP32(b[63:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvt_roundsd_u32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    __m128d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    unsigned int _mm_cvt_roundsd_u32(__m128d a, int rounding);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst". Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_UInt32(a[63:0])
        	

_mm_cvt_roundsd_u64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    __m128d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    unsigned __int64 _mm_cvt_roundsd_u64(__m128d a, int rounding);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst". Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_UInt64(a[63:0])
        	

_mm_cvtsd_u32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    unsigned int _mm_cvtsd_u32(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_UInt32(a[63:0])
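
The unsigned conversion accepts values up to 2^32 - 1 that the signed form would reject. The sketch below is a scalar approximation under two assumptions not stated above: the default round-to-nearest-even mode is in effect, and an unrepresentable result yields ``0xFFFFFFFF``, following the description of the underlying ``VCVTSD2USI`` instruction with the invalid exception masked. The name ``ref_cvtsd_u32`` is hypothetical.

```c
#include <stdint.h>

/* Scalar sketch of Convert_FP64_To_UInt32 under round-to-nearest-even. */
uint32_t ref_cvtsd_u32(double x)
{
    /* NaN, x <= -1 and x >= 2^32 cannot produce a valid result. */
    if (!(x > -1.0 && x < 4294967296.0))
        return UINT32_MAX;

    int64_t t = (int64_t)x;           /* truncates toward zero */
    double frac = x - (double)t;      /* exact; has the sign of x */

    if (frac < 0.0) {
        /* x in (-1, 0): values down to -0.5 round to 0, lower ones to -1 */
        return (frac < -0.5) ? UINT32_MAX : 0u;
    }
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t += 1;                       /* ties go to the even integer */

    return (t > (int64_t)UINT32_MAX) ? UINT32_MAX : (uint32_t)t;
}
```

A value such as 3000000000.0 converts exactly here, whereas it lies outside the range of a signed 32-bit result.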
        	

_mm_cvtsd_u64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    unsigned __int64 _mm_cvtsd_u64(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_UInt64(a[63:0])
        	

_mm_cvt_roundi64_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __int64 b, 
    int rounding
:Param ETypes:
    FP64 a, 
    SI64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_cvt_roundi64_sd(__m128d a, __int64 b,
                                int rounding);

.. admonition:: Intel Description

    Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_Int64_To_FP64(b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_cvt_roundsi64_sd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __int64 b, 
    int rounding
:Param ETypes:
    FP64 a, 
    SI64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_cvt_roundsi64_sd(__m128d a, __int64 b,
                                 int rounding);

.. admonition:: Intel Description

    Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". Rounding is done according to the "rounding" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_Int64_To_FP64(b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_cvti32_sd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    int b
:Param ETypes:
    FP64 a, 
    SI32 b

.. code-block:: C

    __m128d _mm_cvti32_sd(__m128d a, int b);

.. admonition:: Intel Description

    Convert the signed 32-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_Int32_To_FP64(b[31:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_cvti64_sd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __int64 b
:Param ETypes:
    FP64 a, 
    SI64 b

.. code-block:: C

    __m128d _mm_cvti64_sd(__m128d a, __int64 b);

.. admonition:: Intel Description

    Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_Int64_To_FP64(b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_cvt_roundi32_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    int b, 
    int rounding
:Param ETypes:
    FP32 a, 
    SI32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_cvt_roundi32_ss(__m128 a, int b, int rounding);

.. admonition:: Intel Description

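    Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` (round to nearest, and suppress exceptions), ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` (round down, and suppress exceptions), ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` (round up, and suppress exceptions), ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` (truncate, and suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code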

    .. code-block:: text

        
        dst[31:0] := Convert_Int32_To_FP32(b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvt_roundi64_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __int64 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    SI64 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_cvt_roundi64_ss(__m128 a, __int64 b,
                               int rounding);

.. admonition:: Intel Description

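    Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` (round to nearest, and suppress exceptions), ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` (round down, and suppress exceptions), ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` (round up, and suppress exceptions), ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` (truncate, and suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code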

    .. code-block:: text

        
        dst[31:0] := Convert_Int64_To_FP32(b[63:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvt_roundsi32_ss
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    int b, 
    int rounding
:Param ETypes:
    FP32 a, 
    SI32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_cvt_roundsi32_ss(__m128 a, int b, int rounding);

.. admonition:: Intel Description

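    Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` (round to nearest, and suppress exceptions), ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` (round down, and suppress exceptions), ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` (round up, and suppress exceptions), ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` (truncate, and suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code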

    .. code-block:: text

        
        dst[31:0] := Convert_Int32_To_FP32(b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvt_roundsi64_ss
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __int64 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    SI64 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_cvt_roundsi64_ss(__m128 a, __int64 b,
                                int rounding);

.. admonition:: Intel Description

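    Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` (round to nearest, and suppress exceptions), ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` (round down, and suppress exceptions), ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` (round up, and suppress exceptions), ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` (truncate, and suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code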

    .. code-block:: text

        
        dst[31:0] := Convert_Int64_To_FP32(b[63:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvti32_ss
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    int b
:Param ETypes:
    FP32 a, 
    SI32 b

.. code-block:: C

    __m128 _mm_cvti32_ss(__m128 a, int b);

.. admonition:: Intel Description

    Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_Int32_To_FP32(b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvti64_ss
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __int64 b
:Param ETypes:
    FP32 a, 
    SI64 b

.. code-block:: C

    __m128 _mm_cvti64_ss(__m128 a, __int64 b);

.. admonition:: Intel Description

    Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_Int64_To_FP32(b[63:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	
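Unlike the 32-bit-to-double case, a 64-bit integer does not always fit in a single-precision value, so the result may be rounded. The sketch below is a portable scalar model of the pseudo-code, assuming an IEEE 754 platform in its default round-to-nearest-even mode; ``vec4f_model``, ``model_cvti64_ss`` and ``demo_cvti64_ss`` are hypothetical names, not part of any header.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical stand-in for __m128: lane[0] holds bits 31:0, lane[3] holds bits 127:96. */
    typedef struct { float lane[4]; } vec4f_model;

    /* Scalar model: convert b into lane 0 and keep lanes 1..3 from a.
       The cast follows the current rounding direction (assumed: nearest, ties to even). */
    vec4f_model model_cvti64_ss(vec4f_model a, int64_t b)
    {
        vec4f_model dst = a;     /* lanes 1..3 come from a */
        dst.lane[0] = (float)b;  /* may round: a float has a 24-bit significand */
        return dst;
    }

    /* Small self-check: 2^24 + 1 is the first positive integer a float cannot hold. */
    void demo_cvti64_ss(void)
    {
        vec4f_model a = { { 1.0f, 2.0f, 3.0f, 4.0f } };
        vec4f_model r = model_cvti64_ss(a, 16777217);
        assert(r.lane[0] == 16777216.0f);  /* the tie rounds to the even neighbour, 2^24 */
        assert(r.lane[1] == 2.0f && r.lane[2] == 3.0f && r.lane[3] == 4.0f);
    }

This is the situation in which the ``_round`` variants of the intrinsic are useful, since they let the caller choose the rounding direction explicitly.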

_mm_cvt_roundss_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128 b, 
    int sae
:Param ETypes:
    FP64 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_cvt_roundss_sd(__m128d a, __m128 b, int sae);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_FP64(b[31:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_cvt_roundss_sd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128 b, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_mask_cvt_roundss_sd(__m128d src, __mmask8 k,
                                    __m128d a, __m128 b,
                                    int sae);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := Convert_FP32_To_FP64(b[31:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_cvtss_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128 b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP32 b

.. code-block:: C

    __m128d _mm_mask_cvtss_sd(__m128d src, __mmask8 k,
                              __m128d a, __m128 b);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := Convert_FP32_To_FP64(b[31:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_cvt_roundss_sd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128 b, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_maskz_cvt_roundss_sd(__mmask8 k, __m128d a,
                                     __m128 b, int sae);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := Convert_FP32_To_FP64(b[31:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_cvtss_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP32 b

.. code-block:: C

    __m128d _mm_maskz_cvtss_sd(__mmask8 k, __m128d a, __m128 b);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst[63:0] := Convert_FP32_To_FP64(b[31:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	
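The writemask and zeromask forms differ only in what the lower element becomes when bit 0 of "k" is clear. The sketch below models both in portable C so the two can be compared side by side; it is not the intrinsic itself, and ``vec2d_model``, ``vec4f_model`` and the ``model_`` and ``demo_`` functions are hypothetical names introduced here.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    typedef struct { double lane[2]; } vec2d_model;  /* hypothetical stand-in for __m128d */
    typedef struct { float lane[4]; } vec4f_model;   /* hypothetical stand-in for __m128 */

    /* Writemask form: lane 0 falls back to src when bit 0 of k is clear. */
    vec2d_model model_mask_cvtss_sd(vec2d_model src, uint8_t k,
                                    vec2d_model a, vec4f_model b)
    {
        vec2d_model dst;
        dst.lane[0] = (k & 1u) ? (double)b.lane[0] : src.lane[0];
        dst.lane[1] = a.lane[1];  /* upper lane always comes from a */
        return dst;
    }

    /* Zeromask form: lane 0 is zeroed when bit 0 of k is clear. */
    vec2d_model model_maskz_cvtss_sd(uint8_t k, vec2d_model a, vec4f_model b)
    {
        vec2d_model dst;
        dst.lane[0] = (k & 1u) ? (double)b.lane[0] : 0.0;
        dst.lane[1] = a.lane[1];
        return dst;
    }

    /* Small self-check covering a set bit, a clear bit, and the zeroing case. */
    void demo_masked_cvtss_sd(void)
    {
        vec2d_model src = { { 9.0, 8.0 } };
        vec2d_model a = { { 1.0, 2.0 } };
        vec4f_model b = { { 0.25f, 0.0f, 0.0f, 0.0f } };

        vec2d_model set = model_mask_cvtss_sd(src, 0x01, a, b);
        vec2d_model clear = model_mask_cvtss_sd(src, 0x02, a, b);  /* only bit 0 matters */
        vec2d_model zeroed = model_maskz_cvtss_sd(0x00, a, b);

        assert(set.lane[0] == 0.25 && set.lane[1] == 2.0);
        assert(clear.lane[0] == 9.0 && clear.lane[1] == 2.0);
        assert(zeroed.lane[0] == 0.0 && zeroed.lane[1] == 2.0);
    }

Note that the upper element of "src" is never used: only its lower element can reach "dst".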

_mm_cvt_roundss_i32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    int _mm_cvt_roundss_i32(__m128 a, int rounding);

.. admonition:: Intel Description

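    Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".

    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` (round to nearest, and suppress exceptions), ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` (round down, and suppress exceptions), ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` (round up, and suppress exceptions), ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` (truncate, and suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code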

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_Int32(a[31:0])
        	

_mm_cvt_roundss_i64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    __int64 _mm_cvt_roundss_i64(__m128 a, int rounding);

.. admonition:: Intel Description

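    Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".

    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` (round to nearest, and suppress exceptions), ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` (round down, and suppress exceptions), ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` (round up, and suppress exceptions), ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` (truncate, and suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code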

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_Int64(a[31:0])
        	

_mm_cvt_roundss_si32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    int _mm_cvt_roundss_si32(__m128 a, int rounding);

.. admonition:: Intel Description

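    Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".

    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` (round to nearest, and suppress exceptions), ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` (round down, and suppress exceptions), ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` (round up, and suppress exceptions), ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` (truncate, and suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code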

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_Int32(a[31:0])
        	

_mm_cvt_roundss_si64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    __int64 _mm_cvt_roundss_si64(__m128 a, int rounding);

.. admonition:: Intel Description

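    Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".

    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` (round to nearest, and suppress exceptions), ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` (round down, and suppress exceptions), ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` (round up, and suppress exceptions), ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` (truncate, and suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code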

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_Int64(a[31:0])
        	

_mm_cvtss_i32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    int _mm_cvtss_i32(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_Int32(a[31:0])
        	
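The description above does not say how a fractional input is rounded. The sketch below assumes the default MXCSR rounding mode (round to nearest, ties to even) and models that behaviour in portable C for finite inputs whose result fits in a 32-bit integer; anything outside that range is not modelled. ``model_cvtss_i32_nearest_even`` and ``demo_cvtss_i32`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model of a float-to-int32 conversion under round-to-nearest-even.
       Assumption: x is finite and its rounded value fits in int32_t. */
    int32_t model_cvtss_i32_nearest_even(float x)
    {
        int32_t t = (int32_t)x;     /* a C cast truncates toward zero */
        float frac = x - (float)t;  /* signed fractional part, in (-1, 1) */

        if (frac > 0.5f || (frac == 0.5f && (t & 1)))
            return t + 1;           /* round away from zero on the positive side */
        if (frac < -0.5f || (frac == -0.5f && (t & 1)))
            return t - 1;           /* round away from zero on the negative side */
        return t;                   /* otherwise the truncated value is the nearest */
    }

    /* Small self-check: halfway cases go to the even neighbour. */
    void demo_cvtss_i32(void)
    {
        assert(model_cvtss_i32_nearest_even(2.5f) == 2);
        assert(model_cvtss_i32_nearest_even(3.5f) == 4);
        assert(model_cvtss_i32_nearest_even(-2.5f) == -2);
        assert(model_cvtss_i32_nearest_even(-3.5f) == -4);
        assert(model_cvtss_i32_nearest_even(2.75f) == 3);
    }

A plain C cast would give 2, 3, -2, -3 and 2 for the same inputs, which matches the truncating ``_mm_cvtt`` forms rather than this one.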

_mm_cvtss_i64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __int64 _mm_cvtss_i64(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_Int64(a[31:0])
        	

_mm_cvt_roundss_u32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    __m128 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    unsigned int _mm_cvt_roundss_u32(__m128 a, int rounding);

.. admonition:: Intel Description

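    Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst".

    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` (round to nearest, and suppress exceptions), ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` (round down, and suppress exceptions), ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` (round up, and suppress exceptions), ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` (truncate, and suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code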

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_UInt32(a[31:0])
        	

_mm_cvt_roundss_u64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    __m128 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    unsigned __int64 _mm_cvt_roundss_u64(__m128 a, int rounding);

.. admonition:: Intel Description

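    Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst".

    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` (round to nearest, and suppress exceptions), ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` (round down, and suppress exceptions), ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` (round up, and suppress exceptions), ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` (truncate, and suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code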

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_UInt64(a[31:0])
        	

_mm_cvtss_u32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    unsigned int _mm_cvtss_u32(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_UInt32(a[31:0])
        	

_mm_cvtss_u64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    unsigned __int64 _mm_cvtss_u64(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_UInt64(a[31:0])
        	

_mm_cvtt_roundsd_i32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM sae

.. code-block:: C

    int _mm_cvtt_roundsd_i32(__m128d a, int sae);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_Int32_Truncate(a[63:0])
        	

_mm_cvtt_roundsd_i64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128d a, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM sae

.. code-block:: C

    __int64 _mm_cvtt_roundsd_i64(__m128d a, int sae);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
        	

_mm_cvtt_roundsd_si32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM sae

.. code-block:: C

    int _mm_cvtt_roundsd_si32(__m128d a, int sae);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_Int32_Truncate(a[63:0])
        	

_mm_cvtt_roundsd_si64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128d a, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM sae

.. code-block:: C

    __int64 _mm_cvtt_roundsd_si64(__m128d a, int sae);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
        	

_mm_cvttsd_i32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    int _mm_cvttsd_i32(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_Int32_Truncate(a[63:0])
        	

_mm_cvttsd_i64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __int64 _mm_cvttsd_i64(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
        	

_mm_cvtt_roundsd_u32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    __m128d a, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM sae

.. code-block:: C

    unsigned int _mm_cvtt_roundsd_u32(__m128d a, int sae);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_UInt32_Truncate(a[63:0])
        	

_mm_cvtt_roundsd_u64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    __m128d a, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM sae

.. code-block:: C

    unsigned __int64 _mm_cvtt_roundsd_u64(__m128d a, int sae);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_UInt64_Truncate(a[63:0])
        	

_mm_cvttsd_u32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    unsigned int _mm_cvttsd_u32(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP64_To_UInt32_Truncate(a[63:0])
        	

_mm_cvttsd_u64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    unsigned __int64 _mm_cvttsd_u64(__m128d a);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP64_To_UInt64_Truncate(a[63:0])
        	
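The unsigned 64-bit form matters for inputs at or above 2^63, which a signed 64-bit result cannot hold. The sketch below is a portable scalar model of the truncating conversion for inputs in the range [0, 2^64); behaviour outside that range is not modelled here. ``model_cvttsd_u64`` and ``demo_cvttsd_u64`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model of truncation to an unsigned 64-bit integer.
       Assumption: 0 <= x < 2^64, so the C cast is well defined. */
    uint64_t model_cvttsd_u64(double x)
    {
        return (uint64_t)x;  /* a C floating-to-integer cast discards the fractional part */
    }

    /* Small self-check: a fractional input and an input above the signed range. */
    void demo_cvttsd_u64(void)
    {
        assert(model_cvttsd_u64(3.99) == 3u);
        /* 2^63 does not fit in a signed 64-bit integer but fits in the unsigned result. */
        assert(model_cvttsd_u64(9223372036854775808.0) == 9223372036854775808ull);
    }
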

_mm_cvtt_roundss_i32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM sae

.. code-block:: C

    int _mm_cvtt_roundss_i32(__m128 a, int sae);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
        	

_mm_cvtt_roundss_i64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128 a, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM sae

.. code-block:: C

    __int64 _mm_cvtt_roundss_i64(__m128 a, int sae);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_Int64_Truncate(a[31:0])
        	

_mm_cvtt_roundss_si32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM sae

.. code-block:: C

    int _mm_cvtt_roundss_si32(__m128 a, int sae);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
        	

_mm_cvtt_roundss_si64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128 a, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM sae

.. code-block:: C

    __int64 _mm_cvtt_roundss_si64(__m128 a, int sae);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_Int64_Truncate(a[31:0])
        	

_mm_cvttss_i32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    int _mm_cvttss_i32(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
        	

_mm_cvttss_i64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __int64 _mm_cvttss_i64(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_Int64_Truncate(a[31:0])
        	
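Truncation always moves toward zero, regardless of sign, and the 64-bit result can hold magnitudes the 32-bit form cannot. The sketch below models this in portable C for finite inputs whose truncated value fits in a signed 64-bit integer; ``model_cvttss_i64`` and ``demo_cvttss_i64`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model of truncation to a signed 64-bit integer.
       Assumption: x is finite and its integer part fits in int64_t. */
    int64_t model_cvttss_i64(float x)
    {
        return (int64_t)x;  /* a C cast drops the fraction, rounding toward zero */
    }

    /* Small self-check: both signs, plus a value beyond the 32-bit range. */
    void demo_cvttss_i64(void)
    {
        assert(model_cvttss_i64(2.5f) == 2);
        assert(model_cvttss_i64(-2.5f) == -2);              /* toward zero, not toward -infinity */
        assert(model_cvttss_i64(1e10f) == 10000000000LL);   /* 1e10 is exactly representable as a float */
    }
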

_mm_cvtt_roundss_u32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    __m128 a, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM sae

.. code-block:: C

    unsigned int _mm_cvtt_roundss_u32(__m128 a, int sae);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_UInt32_Truncate(a[31:0])
        	

_mm_cvtt_roundss_u64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    __m128 a, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM sae

.. code-block:: C

    unsigned __int64 _mm_cvtt_roundss_u64(__m128 a, int sae);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst".

    Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_UInt64_Truncate(a[31:0])
        	

_mm_cvttss_u32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    unsigned int _mm_cvttss_u32(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_FP32_To_UInt32_Truncate(a[31:0])
        	

_mm_cvttss_u64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    unsigned __int64 _mm_cvttss_u64(__m128 a);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_FP32_To_UInt64_Truncate(a[31:0])
        	

_mm_cvt_roundu64_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    unsigned __int64 b, 
    int rounding
:Param ETypes:
    FP64 a, 
    UI64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_cvt_roundu64_sd(__m128d a, unsigned __int64 b,
                                int rounding);

.. admonition:: Intel Description

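    Convert the unsigned 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

    Rounding is done according to the "rounding" parameter, which can be one of: ``(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)`` (round to nearest, and suppress exceptions), ``(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC)`` (round down, and suppress exceptions), ``(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC)`` (round up, and suppress exceptions), ``(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC)`` (truncate, and suppress exceptions), or ``_MM_FROUND_CUR_DIRECTION`` (use MXCSR.RC; see ``_MM_SET_ROUNDING_MODE``).

.. admonition:: Intel Implementation Pseudo-Code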

    .. code-block:: text

        
        dst[63:0] := Convert_Int64_To_FP64(b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_cvtu32_sd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    unsigned int b
:Param ETypes:
    FP64 a, 
    UI32 b

.. code-block:: C

    __m128d _mm_cvtu32_sd(__m128d a, unsigned int b);

.. admonition:: Intel Description

    Convert the unsigned 32-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_Int32_To_FP64(b[31:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	
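Per the description, "b" is read as an unsigned value, so a bit pattern with the top bit set converts to a large positive number rather than a negative one. The sketch below models this in portable C; it is not the intrinsic itself, and ``vec2d_model``, ``model_cvtu32_sd`` and ``demo_cvtu32_sd`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical stand-in for __m128d: lane[0] holds bits 63:0, lane[1] holds bits 127:64. */
    typedef struct { double lane[2]; } vec2d_model;

    /* Scalar model: convert the unsigned b into the lower lane, copy the upper lane from a. */
    vec2d_model model_cvtu32_sd(vec2d_model a, uint32_t b)
    {
        vec2d_model dst;
        dst.lane[0] = (double)b;  /* exact: any 32-bit value fits in a double's significand */
        dst.lane[1] = a.lane[1];
        return dst;
    }

    /* Small self-check using the all-ones bit pattern. */
    void demo_cvtu32_sd(void)
    {
        vec2d_model a = { { 1.5, 2.5 } };
        vec2d_model r = model_cvtu32_sd(a, 0xFFFFFFFFu);
        assert(r.lane[0] == 4294967295.0);  /* read as 2^32 - 1, not as -1 */
        assert(r.lane[1] == 2.5);           /* upper lane copied from a */
    }
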

_mm_cvtu64_sd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    unsigned __int64 b
:Param ETypes:
    FP64 a, 
    UI64 b

.. code-block:: C

    __m128d _mm_cvtu64_sd(__m128d a, unsigned __int64 b);

.. admonition:: Intel Description

    Convert the unsigned 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := Convert_Int64_To_FP64(b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	
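The lower element is produced by reading "b" as an unsigned value, which matters once bit 63 is set. The following is a minimal scalar sketch in portable C, not the intrinsic itself. The names ``vec2d``, ``model_cvtu64_sd`` and ``model_cvtsi64_sd`` are hypothetical and introduced only for illustration, and the C cast is assumed to round to nearest-even as in the default floating-point environment.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128d: two double-precision lanes. */
    typedef struct { double lane[2]; } vec2d;

    /* Scalar model of the pseudo-code: lane 0 receives b converted as an
       unsigned value, lane 1 is copied from a. */
    vec2d model_cvtu64_sd(vec2d a, uint64_t b)
    {
        vec2d dst;
        dst.lane[0] = (double)b;   /* unsigned conversion */
        dst.lane[1] = a.lane[1];   /* upper element passes through */
        return dst;
    }

    /* For contrast: the same layout with b read as a signed integer. */
    vec2d model_cvtsi64_sd(vec2d a, int64_t b)
    {
        vec2d dst;
        dst.lane[0] = (double)b;   /* signed conversion */
        dst.lane[1] = a.lane[1];
        return dst;
    }

With "b" set to ``UINT64_MAX``, the unsigned model yields 2^64 (the nearest double), whereas the all-ones bit pattern read as a signed integer converts to -1.0.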

_mm_cvt_roundu32_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    unsigned int b, 
    int rounding
:Param ETypes:
    FP32 a, 
    UI32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_cvt_roundu32_ss(__m128 a, unsigned int b,
                               int rounding);

.. admonition:: Intel Description

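    Convert the unsigned 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". Rounding is done according to the "rounding" parameter, which is one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` combined with ``_MM_FROUND_NO_EXC``, or ``_MM_FROUND_CUR_DIRECTION`` to use the current MXCSR rounding mode.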

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_Int32_To_FP32(b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	
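Single precision carries a 24-bit significand, so unsigned 32-bit inputs above 2^24 may not be representable and the rounding direction becomes visible in the result. The sketch below models the conversion step only, in portable C. The ``rnd_mode`` enum and ``model_cvt_roundu32`` are hypothetical names for this illustration (the intrinsic takes ``_MM_FROUND_*`` constants instead), and the C cast is assumed to round to nearest-even.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical rounding selector used only by this sketch. */
    typedef enum { RND_NEAREST, RND_DOWN, RND_UP, RND_ZERO } rnd_mode;

    /* Scalar model of converting an unsigned 32-bit integer to float
       under an explicit rounding direction. */
    float model_cvt_roundu32(uint32_t b, rnd_mode mode)
    {
        float f = (float)b;          /* assumed round to nearest-even */
        double exact = (double)b;    /* every uint32_t is exact in double */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        /* For positive floats, neighbours differ by 1 in the bit pattern.
           The input is never negative, so "down" equals "toward zero". */
        if ((mode == RND_DOWN || mode == RND_ZERO) && (double)f > exact)
            bits -= 1;               /* step to the next float below */
        else if (mode == RND_UP && (double)f < exact)
            bits += 1;               /* step to the next float above */
        memcpy(&f, &bits, sizeof f);
        return f;
    }

For the input 16777217 (2^24 + 1), the nearest and toward-zero modes give 16777216.0, while rounding up gives 16777218.0.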

_mm_cvt_roundu64_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    unsigned __int64 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    UI64 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_cvt_roundu64_ss(__m128 a, unsigned __int64 b,
                               int rounding);

.. admonition:: Intel Description

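    Convert the unsigned 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". Rounding is done according to the "rounding" parameter, which is one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO`` combined with ``_MM_FROUND_NO_EXC``, or ``_MM_FROUND_CUR_DIRECTION`` to use the current MXCSR rounding mode.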

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_Int64_To_FP32(b[63:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvtu32_ss
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    unsigned int b
:Param ETypes:
    FP32 a, 
    UI32 b

.. code-block:: C

    __m128 _mm_cvtu32_ss(__m128 a, unsigned int b);

.. admonition:: Intel Description

    Convert the unsigned 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_Int32_To_FP32(b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvtu64_ss
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    unsigned __int64 b
:Param ETypes:
    FP32 a, 
    UI64 b

.. code-block:: C

    __m128 _mm_cvtu64_ss(__m128 a, unsigned __int64 b);

.. admonition:: Intel Description

    Convert the unsigned 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_Int64_To_FP32(b[63:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvtsbh_ss
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: float
:Param Types:
    __bfloat16 a
:Param ETypes:
    BF16 a

.. code-block:: C

    float _mm_cvtsbh_ss(__bfloat16 a);

.. admonition:: Intel Description

    Convert the BF16 (16-bit) floating-point element in "a" to a floating-point element, and store the result in "dst". This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := Convert_BF16_To_FP32(a[15:0])
        	
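BF16 keeps the sign, the 8-bit exponent and the top 7 fraction bits of a single-precision value, which is why the widening conversion needs no arithmetic and cannot raise exceptions. A minimal scalar sketch in portable C follows. ``model_cvtsbh_ss`` is a hypothetical name, and ``uint16_t`` stands in for ``__bfloat16`` as a raw bit pattern.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Scalar model of Convert_BF16_To_FP32: place the 16 BF16 bits in the
       upper half of a 32-bit word and reinterpret it as a float. */
    float model_cvtsbh_ss(uint16_t a)
    {
        uint32_t bits = (uint32_t)a << 16;  /* low 16 fraction bits are zero */
        float dst;
        memcpy(&dst, &bits, sizeof dst);    /* bit copy, no FP operation */
        return dst;
    }

For example, the BF16 pattern ``0x3F80`` widens to 1.0 and ``0x4049`` widens to 3.140625.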

_mm_cvtpbh_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128bh a
:Param ETypes:
    BF16 a

.. code-block:: C

    __m128 _mm_cvtpbh_ps(__m128bh a);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*16
        	dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtpbh_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128bh a
:Param ETypes:
    MASK k, 
    BF16 a

.. code-block:: C

    __m128 _mm_maskz_cvtpbh_ps(__mmask8 k, __m128bh a);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtpbh_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128bh a
:Param ETypes:
    FP32 src, 
    MASK k, 
    BF16 a

.. code-block:: C

    __m128 _mm_mask_cvtpbh_ps(__m128 src, __mmask8 k,
                              __m128bh a);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*16
        	IF k[j]
        		dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
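The zeromask and writemask forms differ only in what an inactive lane receives. The sketch below models both loops in portable C on plain arrays. All function names are hypothetical, ``uint16_t`` holds raw BF16 bit patterns, and only the four low elements of "a" are modeled because the pseudo-code reads no others.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Widen one BF16 bit pattern to float by a 16-bit left shift. */
    float bf16_to_fp32(uint16_t h)
    {
        uint32_t bits = (uint32_t)h << 16;
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    /* Writemask form: a lane with a clear mask bit keeps src[j]. */
    void model_mask_cvtpbh_ps(float dst[4], const float src[4],
                              uint8_t k, const uint16_t a[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? bf16_to_fp32(a[j]) : src[j];
    }

    /* Zeromask form: a lane with a clear mask bit becomes 0. */
    void model_maskz_cvtpbh_ps(float dst[4], uint8_t k, const uint16_t a[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? bf16_to_fp32(a[j]) : 0.0f;
    }

With ``k = 0x5`` only lanes 0 and 2 are converted; lanes 1 and 3 come from "src" in the writemask form and are zero in the zeromask form.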

_mm_cvtness_sbh
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __bfloat16
:Param Types:
    float a
:Param ETypes:
    FP32 a

.. code-block:: C

    __bfloat16 _mm_cvtness_sbh(float a);

.. admonition:: Intel Description

    Convert the single-precision (32-bit) floating-point element in "a" to a BF16 (16-bit) floating-point element, and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Convert_FP32_To_BF16(a[31:0])
        	
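Narrowing to BF16 discards the low 16 bits of the single-precision encoding, so a rounding rule is needed. The following scalar sketch in portable C assumes round-to-nearest-even (the "ne" in the intrinsic name), denormal inputs treated as signed zero, and NaN inputs returned as quiet NaNs. These special cases reflect an understanding of the underlying instruction and should be checked against the architecture manual. ``model_cvtness_sbh`` is a hypothetical name, and ``uint16_t`` stands in for ``__bfloat16``.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Scalar model of Convert_FP32_To_BF16 under the assumptions above. */
    uint16_t model_cvtness_sbh(float a)
    {
        uint32_t x;
        memcpy(&x, &a, sizeof x);
        uint32_t exp  = (x >> 23) & 0xFF;
        uint32_t frac = x & 0x7FFFFF;

        if (exp == 0)        /* zero or denormal: keep only the sign (assumed) */
            return (uint16_t)((x >> 16) & 0x8000);
        if (exp == 0xFF)     /* infinity passes through; NaN is quieted (assumed) */
            return (uint16_t)((x >> 16) | (frac ? 0x0040 : 0));

        /* Round to nearest, ties to even: add 0x7FFF plus the bit that
           will become the result's least significant bit, then truncate. */
        x += 0x7FFF + ((x >> 16) & 1);
        return (uint16_t)(x >> 16);
    }

The tie case shows the rule: 1.00390625 (``0x3F808000``) lies halfway between two BF16 values and rounds to the even pattern ``0x3F80``, while 1.01171875 (``0x3F818000``) rounds up to ``0x3F82``.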

_mm_cvtne2ps_pbh
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128bh
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128bh _mm_cvtne2ps_pbh(__m128 a, __m128 b);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF j < 4
        		t := b.fp32[j]
        	ELSE
        		t := a.fp32[j-4]
        	FI
        	dst.word[j] := Convert_FP32_To_BF16(t)
        ENDFOR
        dst[MAX:128] := 0
        	
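The element order is easy to get backwards: the pseudo-code places the elements of "b" in the low four words of "dst" and the elements of "a" in the high four. The sketch below models that ordering in portable C. The names are hypothetical, ``uint16_t`` holds raw BF16 bit patterns, and the helper is a simplified nearest-even rounding that deliberately omits any special handling of NaN and denormal inputs.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Simplified FP32 -> BF16 rounding (nearest, ties to even).
       Special cases for NaN and denormals are intentionally omitted. */
    uint16_t fp32_to_bf16_simple(float f)
    {
        uint32_t x;
        memcpy(&x, &f, sizeof x);
        x += 0x7FFF + ((x >> 16) & 1);
        return (uint16_t)(x >> 16);
    }

    /* Scalar model of the loop: words 0..3 come from b, words 4..7 from a. */
    void model_cvtne2ps_pbh(uint16_t dst[8], const float a[4], const float b[4])
    {
        for (int j = 0; j < 8; j++) {
            float t = (j < 4) ? b[j] : a[j - 4];
            dst[j] = fp32_to_bf16_simple(t);
        }
    }

With ``a = {5, 6, 7, 8}`` and ``b = {1, 2, 3, 4}``, word 0 of the result encodes 1.0 (``0x3F80``) and word 4 encodes 5.0 (``0x40A0``).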

_mm_mask_cvtne2ps_pbh
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128bh
:Param Types:
    __m128bh src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    BF16 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128bh _mm_mask_cvtne2ps_pbh(__m128bh src, __mmask8 k,
                                   __m128 a, __m128 b);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		IF j < 4
        			t := b.fp32[j]
        		ELSE
        			t := a.fp32[j-4]
        		FI
        		dst.word[j] := Convert_FP32_To_BF16(t)
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtne2ps_pbh
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128bh
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128bh _mm_maskz_cvtne2ps_pbh(__mmask8 k, __m128 a,
                                    __m128 b);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		IF j < 4
        			t := b.fp32[j]
        		ELSE
        			t := a.fp32[j-4]
        		FI
        		dst.word[j] := Convert_FP32_To_BF16(t)
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtneps_pbh
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128bh
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128bh _mm_cvtneps_pbh(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtneps_pbh
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128bh
:Param Types:
    __m128bh src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    BF16 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128bh _mm_mask_cvtneps_pbh(__m128bh src, __mmask8 k,
                                  __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtneps_pbh
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128bh
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128bh _mm_maskz_cvtneps_pbh(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtepi16_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m128h _mm_cvtepi16_ph(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtepi16_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    SI16 a

.. code-block:: C

    __m128h _mm_mask_cvtepi16_ph(__m128h src, __mmask8 k,
                                 __m128i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepi16_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI16 a

.. code-block:: C

    __m128h _mm_maskz_cvtepi16_ph(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtepu16_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128h _mm_cvtepu16_ph(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        ENDFOR
        dst[MAX:128] := 0
        	
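The signed and unsigned variants share the same pseudo-code helper name but interpret each 16-bit word differently, and half precision (11-bit significand, largest finite value 65504) cannot hold every 16-bit integer exactly. The sketch below models one element of each variant in portable C, assuming round-to-nearest-even and overflow to infinity. All names are hypothetical, and results are returned as raw FP16 bit patterns.

.. code-block:: C

    #include <stdint.h>

    /* Encode a non-negative integer magnitude as binary16 bits with
       round-to-nearest-even; sign is 0x8000 or 0. */
    uint16_t magnitude_to_fp16(uint32_t m, uint16_t sign)
    {
        if (m == 0)
            return sign;
        int p = 31;
        while (((m >> p) & 1u) == 0)
            p--;                              /* index of the highest set bit */
        uint32_t exp = (uint32_t)p + 15;      /* biased exponent */
        uint32_t sig;                         /* 11 bits including hidden bit */
        if (p <= 10) {
            sig = m << (10 - p);              /* fits exactly */
        } else {
            int shift = p - 10;
            uint32_t rem  = m & ((1u << shift) - 1);
            uint32_t half = 1u << (shift - 1);
            sig = m >> shift;
            if (rem > half || (rem == half && (sig & 1)))
                sig++;                        /* nearest, ties to even */
            if (sig == 0x800) { sig = 0x400; exp++; }  /* rounding carry */
        }
        if (exp >= 31)
            return (uint16_t)(sign | 0x7C00); /* overflow to infinity */
        return (uint16_t)(sign | (exp << 10) | (sig & 0x3FF));
    }

    /* One element of the signed conversion. */
    uint16_t model_cvtepi16_ph(int16_t x)
    {
        uint32_t m = x < 0 ? (uint32_t)(-(int32_t)x) : (uint32_t)x;
        return magnitude_to_fp16(m, x < 0 ? 0x8000 : 0);
    }

    /* One element of the unsigned conversion. */
    uint16_t model_cvtepu16_ph(uint16_t x)
    {
        return magnitude_to_fp16(x, 0);
    }

The word ``0xFFFF`` illustrates the difference: read as signed it is -1 and converts to ``0xBC00`` (-1.0), while read as unsigned it is 65535, which exceeds the half-precision range and becomes ``0x7C00`` (infinity) under these assumptions.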

_mm_mask_cvtepu16_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128h _mm_mask_cvtepu16_ph(__m128h src, __mmask8 k,
                                 __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtepu16_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128h _mm_maskz_cvtepu16_ph(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtepi32_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m128h _mm_cvtepi32_ph(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtepi32_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    SI32 a

.. code-block:: C

    __m128h _mm_mask_cvtepi32_ph(__m128h src, __mmask8 k,
                                 __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	
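Because four 32-bit inputs produce only four 16-bit results, the writemask governs the low 64 bits alone and the upper 64 bits of "dst" are cleared rather than taken from "src". The sketch below models just that lane selection in portable C. ``model_mask_narrow4`` is a hypothetical name, lanes are raw FP16 bit patterns, and ``converted[j]`` stands for the already-computed value of ``Convert_Int32_To_FP16(a.dword[j])``, which the sketch does not implement.

.. code-block:: C

    #include <stdint.h>

    /* dst and src model the eight 16-bit lanes of a 128-bit FP16 vector. */
    void model_mask_narrow4(uint16_t dst[8], const uint16_t src[8],
                            uint8_t k, const uint16_t converted[4])
    {
        /* Only mask bits 0..3 are consulted by the loop. */
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? converted[j] : src[j];
        /* dst[MAX:64] := 0, regardless of k and src. */
        for (int j = 4; j < 8; j++)
            dst[j] = 0;
    }

With ``k = 0xFA``, lanes 1 and 3 take the converted values, lanes 0 and 2 keep "src", and lanes 4 to 7 are zero even though the corresponding mask bits are set.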

_mm_maskz_cvtepi32_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI32 a

.. code-block:: C

    __m128h _mm_maskz_cvtepi32_ph(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_cvtepu32_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128h _mm_cvtepu32_ph(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_mask_cvtepu32_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128h _mm_mask_cvtepu32_ph(__m128h src, __mmask8 k,
                                 __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtepu32_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128h _mm_maskz_cvtepu32_ph(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_cvtepi64_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128i a
:Param ETypes:
    SI64 a

.. code-block:: C

    __m128h _mm_cvtepi64_ph(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 96 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_mask_cvtepi64_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    SI64 a

.. code-block:: C

    __m128h _mm_mask_cvtepi64_ph(__m128h src, __mmask8 k,
                                 __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 96 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_maskz_cvtepi64_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    SI64 a

.. code-block:: C

    __m128h _mm_maskz_cvtepi64_ph(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_cvtepu64_ph
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128h _mm_cvtepu64_ph(__m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 96 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_mask_cvtepu64_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    FP16 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128h _mm_mask_cvtepu64_ph(__m128h src, __mmask8 k,
                                 __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 96 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_maskz_cvtepu64_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128h _mm_maskz_cvtepu64_ph(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_cvtpd_ph
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128h _mm_cvtpd_ph(__m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 96 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_mask_cvtpd_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128h _mm_mask_cvtpd_ph(__m128h src, __mmask8 k,
                              __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 96 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:32] := 0
        	

_mm_maskz_cvtpd_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128h _mm_maskz_cvtpd_ph(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:32] := 0
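
The writemask and zeromask variants above differ only in what an element receives when its mask bit is clear. The following is a minimal scalar sketch of that difference for the two low FP16 lanes, not the intrinsic itself. The type ``reg128_fp16`` and the helpers ``model_mask_lanes`` and ``model_maskz_lanes`` are hypothetical names introduced for illustration, and the already-converted FP16 bit patterns are passed in as ``converted`` so the sketch does not depend on any particular FP64-to-FP16 routine.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical stand-in for a 128-bit register viewed as eight 16-bit lanes. */
    typedef struct { uint16_t lane[8]; } reg128_fp16;

    /* Writemask model: lane j (j = 0..1) takes converted[j] when bit j of k is
       set, otherwise src.lane[j]. Lanes 2..7 (the upper 96 bits) are zeroed. */
    static reg128_fp16 model_mask_lanes(reg128_fp16 src, uint8_t k,
                                        const uint16_t converted[2])
    {
        reg128_fp16 dst;
        memset(&dst, 0, sizeof dst);
        for (int j = 0; j < 2; ++j)
            dst.lane[j] = ((k >> j) & 1u) ? converted[j] : src.lane[j];
        return dst;
    }

    /* Zeromask model: lane j takes converted[j] when bit j of k is set,
       otherwise 0. Lanes 2..7 are zeroed as well. */
    static reg128_fp16 model_maskz_lanes(uint8_t k, const uint16_t converted[2])
    {
        reg128_fp16 dst;
        memset(&dst, 0, sizeof dst);
        for (int j = 0; j < 2; ++j)
            dst.lane[j] = ((k >> j) & 1u) ? converted[j] : 0;
        return dst;
    }

With ``k = 0x1``, the writemask model replaces lane 0 and keeps lane 1 from ``src``; with ``k = 0x2``, the zeromask model clears lane 0 and writes lane 1. In both models only bits 0 and 1 of ``k`` are ever read.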
        	

_mm_cvtxps_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128h _mm_cvtxps_ph(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
        ENDFOR
        dst[MAX:64] := 0
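
The pseudo-code above leaves ``Convert_FP32_To_FP16`` abstract. As a rough illustration, the sketch below models it in portable C under the assumption of round-to-nearest-even, which is the default MXCSR rounding mode; the pseudo-code itself does not state the rounding mode. The names ``f32_to_f16_bits`` and ``model_cvtxps_ph`` are hypothetical, and the NaN handling (quiet NaN with a truncated payload) is a simplification that is not claimed to match hardware bit for bit.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: IEEE 754 binary32 -> binary16 bit pattern,
       assuming round-to-nearest-even. */
    static uint16_t f32_to_f16_bits(float f)
    {
        uint32_t x;
        memcpy(&x, &f, sizeof x);
        uint32_t sign = (x >> 16) & 0x8000u;
        uint32_t exp  = (x >> 23) & 0xFFu;
        uint32_t man  = x & 0x7FFFFFu;

        if (exp == 0xFFu) {
            /* Infinity stays infinity; NaN becomes a quiet NaN. */
            uint32_t nan_bits = man ? (0x0200u | (man >> 13)) : 0u;
            return (uint16_t)(sign | 0x7C00u | nan_bits);
        }

        int e = (int)exp - 127 + 15;           /* rebias the exponent for FP16 */
        if (e >= 31)
            return (uint16_t)(sign | 0x7C00u); /* too large: rounds to infinity */

        if (e <= 0) {
            if (e < -10)
                return (uint16_t)sign;         /* below half the smallest subnormal */
            man |= 0x800000u;                  /* make the implicit leading 1 explicit */
            unsigned shift = (unsigned)(14 - e);
            uint32_t half = man >> shift;
            uint32_t rem  = man & ((1u << shift) - 1u);
            uint32_t tie  = 1u << (shift - 1u);
            if (rem > tie || (rem == tie && (half & 1u)))
                ++half;
            return (uint16_t)(sign | half);
        }

        uint32_t half = ((uint32_t)e << 10) | (man >> 13);
        uint32_t rem  = man & 0x1FFFu;
        if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
            ++half;                            /* a carry into the exponent is intended */
        return (uint16_t)(sign | half);
    }

    /* Scalar model of the loop above: four conversions into the low 64 bits,
       with the upper 64 bits (lanes 4..7) zeroed. */
    static void model_cvtxps_ph(const float a[4], uint16_t dst[8])
    {
        for (int j = 0; j < 4; ++j)
            dst[j] = f32_to_f16_bits(a[j]);
        for (int j = 4; j < 8; ++j)
            dst[j] = 0;
    }

Under these assumptions ``1.0f`` maps to ``0x3C00`` and ``65504.0f``, the largest finite FP16 value, maps to ``0x7BFF``.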
        	

_mm_mask_cvtxps_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128h _mm_mask_cvtxps_ph(__m128h src, __mmask8 k,
                               __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
        	ELSE
        		dst.fp16[j] := src.fp16[j]
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_maskz_cvtxps_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128h _mm_maskz_cvtxps_ph(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
        	ELSE
        		dst.fp16[j] := 0
        	FI
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_cvtph_epi32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_cvtph_epi32(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtph_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_mask_cvtph_epi32(__m128i src, __mmask8 k,
                                 __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtph_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_maskz_cvtph_epi32(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvttph_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_cvttph_epi32(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
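
The ``cvtph`` and ``cvttph`` conversions differ only in rounding: the truncating form always rounds toward zero. The sketch below is a scalar analogy of that difference, not the intrinsics themselves. It assumes the non-truncating form rounds to nearest with ties to even (the default MXCSR rounding mode, which the pseudo-code does not spell out), uses ``float`` as a stand-in for an FP16 element, and expects finite inputs. The helper names ``model_cvt_nearest_even`` and ``model_cvt_truncate`` are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Truncating model: a C cast rounds toward zero. */
    static int32_t model_cvt_truncate(float x)
    {
        return (int32_t)x;
    }

    /* Rounding model: nearest integer, ties go to the even neighbour.
       The subtraction is exact for the small magnitudes FP16 can hold. */
    static int32_t model_cvt_nearest_even(float x)
    {
        int32_t t = (int32_t)x;      /* truncate toward zero first */
        float frac = x - (float)t;   /* signed fractional part */
        if (frac > 0.5f || (frac == 0.5f && (t & 1)))
            return t + 1;
        if (frac < -0.5f || (frac == -0.5f && (t & 1)))
            return t - 1;
        return t;
    }

Under these assumptions 3.5 maps to 4 with the rounding model and to 3 with the truncating one, while -1.5 maps to -2 and -1 respectively. The value 2.5 maps to 2 in both, because the tie resolves to the even neighbour.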
        	

_mm_mask_cvttph_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_mask_cvttph_epi32(__m128i src, __mmask8 k,
                                  __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvttph_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_maskz_cvttph_epi32(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtph_epu32
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_cvtph_epu32(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtph_epu32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_mask_cvtph_epu32(__m128i src, __mmask8 k,
                                 __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtph_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_maskz_cvtph_epu32(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvttph_epu32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_cvttph_epu32(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvttph_epu32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_mask_cvttph_epu32(__m128i src, __mmask8 k,
                                  __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := src.dword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvttph_epu32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_maskz_cvttph_epu32(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 3
        	IF k[j]
        		dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
        	ELSE
        		dst.dword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtph_epi64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_cvtph_epi64(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtph_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_mask_cvtph_epi64(__m128i src, __mmask8 k,
                                 __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtph_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_maskz_cvtph_epi64(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvttph_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_cvttph_epi64(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvttph_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_mask_cvttph_epi64(__m128i src, __mmask8 k,
                                  __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvttph_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_maskz_cvttph_epi64(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtph_epu64
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_cvtph_epu64(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtph_epu64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_mask_cvtph_epu64(__m128i src, __mmask8 k,
                                 __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtph_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_maskz_cvtph_epu64(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvttph_epu64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_cvttph_epu64(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvttph_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_mask_cvttph_epu64(__m128i src, __mmask8 k,
                                  __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := src.qword[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvttph_epu64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_maskz_cvttph_epu64(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 1
        	IF k[j]
        		dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
        	ELSE
        		dst.qword[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtph_epi16
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_cvtph_epi16(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtph_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_mask_cvtph_epi16(__m128i src, __mmask8 k,
                                 __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtph_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_maskz_cvtph_epi16(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
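
Every FP16-to-integer entry in this group takes an ``__mmask8``, yet the pseudo-code loops run over 8, 4, or 2 elements depending on whether the destination elements are 16, 32, or 64 bits wide. Since ``k[j]`` is only read for those ``j``, the higher mask bits have no effect in the narrower loops. The sketch below works through that arithmetic; ``lanes_128`` and ``effective_mask`` are hypothetical helper names, and the formula applies only to conversions whose source elements are FP16.

.. code-block:: C

    #include <stdint.h>

    /* Number of destination elements that fit in a 128-bit result. */
    static unsigned lanes_128(unsigned dst_elem_bits)
    {
        return 128u / dst_elem_bits;
    }

    /* The bits of k that the pseudo-code's k[j] ever reads for a given
       destination element width (16, 32, or 64). */
    static uint8_t effective_mask(uint8_t k, unsigned dst_elem_bits)
    {
        unsigned n = lanes_128(dst_elem_bits);
        return (uint8_t)(k & ((1u << n) - 1u));
    }

For a 64-bit destination, ``effective_mask(0xFF, 64)`` yields ``0x03``: only the two low FP16 elements of ``a`` are converted, so only mask bits 0 and 1 matter.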
        	

_mm_cvttph_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_cvttph_epi16(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvttph_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_mask_cvttph_epi16(__m128i src, __mmask8 k,
                                  __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvttph_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_maskz_cvttph_epi16(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtph_epu16
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_cvtph_epu16(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtph_epu16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_mask_cvtph_epu16(__m128i src, __mmask8 k,
                                 __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtph_epu16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_maskz_cvtph_epu16(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvttph_epu16
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128i _mm_cvttph_epu16(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
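
The ``cvtt`` forms always truncate toward zero, while the plain ``cvt`` forms round according to the current rounding mode. The sketch below models both behaviours in portable scalar C. It is not the intrinsic: it assumes the default round-to-nearest-even mode, takes inputs that are already decoded to ``float`` and lie in the range 0 to 65504, and uses hypothetical helper names.

.. code-block:: C

    #include <stdint.h>

    /* Model of Convert_FP16_To_UInt16_Truncate for an already-decoded value.
     * A C float-to-integer cast discards the fraction (rounds toward zero). */
    static uint16_t u16_truncate(float x)
    {
        return (uint16_t)x;
    }

    /* Model of Convert_FP16_To_UInt16 under round-to-nearest-even
     * (assumed here; the real instruction follows the active rounding mode). */
    static uint16_t u16_nearest_even(float x)
    {
        uint16_t t = (uint16_t)x;        /* integer part */
        float    r = x - (float)t;       /* fractional part, 0 <= r < 1 */

        /* Round up above the halfway point, or at it when t is odd. */
        if (r > 0.5f || (r == 0.5f && (t & 1u)))
            t++;
        return t;
    }

The two functions agree on 2.5 (both give 2) but differ on 2.75 and 3.5, where truncation gives 2 and 3 while rounding to nearest even gives 3 and 4.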
        	

_mm_mask_cvttph_epu16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    UI16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_mask_cvttph_epu16(__m128i src, __mmask8 k,
                                  __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := src.word[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvttph_epu16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128i _mm_maskz_cvttph_epu16(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 TO 7
        	IF k[j]
        		dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
        	ELSE
        		dst.word[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtph_pd
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128d _mm_cvtph_pd(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_cvtph_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128d _mm_mask_cvtph_pd(__m128d src, __mmask8 k,
                              __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	IF k[j]
        		dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
        	ELSE
        		dst.fp64[j] := src.fp64[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtph_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128d _mm_maskz_cvtph_pd(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	IF k[j]
        		dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
        	ELSE
        		dst.fp64[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtxph_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128 _mm_cvtxph_ps(__m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
        ENDFOR
        dst[MAX:128] := 0
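
Widening fp16 to fp32 is exact, because every binary16 value is representable in binary32. As a rough illustration of what ``Convert_FP16_To_FP32`` has to do, the sketch below widens the bit fields directly and then applies the four-lane loop from the pseudo-code. It assumes ``float`` is IEEE 754 binary32; the function names are hypothetical and the code is a scalar model, not the intrinsic.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Widen one binary16 bit pattern to a binary32 value. */
    static float half_bits_to_float(uint16_t h)
    {
        uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t frac = h & 0x3FFu;
        uint32_t bits;
        float    f;

        if (exp == 0x1Fu) {              /* infinity or NaN */
            bits = sign | 0x7F800000u | (frac << 13);
        } else if (exp != 0) {           /* normal: rebias 15 -> 127 */
            bits = sign | ((exp + 112u) << 23) | (frac << 13);
        } else if (frac == 0) {          /* signed zero */
            bits = sign;
        } else {                         /* subnormal: normalize first */
            exp = 113u;
            while (!(frac & 0x400u)) {
                frac <<= 1;
                exp--;
            }
            bits = sign | (exp << 23) | ((frac & 0x3FFu) << 13);
        }
        memcpy(&f, &bits, sizeof f);     /* reinterpret the bits as a float */
        return f;
    }

    /* Model of the unmasked loop: only lanes 0..3 of a are read. */
    static void cvtxph_ps_model(float dst[4], const uint16_t a[8])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = half_bits_to_float(a[j]);
    }

Lanes 4 to 7 of the input are ignored, which matches the ``FOR j := 0 to 3`` bound in the pseudo-code.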
        	

_mm_mask_cvtxph_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128 _mm_mask_cvtxph_ps(__m128 src, __mmask8 k,
                              __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
        	ELSE
        		dst.fp32[j] := src.fp32[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_cvtxph_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128 _mm_maskz_cvtxph_ps(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	IF k[j]
        		dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
        	ELSE
        		dst.fp32[j] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cvtsd_sh
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128d b
:Param ETypes:
    FP16 a, 
    FP64 b

.. code-block:: C

    __m128h _mm_cvtsd_sh(__m128h a, __m128d b);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := Convert_FP64_To_FP16(b.fp64[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
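
Narrowing a double to fp16 can lose precision, so the result depends on the rounding mode. The sketch below is a portable scalar model of ``Convert_FP64_To_FP16`` followed by the lane copy from the pseudo-code. It assumes round-to-nearest-even, leaves NaN inputs and the sign of negative zero out of scope, and uses hypothetical function names; it is not the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Encode a double as binary16 bits, rounding to nearest even. */
    static uint16_t double_to_half_bits(double x)
    {
        uint32_t sign = 0;
        if (x < 0.0) {
            sign = 0x8000u;
            x = -x;
        }
        if (x >= 65520.0)                /* at or above this, rounds to infinity */
            return (uint16_t)(sign | 0x7C00u);

        /* Scale x into [1, 2) to find its exponent e, clamped at -14 so
         * that smaller inputs land in the subnormal range. */
        int e = 0;
        while (x >= 2.0)           { x *= 0.5; e++; }
        while (x < 1.0 && e > -14) { x *= 2.0; e--; }

        /* q counts units of 2^(e-10); round it to the nearest even integer. */
        double   q = x * 1024.0;
        uint32_t n = (uint32_t)q;
        double   r = q - (double)n;
        if (r > 0.5 || (r == 0.5 && (n & 1u)))
            n++;

        /* A carry out of the fraction bumps the exponent field by itself. */
        return (uint16_t)(sign | ((((uint32_t)(e + 15)) << 10) + n - 1024u));
    }

    /* Model of the pseudo-code: lane 0 comes from b, lanes 1..7 from a. */
    static void cvtsd_sh_model(uint16_t dst[8], const uint16_t a[8], double b0)
    {
        dst[0] = double_to_half_bits(b0);
        for (int j = 1; j < 8; j++)
            dst[j] = a[j];
    }

For example, 0.1 is not representable in fp16 and encodes as ``0x2E66``, while 100000.0 exceeds the fp16 range and encodes as infinity (``0x7C00``).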
        	

_mm_cvt_roundsd_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128d b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_cvt_roundsd_sh(__m128h a, __m128d b,
                               const int rounding);

.. admonition:: Intel Description

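    Convert the lower double-precision (64-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO``, each combined with ``_MM_FROUND_NO_EXC``, or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.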

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := Convert_FP64_To_FP16(b.fp64[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_cvtsd_sh
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128d b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP64 b

.. code-block:: C

    __m128h _mm_mask_cvtsd_sh(__m128h src, __mmask8 k,
                              __m128h a, __m128d b);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := Convert_FP64_To_FP16(b.fp64[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_cvt_roundsd_sh
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128d b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_cvt_roundsd_sh(__m128h src, __mmask8 k,
                                    __m128h a, __m128d b,
                                    const int rounding);

.. admonition:: Intel Description

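    Convert the lower double-precision (64-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO``, each combined with ``_MM_FROUND_NO_EXC``, or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.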

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := Convert_FP64_To_FP16(b.fp64[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_cvtsd_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP64 b

.. code-block:: C

    __m128h _mm_maskz_cvtsd_sh(__mmask8 k, __m128h a,
                               __m128d b);

.. admonition:: Intel Description

    Convert the lower double-precision (64-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := Convert_FP64_To_FP16(b.fp64[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_cvt_roundsd_sh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128d b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_cvt_roundsd_sh(__mmask8 k, __m128h a,
                                     __m128d b,
                                     const int rounding);

.. admonition:: Intel Description

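    Convert the lower double-precision (64-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO``, each combined with ``_MM_FROUND_NO_EXC``, or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.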

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := Convert_FP64_To_FP16(b.fp64[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_cvtss_sh
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128 b
:Param ETypes:
    FP16 a, 
    FP32 b

.. code-block:: C

    __m128h _mm_cvtss_sh(__m128h a, __m128 b);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := Convert_FP32_To_FP16(b.fp32[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_cvt_roundss_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128 b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_cvt_roundss_sh(__m128h a, __m128 b,
                               const int rounding);

.. admonition:: Intel Description

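    Convert the lower single-precision (32-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO``, each combined with ``_MM_FROUND_NO_EXC``, or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.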

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp16[0] := Convert_FP32_To_FP16(b.fp32[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_cvtss_sh
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128 b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP32 b

.. code-block:: C

    __m128h _mm_mask_cvtss_sh(__m128h src, __mmask8 k,
                              __m128h a, __m128 b);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := Convert_FP32_To_FP16(b.fp32[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_cvt_roundss_sh
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128 b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_cvt_roundss_sh(__m128h src, __mmask8 k,
                                    __m128h a, __m128 b,
                                    const int rounding);

.. admonition:: Intel Description

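    Convert the lower single-precision (32-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO``, each combined with ``_MM_FROUND_NO_EXC``, or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.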

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := Convert_FP32_To_FP16(b.fp32[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_cvtss_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP32 b

.. code-block:: C

    __m128h _mm_maskz_cvtss_sh(__mmask8 k, __m128h a, __m128 b);

.. admonition:: Intel Description

    Convert the lower single-precision (32-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := Convert_FP32_To_FP16(b.fp32[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_cvt_roundss_sh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128 b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_cvt_roundss_sh(__mmask8 k, __m128h a,
                                     __m128 b,
                                     const int rounding);

.. admonition:: Intel Description

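    Convert the lower single-precision (32-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
    Rounding is done according to the "rounding" parameter, which can be one of ``_MM_FROUND_TO_NEAREST_INT``, ``_MM_FROUND_TO_NEG_INF``, ``_MM_FROUND_TO_POS_INF``, or ``_MM_FROUND_TO_ZERO``, each combined with ``_MM_FROUND_NO_EXC``, or ``_MM_FROUND_CUR_DIRECTION`` to use the rounding mode in MXCSR.RC.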

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp16[0] := Convert_FP32_To_FP16(b.fp32[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_cvtsh_sd
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128h b
:Param ETypes:
    FP64 a, 
    FP16 b

.. code-block:: C

    __m128d _mm_cvtsh_sd(__m128d a, __m128h b);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp64[0] := Convert_FP16_To_FP64(b.fp16[0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_cvt_roundsh_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128h b, 
    const int sae
:Param ETypes:
    FP64 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_cvt_roundsh_sd(__m128d a, __m128h b,
                               const int sae);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp64[0] := Convert_FP16_To_FP64(b.fp16[0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_cvtsh_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128h b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP16 b

.. code-block:: C

    __m128d _mm_mask_cvtsh_sd(__m128d src, __mmask8 k,
                              __m128d a, __m128h b);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp64[0] := Convert_FP16_To_FP64(b.fp16[0])
        ELSE
        	dst.fp64[0] := src.fp64[0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
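
In the scalar masked forms only bit 0 of "k" is consulted, and the upper lane always comes from "a", never from "src". The sketch below models that selection for a two-lane double vector. The ``vec2d`` type and ``mask_scalar_merge`` are hypothetical stand-ins, and ``conv0`` is assumed to be the already-converted lower element of "b".

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m128d: two 64-bit lanes. */
    typedef struct { double lane[2]; } vec2d;

    /* Lane 0 is conv0 when bit 0 of k is set, otherwise src.lane[0].
     * Lane 1 is copied from a regardless of the mask. */
    static vec2d mask_scalar_merge(vec2d src, uint8_t k, vec2d a, double conv0)
    {
        vec2d dst;
        dst.lane[0] = (k & 1u) ? conv0 : src.lane[0];
        dst.lane[1] = a.lane[1];
        return dst;
    }

Passing ``k = 0xFE`` leaves bit 0 clear, so lane 0 falls back to ``src`` even though every other mask bit is set.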
        	

_mm_mask_cvt_roundsh_sd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128h b, 
    const int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_mask_cvt_roundsh_sd(__m128d src, __mmask8 k,
                                    __m128d a, __m128h b,
                                    const int sae);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp64[0] := Convert_FP16_To_FP64(b.fp16[0])
        ELSE
        	dst.fp64[0] := src.fp64[0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_cvtsh_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP16 b

.. code-block:: C

    __m128d _mm_maskz_cvtsh_sd(__mmask8 k, __m128d a,
                               __m128h b);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp64[0] := Convert_FP16_To_FP64(b.fp16[0])
        ELSE
        	dst.fp64[0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_cvt_roundsh_sd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128h b, 
    const int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_maskz_cvt_roundsh_sd(__mmask8 k, __m128d a,
                                     __m128h b, const int sae);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp64[0] := Convert_FP16_To_FP64(b.fp16[0])
        ELSE
        	dst.fp64[0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_cvtsh_ss
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128h b
:Param ETypes:
    FP32 a, 
    FP16 b

.. code-block:: C

    __m128 _mm_cvtsh_ss(__m128 a, __m128h b);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp32[0] := Convert_FP16_To_FP32(b.fp16[0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvt_roundsh_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128h b, 
    const int sae
:Param ETypes:
    FP32 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128 _mm_cvt_roundsh_ss(__m128 a, __m128h b,
                              const int sae);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.fp32[0] := Convert_FP16_To_FP32(b.fp16[0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_cvtsh_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128h b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP16 b

.. code-block:: C

    __m128 _mm_mask_cvtsh_ss(__m128 src, __mmask8 k, __m128 a,
                             __m128h b);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp32[0] := Convert_FP16_To_FP32(b.fp16[0])
        ELSE
        	dst.fp32[0] := src.fp32[0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_cvt_roundsh_ss
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128h b, 
    const int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128 _mm_mask_cvt_roundsh_ss(__m128 src, __mmask8 k,
                                   __m128 a, __m128h b,
                                   const int sae);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". Exceptions can be suppressed by passing ``_MM_FROUND_NO_EXC`` in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp32[0] := Convert_FP16_To_FP32(b.fp16[0])
        ELSE
        	dst.fp32[0] := src.fp32[0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_cvtsh_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP16 b

.. code-block:: C

    __m128 _mm_maskz_cvtsh_ss(__mmask8 k, __m128 a, __m128h b);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp32[0] := Convert_FP16_To_FP32(b.fp16[0])
        ELSE
        	dst.fp32[0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_cvt_roundsh_ss
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128h b, 
    const int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128 _mm_maskz_cvt_roundsh_ss(__mmask8 k, __m128 a,
                                    __m128h b, const int sae);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        IF k[0]
        	dst.fp32[0] := Convert_FP16_To_FP32(b.fp16[0])
        ELSE
        	dst.fp32[0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_cvtsh_i32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    int _mm_cvtsh_i32(__m128h a);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.dword := Convert_FP16_To_Int32(a.fp16[0])
        	

_mm_cvt_roundsh_i32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    int rounding
:Param ETypes:
    FP16 a, 
    IMM rounding

.. code-block:: C

    int _mm_cvt_roundsh_i32(__m128h a, int rounding);

.. admonition:: Intel Description

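    Convert the lower half-precision (16-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each suppressing exceptions, or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.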

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.dword := Convert_FP16_To_Int32(a.fp16[0])
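
The effect of the "rounding" parameter is easiest to see on values that sit between two integers. The sketch below is a portable scalar model under stated assumptions, not the intrinsic: ``model_rounding`` and ``model_round_to_i32`` are hypothetical names, the enumerators are stand-ins for the four directed modes rather than the header constants, and the input is a ``float`` holding a finite value in the FP16 range.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical stand-ins for the four rounding directions. */
    enum model_rounding {
        MODEL_TO_NEAREST = 0,   /* round to nearest, ties to even */
        MODEL_TO_NEG_INF = 1,   /* round down */
        MODEL_TO_POS_INF = 2,   /* round up */
        MODEL_TO_ZERO    = 3    /* truncate */
    };

    /* Convert x to an integer under the selected rounding direction. */
    static int32_t model_round_to_i32(float x, enum model_rounding mode)
    {
        /* Finite FP16 values lie within +/-65504; this also rejects NaN. */
        assert(x > -70000.0f && x < 70000.0f);

        int32_t t = (int32_t)x;      /* a C cast truncates toward zero */
        float   r = x - (float)t;    /* remainder: same sign as x, |r| < 1 */

        switch (mode) {
        case MODEL_TO_ZERO:
            return t;
        case MODEL_TO_NEG_INF:
            return (r < 0.0f) ? t - 1 : t;
        case MODEL_TO_POS_INF:
            return (r > 0.0f) ? t + 1 : t;
        case MODEL_TO_NEAREST:
        default:
            if (r > 0.5f || (r == 0.5f && (t & 1)))
                return t + 1;
            if (r < -0.5f || (r == -0.5f && (t & 1)))
                return t - 1;
            return t;
        }
    }

In this model 2.5 rounds to 2 and 3.5 rounds to 4 under ``MODEL_TO_NEAREST``, because ties go to the even neighbor, while ``MODEL_TO_ZERO`` reproduces the behavior of the truncating conversions.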
        	

_mm_cvtsh_i64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __int64 _mm_cvtsh_i64(__m128h a);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.qword := Convert_FP16_To_Int64(a.fp16[0])
        	

_mm_cvt_roundsh_i64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128h a, 
    int rounding
:Param ETypes:
    FP16 a, 
    IMM rounding

.. code-block:: C

    __int64 _mm_cvt_roundsh_i64(__m128h a, int rounding);

.. admonition:: Intel Description

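    Convert the lower half-precision (16-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each suppressing exceptions, or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.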

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.qword := Convert_FP16_To_Int64(a.fp16[0])
        	

_mm_cvttsh_i32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    int _mm_cvttsh_i32(__m128h a);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.dword := Convert_FP16_To_Int32_Truncate(a.fp16[0])
        	

_mm_cvtt_roundsh_i32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    int _mm_cvtt_roundsh_i32(__m128h a, int sae);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.dword := Convert_FP16_To_Int32_Truncate(a.fp16[0])
        	

_mm_cvttsh_i64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __int64 _mm_cvttsh_i64(__m128h a);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.qword := Convert_FP16_To_Int64_Truncate(a.fp16[0])
        	

_mm_cvtt_roundsh_i64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __int64
:Param Types:
    __m128h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    __int64 _mm_cvtt_roundsh_i64(__m128h a, int sae);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.qword := Convert_FP16_To_Int64_Truncate(a.fp16[0])
        	

_mm_cvtsh_u32
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    unsigned int _mm_cvtsh_u32(__m128h a);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.dword := Convert_FP16_To_UInt32(a.fp16[0])
        	

_mm_cvt_roundsh_u32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    __m128h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    unsigned int _mm_cvt_roundsh_u32(__m128h a, int sae);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.dword := Convert_FP16_To_UInt32(a.fp16[0])
        	

_mm_cvtsh_u64
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    unsigned __int64 _mm_cvtsh_u64(__m128h a);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.qword := Convert_FP16_To_UInt64(a.fp16[0])
        	

_mm_cvt_roundsh_u64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    __m128h a, 
    int rounding
:Param ETypes:
    FP16 a, 
    IMM rounding

.. code-block:: C

    unsigned __int64 _mm_cvt_roundsh_u64(__m128h a, int rounding);

.. admonition:: Intel Description

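    Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each suppressing exceptions, or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.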

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.qword := Convert_FP16_To_UInt64(a.fp16[0])
        	

_mm_cvttsh_u32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    unsigned int _mm_cvttsh_u32(__m128h a);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.dword := Convert_FP16_To_UInt32_Truncate(a.fp16[0])
        	

_mm_cvtt_roundsh_u32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned int
:Param Types:
    __m128h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    unsigned int _mm_cvtt_roundsh_u32(__m128h a, int sae);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.dword := Convert_FP16_To_UInt32_Truncate(a.fp16[0])
        	

_mm_cvttsh_u64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    unsigned __int64 _mm_cvttsh_u64(__m128h a);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.qword := Convert_FP16_To_UInt64_Truncate(a.fp16[0])
        	

_mm_cvtt_roundsh_u64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: unsigned __int64
:Param Types:
    __m128h a, 
    int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    unsigned __int64 _mm_cvtt_roundsh_u64(__m128h a, int sae);

.. admonition:: Intel Description

    Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.qword := Convert_FP16_To_UInt64_Truncate(a.fp16[0])
        	

_mm_cvti32_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    int b
:Param ETypes:
    FP16 a, 
    SI32 b

.. code-block:: C

    __m128h _mm_cvti32_sh(__m128h a, int b);

.. admonition:: Intel Description

    Convert the signed 32-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.fp16[0] := Convert_Int32_To_FP16(b[31:0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
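
Integer-to-FP16 conversion is lossy for larger inputs, because a half-precision value carries only 11 significant bits and its largest finite value is 65504. The sketch below models which integer the converted element would hold, assuming the default round-to-nearest-even mode. It is a portable scalar illustration, not the intrinsic: ``model_int_to_fp16_value`` and ``MODEL_FP16_OVERFLOW`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical marker returned when the value would round to +/-infinity. */
    #define MODEL_FP16_OVERFLOW INT64_MAX

    /* Return the integer value of the FP16 number nearest to v (ties to even). */
    static int64_t model_int_to_fp16_value(int64_t v)
    {
        assert(v != INT64_MIN);                 /* keep -v well defined */
        uint64_t mag = (uint64_t)(v < 0 ? -v : v);

        if (mag < 2048u)
            return v;                           /* fits in 11 bits: exact */
        if (mag >= 65520u)
            return MODEL_FP16_OVERFLOW;         /* halfway past 65504 or beyond */

        int top = 15;                           /* highest set bit is at most 15 */
        while (((mag >> top) & 1u) == 0)
            top--;
        int drop = top - 10;                    /* low bits that are rounded away */
        uint64_t kept = mag >> drop;            /* leading 11 bits */
        uint64_t low  = mag & (((uint64_t)1 << drop) - 1);
        uint64_t half = (uint64_t)1 << (drop - 1);

        if (low > half || (low == half && (kept & 1u)))
            kept++;                             /* ties go to the even significand */

        int64_t r = (int64_t)(kept << drop);
        return v < 0 ? -r : r;
    }

For instance, 2049 lies halfway between the representable neighbors 2048 and 2050, and the model resolves the tie to 2048; 2051 resolves to 2052 for the same reason.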
        	

_mm_cvt_roundi32_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    int b, 
    int rounding
:Param ETypes:
    FP16 a, 
    SI32 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_cvt_roundi32_sh(__m128h a, int b, int rounding);

.. admonition:: Intel Description

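    Convert the signed 32-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each suppressing exceptions, or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.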

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.fp16[0] := Convert_Int32_To_FP16(b[31:0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_cvtu32_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    unsigned int b
:Param ETypes:
    FP16 a, 
    UI32 b

.. code-block:: C

    __m128h _mm_cvtu32_sh(__m128h a, unsigned int b);

.. admonition:: Intel Description

    Convert the unsigned 32-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.fp16[0] := Convert_Int32_To_FP16(b[31:0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_cvt_roundu32_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    unsigned int b, 
    int rounding
:Param ETypes:
    FP16 a, 
    UI32 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_cvt_roundu32_sh(__m128h a, unsigned int b,
                                int rounding);

.. admonition:: Intel Description

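    Convert the unsigned 32-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each suppressing exceptions, or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.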

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.fp16[0] := Convert_Int32_To_FP16(b[31:0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_cvti64_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __int64 b
:Param ETypes:
    FP16 a, 
    SI64 b

.. code-block:: C

    __m128h _mm_cvti64_sh(__m128h a, __int64 b);

.. admonition:: Intel Description

    Convert the signed 64-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.fp16[0] := Convert_Int64_To_FP16(b[63:0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_cvt_roundi64_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __int64 b, 
    int rounding
:Param ETypes:
    FP16 a, 
    SI64 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_cvt_roundi64_sh(__m128h a, __int64 b,
                                int rounding);

.. admonition:: Intel Description

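    Convert the signed 64-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each suppressing exceptions, or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.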

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.fp16[0] := Convert_Int64_To_FP16(b[63:0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_cvtu64_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    unsigned __int64 b
:Param ETypes:
    FP16 a, 
    UI64 b

.. code-block:: C

    __m128h _mm_cvtu64_sh(__m128h a, unsigned __int64 b);

.. admonition:: Intel Description

    Convert the unsigned 64-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.fp16[0] := Convert_Int64_To_FP16(b[63:0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_cvt_roundu64_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    unsigned __int64 b, 
    int rounding
:Param ETypes:
    FP16 a, 
    UI64 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_cvt_roundu64_sh(__m128h a, unsigned __int64 b,
                                int rounding);

.. admonition:: Intel Description

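    Convert the unsigned 64-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". Rounding is done according to the "rounding" parameter, which can be one of: (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) to round to nearest, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) to round down, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) to round up, (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) to truncate, each suppressing exceptions, or _MM_FROUND_CUR_DIRECTION to use MXCSR.RC.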

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst.fp16[0] := Convert_Int64_To_FP16(b[63:0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_cvtsi16_si128
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    short a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128i _mm_cvtsi16_si128(short a);

.. admonition:: Intel Description

    Copy 16-bit integer "a" to the lower elements of "dst", and zero the upper elements of "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[15:0] := a[15:0]
        dst[MAX:16] := 0
        	

_mm_cvtsi128_si16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: short
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    short _mm_cvtsi128_si16(__m128i a);

.. admonition:: Intel Description

    Copy the lower 16-bit integer in "a" to "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[15:0] := a[15:0]
        	

_mm_cvtsh_h
^^^^^^^^^^^
:Tech: AVX-512
:Category: Convert
:Header: immintrin.h
:Searchable: AVX-512-Convert-XMM
:Register: XMM 128 bit
:Return Type: _Float16
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    _Float16 _mm_cvtsh_h(__m128h a);

.. admonition:: Intel Description

    Copy the lower half-precision (16-bit) floating-point element of "a" to "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[15:0] := a.fp16[0]
        	

Miscellaneous
-------------
ZMM
~~~
_mm512_kunpackd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __mmask64 a, 
    __mmask64 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask64 _mm512_kunpackd(__mmask64 a, __mmask64 b);

.. admonition:: Intel Description

    Unpack and interleave 32 bits from masks "a" and "b", and store the 64-bit result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := b[31:0]
        dst[63:32] := a[31:0]
        dst[MAX:64] := 0
        	

_mm512_kunpackw
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 a, 
    __mmask32 b
:Param ETypes:
    MASK a, 
    MASK b

.. code-block:: C

    __mmask32 _mm512_kunpackw(__mmask32 a, __mmask32 b);

.. admonition:: Intel Description

    Unpack and interleave 16 bits from masks "a" and "b", and store the 32-bit result in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[15:0] := b[15:0]
        dst[31:16] := a[15:0]
        dst[MAX:32] := 0
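
Both mask-unpack operations above reduce to a shift and an OR: the low half of "b" lands in the low half of the result, and the low half of "a" lands in the high half. A minimal scalar sketch in portable C follows; ``model_kunpackd`` and ``model_kunpackw`` are hypothetical names, and plain fixed-width integers stand in for the mask types.

.. code-block:: C

    #include <stdint.h>

    /* Model of the 32-bit unpack: result = a[31:0] : b[31:0]. */
    static uint64_t model_kunpackd(uint64_t a, uint64_t b)
    {
        return ((a & 0xFFFFFFFFull) << 32) | (b & 0xFFFFFFFFull);
    }

    /* Model of the 16-bit unpack: result = a[15:0] : b[15:0]. */
    static uint32_t model_kunpackw(uint32_t a, uint32_t b)
    {
        return ((a & 0xFFFFu) << 16) | (b & 0xFFFFu);
    }

The upper halves of both inputs are discarded, so ``model_kunpackw(0xFFFF0001, 0xFFFF0002)`` yields ``0x00010002``.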
        	

_mm512_dbsad_epu8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_dbsad_epu8(__m512i a, __m512i b, int imm8);

.. admonition:: Intel Description

    Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst".

    Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
        	tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
        	tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
        	tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
        ENDFOR
        FOR j := 0 to 7
        	i := j*64
        	dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
        	               ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
        	
        	dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
        	                  ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
        	
        	dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
        	                  ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
        	
        	dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
        	                  ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
        ENDFOR
        dst[MAX:512] := 0
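
The two steps of the pseudo-code above (a dword shuffle of "b" inside each 128-bit lane, then four sliding SADs per 64-bit block) can be followed in a scalar model. This is a portable C sketch for experimenting with "imm8", not the intrinsic: ``model_dbsad_epu8`` is a hypothetical name, and byte and word arrays stand in for the ``__m512i`` operands.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Scalar model: a and b hold 64 bytes, dst receives 32 16-bit sums. */
    static void model_dbsad_epu8(uint16_t dst[32], const uint8_t a[64],
                                 const uint8_t b[64], int imm8)
    {
        assert(imm8 >= 0 && imm8 <= 255);
        uint8_t tmp[64];

        /* Step 1: in each 128-bit lane, dword d of tmp is the dword of b
           selected by the two imm8 bits at position 2*d. */
        for (int lane = 0; lane < 4; lane++) {
            for (int d = 0; d < 4; d++) {
                int sel = (imm8 >> (2 * d)) & 3;
                for (int t = 0; t < 4; t++)
                    tmp[lane * 16 + d * 4 + t] = b[lane * 16 + sel * 4 + t];
            }
        }

        /* Step 2: four SADs per 64-bit block. SADs 0 and 1 use the lower
           quadruplet of a, SADs 2 and 3 the upper one; the window over tmp
           starts at byte offset s for SAD number s. */
        static const int a_off[4] = {0, 0, 4, 4};
        for (int blk = 0; blk < 8; blk++) {
            for (int s = 0; s < 4; s++) {
                unsigned sum = 0;
                for (int t = 0; t < 4; t++)
                    sum += (unsigned)abs(a[blk * 8 + a_off[s] + t] -
                                         tmp[blk * 8 + s + t]);
                dst[blk * 4 + s] = (uint16_t)sum;
            }
        }
    }

An "imm8" of ``0xE4`` selects dwords 0, 1, 2, 3 in order, so ``tmp`` equals "b"; an "imm8" of ``0`` repeats dword 0 of each lane four times.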
        	

_mm512_mask_dbsad_epu8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_dbsad_epu8(__m512i src, __mmask32 k,
                                   __m512i a, __m512i b,
                                   int imm8);

.. admonition:: Intel Description

    Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

    Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
        	tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
        	tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
        	tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
        ENDFOR
        FOR j := 0 to 7
        	i := j*64
        	tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
        	                   ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
        	
        	tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
        	                      ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
        	
        	tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
        	                      ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
        	
        	tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
        	                      ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
        ENDFOR
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_dbsad_epu8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_dbsad_epu8(__mmask32 k, __m512i a,
                                    __m512i b, int imm8);

.. admonition:: Intel Description

    Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

    Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR i := 0 to 3
        	tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
        	tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
        	tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
        	tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
        ENDFOR
        FOR j := 0 to 7
        	i := j*64
        	tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
        	                   ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
        	
        	tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
        	                      ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
        	
        	tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
        	                      ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
        	
        	tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
        	                      ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
        ENDFOR
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
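
The final loop of the masked variants is the same selection step in both cases: one mask bit per 16-bit element decides between the computed result and a fallback. The sketch below models that step alone in portable C. ``model_mask_select_epi16`` is a hypothetical name, and passing a null ``src`` is this sketch's own convention for choosing zeromask behavior.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Per-element selection over 32 16-bit lanes.
       src != NULL models a writemask (fallback is src[j]);
       src == NULL models a zeromask (fallback is 0). */
    static void model_mask_select_epi16(uint16_t dst[32], const uint16_t *src,
                                        uint32_t k, const uint16_t result[32])
    {
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1u)
                dst[j] = result[j];
            else
                dst[j] = (src != NULL) ? src[j] : 0;
        }
    }

Bit ``j`` of the mask governs element ``j``, so a 512-bit vector of 16-bit elements needs all 32 bits of the mask.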
        	

_mm512_alignr_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_alignr_epi8(__m512i a, __m512i b,
                               const int imm8);

.. admonition:: Intel Description

    Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*128
        	tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8)
        	dst[i+127:i] := tmp[127:0]
        ENDFOR
        dst[MAX:512] := 0
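
Because the concatenate-and-shift above is done independently in each 128-bit lane, bytes never cross lane boundaries. A scalar model in portable C makes this visible; it is a sketch under the assumption that byte arrays stand in for the ``__m512i`` operands, and ``model_alignr_epi8`` is a hypothetical name.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model: per 16-byte lane, take bytes imm8 .. imm8+15 of the
       32-byte sequence formed by b (low half) followed by a (high half). */
    static void model_alignr_epi8(uint8_t dst[64], const uint8_t a[64],
                                  const uint8_t b[64], int imm8)
    {
        assert(imm8 >= 0 && imm8 <= 255);
        for (int lane = 0; lane < 4; lane++) {
            const uint8_t *la = a + lane * 16;
            const uint8_t *lb = b + lane * 16;
            for (int t = 0; t < 16; t++) {
                int idx = t + imm8;
                uint8_t v = 0;              /* shifted-in zeros past byte 31 */
                if (idx < 16)
                    v = lb[idx];
                else if (idx < 32)
                    v = la[idx - 16];
                dst[lane * 16 + t] = v;
            }
        }
    }

With a shift of 16 each lane of the result equals the matching lane of "a", and with a shift of 32 or more every byte is zero.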
        	

_mm512_mask_alignr_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_alignr_epi8(__m512i src, __mmask64 k,
                                    __m512i a, __m512i b,
                                    const int imm8);

.. admonition:: Intel Description

    Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*128
        	tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8)
        	tmp_dst[i+127:i] := tmp[127:0]
        ENDFOR
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_alignr_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_alignr_epi8(__mmask64 k, __m512i a,
                                     __m512i b, const int imm8);

.. admonition:: Intel Description

    Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*128
        	tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8)
        	tmp_dst[i+127:i] := tmp[127:0]
        ENDFOR
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_blend_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_blend_epi8(__mmask64 k, __m512i a,
                                   __m512i b);

.. admonition:: Intel Description

    Blend packed 8-bit integers from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := b[i+7:i]
        	ELSE
        		dst[i+7:i] := a[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
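The selection rule is easy to get backwards, so a scalar sketch may help: a set mask bit picks the byte from "b", a clear bit picks the byte from "a". The helper name ``ref_mask_blend_epi8`` and the use of plain byte arrays are assumptions made for this example.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the blend pseudo-code above. */
    static void ref_mask_blend_epi8(uint8_t dst[64], uint64_t k,
                                    const uint8_t a[64],
                                    const uint8_t b[64])
    {
        for (int j = 0; j < 64; j++)
            dst[j] = ((k >> j) & 1) ? b[j] : a[j];  /* set bit selects b */
    }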

_mm512_mask_blend_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_blend_epi16(__mmask32 k, __m512i a,
                                    __m512i b);

.. admonition:: Intel Description

    Blend packed 16-bit integers from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := b[i+15:i]
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcastb_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m512i _mm512_broadcastb_epi8(__m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 8-bit integer from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	dst[i+7:i] := a[7:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_broadcastb_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_mask_broadcastb_epi8(__m512i src,
                                        __mmask64 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[7:0]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcastb_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m512i _mm512_maskz_broadcastb_epi8(__mmask64 k,
                                         __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[7:0]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
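A scalar sketch of the zeromask form, assuming the 128-bit source is held as a 16-byte array and the result as a 64-byte array; the helper name ``ref_maskz_broadcastb_epi8`` is hypothetical. Only byte 0 of "a" is read, and elements whose mask bit is clear become zero rather than keeping an earlier value.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static void ref_maskz_broadcastb_epi8(uint8_t dst[64], uint64_t k,
                                          const uint8_t a[16])
    {
        for (int j = 0; j < 64; j++)
            dst[j] = ((k >> j) & 1) ? a[0] : 0;  /* clear bit zeroes */
    }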

_mm512_broadcastw_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m512i _mm512_broadcastw_epi16(__m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 16-bit integer from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	dst[i+15:i] := a[15:0]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_broadcastw_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_mask_broadcastw_epi16(__m512i src,
                                         __mmask32 k,
                                         __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[15:0]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcastw_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m512i _mm512_maskz_broadcastw_epi16(__mmask32 k,
                                          __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[15:0]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask2_permutex2var_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i idx, 
    __mmask32 k, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 idx, 
    MASK k, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask2_permutex2var_epi16(__m512i a,
                                            __m512i idx,
                                            __mmask32 k,
                                            __m512i b);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		off := 16*idx[i+4:i]
        		dst[i+15:i] := idx[i+5] ? b[off+15:off] : a[off+15:off]
        	ELSE
        		dst[i+15:i] := idx[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
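The index encoding can be illustrated with a scalar sketch: bits 4:0 of each "idx" element give the source element number, bit 5 chooses between "a" and "b", and higher bits are ignored. The helper name ``ref_mask2_permutex2var_epi16`` and the array-of-``uint16_t`` representation are assumptions made for this example.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static void ref_mask2_permutex2var_epi16(uint16_t dst[32],
                                             const uint16_t a[32],
                                             const uint16_t idx[32],
                                             uint32_t k,
                                             const uint16_t b[32])
    {
        uint16_t tmp[32];  /* lets dst alias any of the inputs */
        for (int j = 0; j < 32; j++) {
            if ((k >> j) & 1) {
                unsigned off = idx[j] & 0x1F;      /* idx[i+4:i] */
                unsigned sel = (idx[j] >> 5) & 1;  /* idx[i+5]   */
                tmp[j] = sel ? b[off] : a[off];
            } else {
                tmp[j] = idx[j];  /* mask2 form keeps the idx element */
            }
        }
        for (int j = 0; j < 32; j++)
            dst[j] = tmp[j];
    }

The only difference from the "mask" form is the fallback for clear mask bits: here the element is taken from "idx", there it is taken from "a".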

_mm512_mask_permutex2var_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __mmask32 k, 
    __m512i idx, 
    __m512i b
:Param ETypes:
    UI16 a, 
    MASK k, 
    UI16 idx, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_permutex2var_epi16(__m512i a,
                                           __mmask32 k,
                                           __m512i idx,
                                           __m512i b);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		off := 16*idx[i+4:i]
        		dst[i+15:i] := idx[i+5] ? b[off+15:off] : a[off+15:off]
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutex2var_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i idx, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 idx, 
    UI16 b

.. code-block:: C

    __m512i _mm512_maskz_permutex2var_epi16(__mmask32 k,
                                            __m512i a,
                                            __m512i idx,
                                            __m512i b);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		off := 16*idx[i+4:i]
        		dst[i+15:i] := idx[i+5] ? b[off+15:off] : a[off+15:off]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_permutex2var_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i idx, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 idx, 
    UI16 b

.. code-block:: C

    __m512i _mm512_permutex2var_epi16(__m512i a, __m512i idx,
                                      __m512i b);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	off := 16*idx[i+4:i]
        	dst[i+15:i] := idx[i+5] ? b[off+15:off] : a[off+15:off]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_permutexvar_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i idx, 
    __m512i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 idx, 
    UI16 a

.. code-block:: C

    __m512i _mm512_mask_permutexvar_epi16(__m512i src,
                                          __mmask32 k,
                                          __m512i idx,
                                          __m512i a);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	id := idx[i+4:i]*16
        	IF k[j]
        		dst[i+15:i] := a[id+15:id]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_permutexvar_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i idx, 
    __m512i a
:Param ETypes:
    MASK k, 
    UI16 idx, 
    UI16 a

.. code-block:: C

    __m512i _mm512_maskz_permutexvar_epi16(__mmask32 k,
                                           __m512i idx,
                                           __m512i a);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	id := idx[i+4:i]*16
        	IF k[j]
        		dst[i+15:i] := a[id+15:id]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_permutexvar_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i idx, 
    __m512i a
:Param ETypes:
    UI16 idx, 
    UI16 a

.. code-block:: C

    __m512i _mm512_permutexvar_epi16(__m512i idx, __m512i a);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	id := idx[i+4:i]*16
        	dst[i+15:i] := a[id+15:id]
        ENDFOR
        dst[MAX:512] := 0
        	
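As a scalar sketch of the single-source permute, each result element is looked up in "a" using the low five bits of the matching "idx" element. The helper name ``ref_permutexvar_epi16`` and the array representation are assumptions made for this example.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static void ref_permutexvar_epi16(uint16_t dst[32],
                                      const uint16_t idx[32],
                                      const uint16_t a[32])
    {
        uint16_t tmp[32];  /* lets dst alias a or idx */
        for (int j = 0; j < 32; j++)
            tmp[j] = a[idx[j] & 0x1F];  /* only idx[i+4:i] is used */
        for (int j = 0; j < 32; j++)
            dst[j] = tmp[j];
    }

For instance, filling "idx" with 31, 30, ..., 0 reverses the order of the 32 elements, which is not possible with the per-lane shuffles in this section.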

_mm512_movepi8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask64
:Param Types:
    __m512i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __mmask64 _mm512_movepi8_mask(__m512i a);

.. admonition:: Intel Description

    Set each bit of mask register "k" based on the most significant bit of the corresponding packed 8-bit integer in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF a[i+7]
        		k[j] := 1
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:64] := 0
        	
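The mapping from byte sign bits to mask bits can be sketched in scalar C. The helper name ``ref_movepi8_mask`` and the byte-array input are assumptions made for this example; bit "j" of the result mirrors bit 7 of byte "j".

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static uint64_t ref_movepi8_mask(const uint8_t a[64])
    {
        uint64_t k = 0;
        for (int j = 0; j < 64; j++)
            k |= (uint64_t)(a[j] >> 7) << j;  /* copy the top bit of byte j */
        return k;
    }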

_mm512_movm_epi8
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m512i _mm512_movm_epi8(__mmask64 k);

.. admonition:: Intel Description

    Set each packed 8-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := 0xFF
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
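The mask expansion can be sketched in scalar C as follows, assuming the result is held as a 64-byte array; the helper name ``ref_movm_epi8`` is hypothetical. Each mask bit becomes a whole byte of ones or zeros, which makes the result usable as an operand for bitwise AND or OR.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static void ref_movm_epi8(uint8_t dst[64], uint64_t k)
    {
        for (int j = 0; j < 64; j++)
            dst[j] = ((k >> j) & 1) ? 0xFF : 0x00;
    }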

_mm512_movm_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m512i _mm512_movm_epi16(__mmask32 k);

.. admonition:: Intel Description

    Set each packed 16-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := 0xFFFF
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_movepi16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __mmask32 _mm512_movepi16_mask(__m512i a);

.. admonition:: Intel Description

    Set each bit of mask register "k" based on the most significant bit of the corresponding packed 16-bit integer in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	IF a[i+15]
        		k[j] := 1
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_sad_epu8
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_sad_epu8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce eight unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 63
        	i := j*8
        	tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i])
        ENDFOR
        FOR j := 0 to 7
        	i := j*64
        	dst[i+15:i] := tmp[i+7:i] + tmp[i+15:i+8] + tmp[i+23:i+16] + tmp[i+31:i+24] + \
        	               tmp[i+39:i+32] + tmp[i+47:i+40] + tmp[i+55:i+48] + tmp[i+63:i+56]
        	dst[i+63:i+16] := 0
        ENDFOR
        dst[MAX:512] := 0
        	
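A scalar sketch of the two steps, absolute differences followed by a sum over each group of eight bytes. The helper name ``ref_sad_epu8`` and the representation of "dst" as eight ``uint64_t`` values are assumptions made for this example. Each sum is at most 8 * 255 = 2040, so it always fits in the low 16 bits.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static void ref_sad_epu8(uint64_t dst[8], const uint8_t a[64],
                             const uint8_t b[64])
    {
        for (int j = 0; j < 8; j++) {
            uint16_t sum = 0;
            for (int n = 0; n < 8; n++) {
                int d = (int)a[j * 8 + n] - (int)b[j * 8 + n];
                sum = (uint16_t)(sum + (d < 0 ? -d : d));
            }
            dst[j] = sum;  /* bits 63:16 of the element stay zero */
        }
    }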

_mm512_mask_shufflehi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_shufflehi_epi16(__m512i src,
                                        __mmask32 k, __m512i a,
                                        int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[63:0] := a[63:0]
        tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
        tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
        tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
        tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
        tmp_dst[191:128] := a[191:128]
        tmp_dst[207:192] := (a >> (imm8[1:0] * 16))[207:192]
        tmp_dst[223:208] := (a >> (imm8[3:2] * 16))[207:192]
        tmp_dst[239:224] := (a >> (imm8[5:4] * 16))[207:192]
        tmp_dst[255:240] := (a >> (imm8[7:6] * 16))[207:192]
        tmp_dst[319:256] := a[319:256]
        tmp_dst[335:320] := (a >> (imm8[1:0] * 16))[335:320]
        tmp_dst[351:336] := (a >> (imm8[3:2] * 16))[335:320]
        tmp_dst[367:352] := (a >> (imm8[5:4] * 16))[335:320]
        tmp_dst[383:368] := (a >> (imm8[7:6] * 16))[335:320]
        tmp_dst[447:384] := a[447:384]
        tmp_dst[463:448] := (a >> (imm8[1:0] * 16))[463:448]
        tmp_dst[479:464] := (a >> (imm8[3:2] * 16))[463:448]
        tmp_dst[495:480] := (a >> (imm8[5:4] * 16))[463:448]
        tmp_dst[511:496] := (a >> (imm8[7:6] * 16))[463:448]
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_shufflehi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_shufflehi_epi16(__mmask32 k, __m512i a,
                                         int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[63:0] := a[63:0]
        tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
        tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
        tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
        tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
        tmp_dst[191:128] := a[191:128]
        tmp_dst[207:192] := (a >> (imm8[1:0] * 16))[207:192]
        tmp_dst[223:208] := (a >> (imm8[3:2] * 16))[207:192]
        tmp_dst[239:224] := (a >> (imm8[5:4] * 16))[207:192]
        tmp_dst[255:240] := (a >> (imm8[7:6] * 16))[207:192]
        tmp_dst[319:256] := a[319:256]
        tmp_dst[335:320] := (a >> (imm8[1:0] * 16))[335:320]
        tmp_dst[351:336] := (a >> (imm8[3:2] * 16))[335:320]
        tmp_dst[367:352] := (a >> (imm8[5:4] * 16))[335:320]
        tmp_dst[383:368] := (a >> (imm8[7:6] * 16))[335:320]
        tmp_dst[447:384] := a[447:384]
        tmp_dst[463:448] := (a >> (imm8[1:0] * 16))[463:448]
        tmp_dst[479:464] := (a >> (imm8[3:2] * 16))[463:448]
        tmp_dst[495:480] := (a >> (imm8[5:4] * 16))[463:448]
        tmp_dst[511:496] := (a >> (imm8[7:6] * 16))[463:448]
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shufflehi_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_shufflehi_epi16(__m512i a, int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
        dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
        dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
        dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
        dst[191:128] := a[191:128]
        dst[207:192] := (a >> (imm8[1:0] * 16))[207:192]
        dst[223:208] := (a >> (imm8[3:2] * 16))[207:192]
        dst[239:224] := (a >> (imm8[5:4] * 16))[207:192]
        dst[255:240] := (a >> (imm8[7:6] * 16))[207:192]
        dst[319:256] := a[319:256]
        dst[335:320] := (a >> (imm8[1:0] * 16))[335:320]
        dst[351:336] := (a >> (imm8[3:2] * 16))[335:320]
        dst[367:352] := (a >> (imm8[5:4] * 16))[335:320]
        dst[383:368] := (a >> (imm8[7:6] * 16))[335:320]
        dst[447:384] := a[447:384]
        dst[463:448] := (a >> (imm8[1:0] * 16))[463:448]
        dst[479:464] := (a >> (imm8[3:2] * 16))[463:448]
        dst[495:480] := (a >> (imm8[5:4] * 16))[463:448]
        dst[511:496] := (a >> (imm8[7:6] * 16))[463:448]
        dst[MAX:512] := 0
        	
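The unrolled pseudo-code above repeats one pattern per 128-bit lane, which a scalar sketch can express as a loop: each 2-bit field of "imm8" picks one of the four high elements of the lane. The helper name ``ref_shufflehi_epi16`` and the array representation are assumptions made for this example.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static void ref_shufflehi_epi16(uint16_t dst[32], const uint16_t a[32],
                                    unsigned imm8)
    {
        uint16_t tmp[32];  /* lets dst alias a */
        for (int lane = 0; lane < 4; lane++) {
            const uint16_t *s = a + lane * 8;
            uint16_t *d = tmp + lane * 8;
            for (int n = 0; n < 4; n++) {
                unsigned sel = (imm8 >> (2 * n)) & 3;  /* imm8[2n+1:2n] */
                d[n] = s[n];            /* low half is copied unchanged */
                d[4 + n] = s[4 + sel];  /* high half is shuffled */
            }
        }
        for (int j = 0; j < 32; j++)
            dst[j] = tmp[j];
    }

With "imm8" equal to 0x1B the four fields are 3, 2, 1, 0, so the high four elements of every lane are reversed while the low four are left in place.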

_mm512_mask_shufflelo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_shufflelo_epi16(__m512i src,
                                        __mmask32 k, __m512i a,
                                        int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
        tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
        tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
        tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
        tmp_dst[127:64] := a[127:64]
        tmp_dst[143:128] := (a >> (imm8[1:0] * 16))[143:128]
        tmp_dst[159:144] := (a >> (imm8[3:2] * 16))[143:128]
        tmp_dst[175:160] := (a >> (imm8[5:4] * 16))[143:128]
        tmp_dst[191:176] := (a >> (imm8[7:6] * 16))[143:128]
        tmp_dst[255:192] := a[255:192]
        tmp_dst[271:256] := (a >> (imm8[1:0] * 16))[271:256]
        tmp_dst[287:272] := (a >> (imm8[3:2] * 16))[271:256]
        tmp_dst[303:288] := (a >> (imm8[5:4] * 16))[271:256]
        tmp_dst[319:304] := (a >> (imm8[7:6] * 16))[271:256]
        tmp_dst[383:320] := a[383:320]
        tmp_dst[399:384] := (a >> (imm8[1:0] * 16))[399:384]
        tmp_dst[415:400] := (a >> (imm8[3:2] * 16))[399:384]
        tmp_dst[431:416] := (a >> (imm8[5:4] * 16))[399:384]
        tmp_dst[447:432] := (a >> (imm8[7:6] * 16))[399:384]
        tmp_dst[511:448] := a[511:448]
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
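One detail of the masked form is worth a scalar sketch: the writemask is applied to all 32 elements after the shuffle, so even the high-half elements that would otherwise be copied from "a" are replaced by "src" when their mask bit is clear. The helper name ``ref_mask_shufflelo_epi16`` and the array representation are assumptions made for this example.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code above. */
    static void ref_mask_shufflelo_epi16(uint16_t dst[32],
                                         const uint16_t src[32],
                                         uint32_t k, const uint16_t a[32],
                                         unsigned imm8)
    {
        uint16_t tmp[32];
        for (int lane = 0; lane < 4; lane++) {
            for (int n = 0; n < 4; n++) {
                unsigned sel = (imm8 >> (2 * n)) & 3;  /* imm8[2n+1:2n] */
                tmp[lane * 8 + n] = a[lane * 8 + sel];        /* shuffled */
                tmp[lane * 8 + 4 + n] = a[lane * 8 + 4 + n];  /* copied */
            }
        }
        /* The writemask covers every element, shuffled or copied. */
        for (int j = 0; j < 32; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
    }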

_mm512_maskz_shufflelo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_shufflelo_epi16(__mmask32 k, __m512i a,
                                         int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
        tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
        tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
        tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
        tmp_dst[127:64] := a[127:64]
        tmp_dst[143:128] := (a >> (imm8[1:0] * 16))[143:128]
        tmp_dst[159:144] := (a >> (imm8[3:2] * 16))[143:128]
        tmp_dst[175:160] := (a >> (imm8[5:4] * 16))[143:128]
        tmp_dst[191:176] := (a >> (imm8[7:6] * 16))[143:128]
        tmp_dst[255:192] := a[255:192]
        tmp_dst[271:256] := (a >> (imm8[1:0] * 16))[271:256]
        tmp_dst[287:272] := (a >> (imm8[3:2] * 16))[271:256]
        tmp_dst[303:288] := (a >> (imm8[5:4] * 16))[271:256]
        tmp_dst[319:304] := (a >> (imm8[7:6] * 16))[271:256]
        tmp_dst[383:320] := a[383:320]
        tmp_dst[399:384] := (a >> (imm8[1:0] * 16))[399:384]
        tmp_dst[415:400] := (a >> (imm8[3:2] * 16))[399:384]
        tmp_dst[431:416] := (a >> (imm8[5:4] * 16))[399:384]
        tmp_dst[447:432] := (a >> (imm8[7:6] * 16))[399:384]
        tmp_dst[511:448] := a[511:448]
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_shufflelo_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_shufflelo_epi16(__m512i a, int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
        dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
        dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
        dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
        dst[127:64] := a[127:64]
        dst[143:128] := (a >> (imm8[1:0] * 16))[143:128]
        dst[159:144] := (a >> (imm8[3:2] * 16))[143:128]
        dst[175:160] := (a >> (imm8[5:4] * 16))[143:128]
        dst[191:176] := (a >> (imm8[7:6] * 16))[143:128]
        dst[255:192] := a[255:192]
        dst[271:256] := (a >> (imm8[1:0] * 16))[271:256]
        dst[287:272] := (a >> (imm8[3:2] * 16))[271:256]
        dst[303:288] := (a >> (imm8[5:4] * 16))[271:256]
        dst[319:304] := (a >> (imm8[7:6] * 16))[271:256]
        dst[383:320] := a[383:320]
        dst[399:384] := (a >> (imm8[1:0] * 16))[399:384]
        dst[415:400] := (a >> (imm8[3:2] * 16))[399:384]
        dst[431:416] := (a >> (imm8[5:4] * 16))[399:384]
        dst[447:432] := (a >> (imm8[7:6] * 16))[399:384]
        dst[511:448] := a[511:448]
        dst[MAX:512] := 0
        	

_mm512_mask_unpackhi_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_unpackhi_epi8(__m512i src, __mmask64 k,
                                      __m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[71:64] 
        	dst[15:8] := src2[71:64] 
        	dst[23:16] := src1[79:72] 
        	dst[31:24] := src2[79:72] 
        	dst[39:32] := src1[87:80] 
        	dst[47:40] := src2[87:80] 
        	dst[55:48] := src1[95:88] 
        	dst[63:56] := src2[95:88] 
        	dst[71:64] := src1[103:96] 
        	dst[79:72] := src2[103:96] 
        	dst[87:80] := src1[111:104] 
        	dst[95:88] := src2[111:104] 
        	dst[103:96] := src1[119:112] 
        	dst[111:104] := src2[119:112] 
        	dst[119:112] := src1[127:120] 
        	dst[127:120] := src2[127:120] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_HIGH_BYTES(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_HIGH_BYTES(a[511:384], b[511:384])
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
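The writemask behaviour in the pseudo-code above can be checked against a portable scalar model. The following is a minimal sketch, not Intel's implementation: ``ref_mask_unpackhi_epi8`` is a hypothetical helper name, and it assumes each 512-bit vector is held as a 64-byte array in which byte 0 corresponds to bits 7:0.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_unpackhi_epi8.
       Not part of immintrin.h; vectors are plain 64-byte arrays. */
    void ref_mask_unpackhi_epi8(uint8_t dst[64], const uint8_t src[64],
                                uint64_t k, const uint8_t a[64],
                                const uint8_t b[64])
    {
        uint8_t tmp[64];

        /* INTERLEAVE_HIGH_BYTES applied to each of the four 128-bit lanes:
           bytes 8..15 of the lane in a and b are interleaved. */
        for (int lane = 0; lane < 4; lane++) {
            for (int n = 0; n < 8; n++) {
                tmp[lane * 16 + 2 * n]     = a[lane * 16 + 8 + n];
                tmp[lane * 16 + 2 * n + 1] = b[lane * 16 + 8 + n];
            }
        }

        /* Writemask: bit j of k selects the interleaved byte, else src. */
        for (int j = 0; j < 64; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
    }

Because the interleave is computed into a temporary first, the model still behaves as expected when "dst" aliases "src", "a", or "b".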

_mm512_maskz_unpackhi_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_unpackhi_epi8(__mmask64 k, __m512i a,
                                       __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[71:64] 
        	dst[15:8] := src2[71:64] 
        	dst[23:16] := src1[79:72] 
        	dst[31:24] := src2[79:72] 
        	dst[39:32] := src1[87:80] 
        	dst[47:40] := src2[87:80] 
        	dst[55:48] := src1[95:88] 
        	dst[63:56] := src2[95:88] 
        	dst[71:64] := src1[103:96] 
        	dst[79:72] := src2[103:96] 
        	dst[87:80] := src1[111:104] 
        	dst[95:88] := src2[111:104] 
        	dst[103:96] := src1[119:112] 
        	dst[111:104] := src2[119:112] 
        	dst[119:112] := src1[127:120] 
        	dst[127:120] := src2[127:120] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_HIGH_BYTES(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_HIGH_BYTES(a[511:384], b[511:384])
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_unpackhi_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_unpackhi_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[71:64] 
        	dst[15:8] := src2[71:64] 
        	dst[23:16] := src1[79:72] 
        	dst[31:24] := src2[79:72] 
        	dst[39:32] := src1[87:80] 
        	dst[47:40] := src2[87:80] 
        	dst[55:48] := src1[95:88] 
        	dst[63:56] := src2[95:88] 
        	dst[71:64] := src1[103:96] 
        	dst[79:72] := src2[103:96] 
        	dst[87:80] := src1[111:104] 
        	dst[95:88] := src2[111:104] 
        	dst[103:96] := src1[119:112] 
        	dst[111:104] := src2[119:112] 
        	dst[119:112] := src1[127:120] 
        	dst[127:120] := src2[127:120] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128])
        dst[383:256] := INTERLEAVE_HIGH_BYTES(a[383:256], b[383:256])
        dst[511:384] := INTERLEAVE_HIGH_BYTES(a[511:384], b[511:384])
        dst[MAX:512] := 0
        	

_mm512_mask_unpackhi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_unpackhi_epi16(__m512i src, __mmask32 k,
                                       __m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[79:64]
        	dst[31:16] := src2[79:64] 
        	dst[47:32] := src1[95:80] 
        	dst[63:48] := src2[95:80] 
        	dst[79:64] := src1[111:96] 
        	dst[95:80] := src2[111:96] 
        	dst[111:96] := src1[127:112] 
        	dst[127:112] := src2[127:112] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_HIGH_WORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_HIGH_WORDS(a[511:384], b[511:384])
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_unpackhi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_maskz_unpackhi_epi16(__mmask32 k, __m512i a,
                                        __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[79:64]
        	dst[31:16] := src2[79:64] 
        	dst[47:32] := src1[95:80] 
        	dst[63:48] := src2[95:80] 
        	dst[79:64] := src1[111:96] 
        	dst[95:80] := src2[111:96] 
        	dst[111:96] := src1[127:112] 
        	dst[127:112] := src2[127:112] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_HIGH_WORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_HIGH_WORDS(a[511:384], b[511:384])
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_unpackhi_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_unpackhi_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[79:64]
        	dst[31:16] := src2[79:64] 
        	dst[47:32] := src1[95:80] 
        	dst[63:48] := src2[95:80] 
        	dst[79:64] := src1[111:96] 
        	dst[95:80] := src2[111:96] 
        	dst[111:96] := src1[127:112] 
        	dst[127:112] := src2[127:112] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128])
        dst[383:256] := INTERLEAVE_HIGH_WORDS(a[383:256], b[383:256])
        dst[511:384] := INTERLEAVE_HIGH_WORDS(a[511:384], b[511:384])
        dst[MAX:512] := 0
        	

_mm512_mask_unpacklo_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_mask_unpacklo_epi8(__m512i src, __mmask64 k,
                                      __m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[7:0] 
        	dst[15:8] := src2[7:0] 
        	dst[23:16] := src1[15:8] 
        	dst[31:24] := src2[15:8] 
        	dst[39:32] := src1[23:16] 
        	dst[47:40] := src2[23:16] 
        	dst[55:48] := src1[31:24] 
        	dst[63:56] := src2[31:24] 
        	dst[71:64] := src1[39:32]
        	dst[79:72] := src2[39:32] 
        	dst[87:80] := src1[47:40] 
        	dst[95:88] := src2[47:40] 
        	dst[103:96] := src1[55:48] 
        	dst[111:104] := src2[55:48] 
        	dst[119:112] := src1[63:56] 
        	dst[127:120] := src2[63:56] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_BYTES(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_BYTES(a[511:384], b[511:384])
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_unpacklo_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_maskz_unpacklo_epi8(__mmask64 k, __m512i a,
                                       __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[7:0] 
        	dst[15:8] := src2[7:0] 
        	dst[23:16] := src1[15:8] 
        	dst[31:24] := src2[15:8] 
        	dst[39:32] := src1[23:16] 
        	dst[47:40] := src2[23:16] 
        	dst[55:48] := src1[31:24] 
        	dst[63:56] := src2[31:24] 
        	dst[71:64] := src1[39:32]
        	dst[79:72] := src2[39:32] 
        	dst[87:80] := src1[47:40] 
        	dst[95:88] := src2[47:40] 
        	dst[103:96] := src1[55:48] 
        	dst[111:104] := src2[55:48] 
        	dst[119:112] := src1[63:56] 
        	dst[127:120] := src2[63:56] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_BYTES(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_BYTES(a[511:384], b[511:384])
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
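One way to read the zeromask form is as a combined interleave and select. The sketch below is a scalar model under stated assumptions: ``ref_maskz_unpacklo_epi8`` is a hypothetical helper, not an Intel API, and vectors are modelled as 64-byte arrays with byte 0 at bits 7:0. With the mask 0x5555555555555555 only the bytes taken from "a" survive, so each 16-bit element of the result holds a zero-extended low byte of "a".

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_maskz_unpacklo_epi8.
       Not part of immintrin.h; vectors are plain 64-byte arrays. */
    void ref_maskz_unpacklo_epi8(uint8_t dst[64], uint64_t k,
                                 const uint8_t a[64], const uint8_t b[64])
    {
        uint8_t tmp[64];

        /* INTERLEAVE_BYTES per 128-bit lane: bytes 0..7 of the lane
           in a and b are interleaved. */
        for (int lane = 0; lane < 4; lane++) {
            for (int n = 0; n < 8; n++) {
                tmp[lane * 16 + 2 * n]     = a[lane * 16 + n];
                tmp[lane * 16 + 2 * n + 1] = b[lane * 16 + n];
            }
        }

        /* Zeromask: bit j of k keeps the interleaved byte, else 0. */
        for (int j = 0; j < 64; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : 0;
    }

Even-numbered mask bits correspond to bytes that originate in "a", and odd-numbered bits to bytes that originate in "b".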

_mm512_unpacklo_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m512i _mm512_unpacklo_epi8(__m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[7:0] 
        	dst[15:8] := src2[7:0] 
        	dst[23:16] := src1[15:8] 
        	dst[31:24] := src2[15:8] 
        	dst[39:32] := src1[23:16] 
        	dst[47:40] := src2[23:16] 
        	dst[55:48] := src1[31:24] 
        	dst[63:56] := src2[31:24] 
        	dst[71:64] := src1[39:32]
        	dst[79:72] := src2[39:32] 
        	dst[87:80] := src1[47:40] 
        	dst[95:88] := src2[47:40] 
        	dst[103:96] := src1[55:48] 
        	dst[111:104] := src2[55:48] 
        	dst[119:112] := src1[63:56] 
        	dst[127:120] := src2[63:56] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128])
        dst[383:256] := INTERLEAVE_BYTES(a[383:256], b[383:256])
        dst[511:384] := INTERLEAVE_BYTES(a[511:384], b[511:384])
        dst[MAX:512] := 0
        	

_mm512_mask_unpacklo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_mask_unpacklo_epi16(__m512i src, __mmask32 k,
                                       __m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[15:0] 
        	dst[31:16] := src2[15:0] 
        	dst[47:32] := src1[31:16] 
        	dst[63:48] := src2[31:16] 
        	dst[79:64] := src1[47:32] 
        	dst[95:80] := src2[47:32] 
        	dst[111:96] := src1[63:48] 
        	dst[127:112] := src2[63:48] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_WORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_WORDS(a[511:384], b[511:384])
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_unpacklo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_maskz_unpacklo_epi16(__mmask32 k, __m512i a,
                                        __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[15:0] 
        	dst[31:16] := src2[15:0] 
        	dst[47:32] := src1[31:16] 
        	dst[63:48] := src2[31:16] 
        	dst[79:64] := src1[47:32] 
        	dst[95:80] := src2[47:32] 
        	dst[111:96] := src1[63:48] 
        	dst[127:112] := src2[63:48] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
        tmp_dst[383:256] := INTERLEAVE_WORDS(a[383:256], b[383:256])
        tmp_dst[511:384] := INTERLEAVE_WORDS(a[511:384], b[511:384])
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_unpacklo_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m512i _mm512_unpacklo_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[15:0] 
        	dst[31:16] := src2[15:0] 
        	dst[47:32] := src1[31:16] 
        	dst[63:48] := src2[31:16] 
        	dst[79:64] := src1[47:32] 
        	dst[95:80] := src2[47:32] 
        	dst[111:96] := src1[63:48] 
        	dst[127:112] := src2[63:48] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
        dst[383:256] := INTERLEAVE_WORDS(a[383:256], b[383:256])
        dst[511:384] := INTERLEAVE_WORDS(a[511:384], b[511:384])
        dst[MAX:512] := 0
        	
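The lane-local word interleave above can be modelled in scalar C, which may help when verifying vectorized code on a machine without AVX-512. This is a sketch only: ``ref_unpacklo_epi16`` is a hypothetical helper, and vectors are assumed to be arrays of 32 words with word 0 at bits 15:0.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_unpacklo_epi16.
       Not part of immintrin.h; vectors are arrays of 32 words. */
    void ref_unpacklo_epi16(uint16_t dst[32], const uint16_t a[32],
                            const uint16_t b[32])
    {
        uint16_t tmp[32];

        /* INTERLEAVE_WORDS per 128-bit lane (8 words per lane):
           words 0..3 of the lane in a and b are interleaved. */
        for (int lane = 0; lane < 4; lane++) {
            for (int n = 0; n < 4; n++) {
                tmp[lane * 8 + 2 * n]     = a[lane * 8 + n];
                tmp[lane * 8 + 2 * n + 1] = b[lane * 8 + n];
            }
        }

        for (int j = 0; j < 32; j++)
            dst[j] = tmp[j];
    }

When "b" is all zeros, each adjacent pair of result words forms the zero-extended 32-bit value of one low word of "a". The source words 4..7 of every lane are not read, because the operation never crosses a 128-bit lane boundary.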

_mm512_mask_packs_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_mask_packs_epi32(__m512i src, __mmask32 k,
                                    __m512i a, __m512i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := Saturate16(a[31:0])
        tmp_dst[31:16] := Saturate16(a[63:32])
        tmp_dst[47:32] := Saturate16(a[95:64])
        tmp_dst[63:48] := Saturate16(a[127:96])
        tmp_dst[79:64] := Saturate16(b[31:0])
        tmp_dst[95:80] := Saturate16(b[63:32])
        tmp_dst[111:96] := Saturate16(b[95:64])
        tmp_dst[127:112] := Saturate16(b[127:96])
        tmp_dst[143:128] := Saturate16(a[159:128])
        tmp_dst[159:144] := Saturate16(a[191:160])
        tmp_dst[175:160] := Saturate16(a[223:192])
        tmp_dst[191:176] := Saturate16(a[255:224])
        tmp_dst[207:192] := Saturate16(b[159:128])
        tmp_dst[223:208] := Saturate16(b[191:160])
        tmp_dst[239:224] := Saturate16(b[223:192])
        tmp_dst[255:240] := Saturate16(b[255:224])
        tmp_dst[271:256] := Saturate16(a[287:256])
        tmp_dst[287:272] := Saturate16(a[319:288])
        tmp_dst[303:288] := Saturate16(a[351:320])
        tmp_dst[319:304] := Saturate16(a[383:352])
        tmp_dst[335:320] := Saturate16(b[287:256])
        tmp_dst[351:336] := Saturate16(b[319:288])
        tmp_dst[367:352] := Saturate16(b[351:320])
        tmp_dst[383:368] := Saturate16(b[383:352])
        tmp_dst[399:384] := Saturate16(a[415:384])
        tmp_dst[415:400] := Saturate16(a[447:416])
        tmp_dst[431:416] := Saturate16(a[479:448])
        tmp_dst[447:432] := Saturate16(a[511:480])
        tmp_dst[463:448] := Saturate16(b[415:384])
        tmp_dst[479:464] := Saturate16(b[447:416])
        tmp_dst[495:480] := Saturate16(b[479:448])
        tmp_dst[511:496] := Saturate16(b[511:480])
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_packs_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_maskz_packs_epi32(__mmask32 k, __m512i a,
                                     __m512i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := Saturate16(a[31:0])
        tmp_dst[31:16] := Saturate16(a[63:32])
        tmp_dst[47:32] := Saturate16(a[95:64])
        tmp_dst[63:48] := Saturate16(a[127:96])
        tmp_dst[79:64] := Saturate16(b[31:0])
        tmp_dst[95:80] := Saturate16(b[63:32])
        tmp_dst[111:96] := Saturate16(b[95:64])
        tmp_dst[127:112] := Saturate16(b[127:96])
        tmp_dst[143:128] := Saturate16(a[159:128])
        tmp_dst[159:144] := Saturate16(a[191:160])
        tmp_dst[175:160] := Saturate16(a[223:192])
        tmp_dst[191:176] := Saturate16(a[255:224])
        tmp_dst[207:192] := Saturate16(b[159:128])
        tmp_dst[223:208] := Saturate16(b[191:160])
        tmp_dst[239:224] := Saturate16(b[223:192])
        tmp_dst[255:240] := Saturate16(b[255:224])
        tmp_dst[271:256] := Saturate16(a[287:256])
        tmp_dst[287:272] := Saturate16(a[319:288])
        tmp_dst[303:288] := Saturate16(a[351:320])
        tmp_dst[319:304] := Saturate16(a[383:352])
        tmp_dst[335:320] := Saturate16(b[287:256])
        tmp_dst[351:336] := Saturate16(b[319:288])
        tmp_dst[367:352] := Saturate16(b[351:320])
        tmp_dst[383:368] := Saturate16(b[383:352])
        tmp_dst[399:384] := Saturate16(a[415:384])
        tmp_dst[415:400] := Saturate16(a[447:416])
        tmp_dst[431:416] := Saturate16(a[479:448])
        tmp_dst[447:432] := Saturate16(a[511:480])
        tmp_dst[463:448] := Saturate16(b[415:384])
        tmp_dst[479:464] := Saturate16(b[447:416])
        tmp_dst[495:480] := Saturate16(b[479:448])
        tmp_dst[511:496] := Saturate16(b[511:480])
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_packs_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_packs_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Saturate16(a[31:0])
        dst[31:16] := Saturate16(a[63:32])
        dst[47:32] := Saturate16(a[95:64])
        dst[63:48] := Saturate16(a[127:96])
        dst[79:64] := Saturate16(b[31:0])
        dst[95:80] := Saturate16(b[63:32])
        dst[111:96] := Saturate16(b[95:64])
        dst[127:112] := Saturate16(b[127:96])
        dst[143:128] := Saturate16(a[159:128])
        dst[159:144] := Saturate16(a[191:160])
        dst[175:160] := Saturate16(a[223:192])
        dst[191:176] := Saturate16(a[255:224])
        dst[207:192] := Saturate16(b[159:128])
        dst[223:208] := Saturate16(b[191:160])
        dst[239:224] := Saturate16(b[223:192])
        dst[255:240] := Saturate16(b[255:224])
        dst[271:256] := Saturate16(a[287:256])
        dst[287:272] := Saturate16(a[319:288])
        dst[303:288] := Saturate16(a[351:320])
        dst[319:304] := Saturate16(a[383:352])
        dst[335:320] := Saturate16(b[287:256])
        dst[351:336] := Saturate16(b[319:288])
        dst[367:352] := Saturate16(b[351:320])
        dst[383:368] := Saturate16(b[383:352])
        dst[399:384] := Saturate16(a[415:384])
        dst[415:400] := Saturate16(a[447:416])
        dst[431:416] := Saturate16(a[479:448])
        dst[447:432] := Saturate16(a[511:480])
        dst[463:448] := Saturate16(b[415:384])
        dst[479:464] := Saturate16(b[447:416])
        dst[495:480] := Saturate16(b[479:448])
        dst[511:496] := Saturate16(b[511:480])
        dst[MAX:512] := 0
        	
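The unrolled assignments above follow a regular pattern: within each 128-bit lane, the four saturated values from "a" come first, followed by the four from "b". A minimal scalar sketch of that pattern is shown below. The names ``saturate16`` and ``ref_packs_epi32`` are hypothetical helpers, and the vectors are assumed to be plain arrays with element 0 in the lowest bits.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a signed 32-bit value to the signed 16-bit range. */
    static int16_t saturate16(int32_t x)
    {
        if (x > INT16_MAX) return INT16_MAX;
        if (x < INT16_MIN) return INT16_MIN;
        return (int16_t)x;
    }

    /* Hypothetical scalar model of _mm512_packs_epi32.
       Not part of immintrin.h; a and b hold 16 dwords, dst 32 words. */
    void ref_packs_epi32(int16_t dst[32], const int32_t a[16],
                         const int32_t b[16])
    {
        int16_t tmp[32];

        /* Per 128-bit lane: 4 dwords of a, then 4 dwords of b. */
        for (int lane = 0; lane < 4; lane++) {
            for (int n = 0; n < 4; n++) {
                tmp[lane * 8 + n]     = saturate16(a[lane * 4 + n]);
                tmp[lane * 8 + 4 + n] = saturate16(b[lane * 4 + n]);
            }
        }

        for (int j = 0; j < 32; j++)
            dst[j] = tmp[j];
    }

Note that the result is not simply all of "a" followed by all of "b": the two sources alternate every 64 bits of output, once per lane.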

_mm512_mask_packs_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    SI8 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mask_packs_epi16(__m512i src, __mmask64 k,
                                    __m512i a, __m512i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[7:0] := Saturate8(a[15:0])
        tmp_dst[15:8] := Saturate8(a[31:16])
        tmp_dst[23:16] := Saturate8(a[47:32])
        tmp_dst[31:24] := Saturate8(a[63:48])
        tmp_dst[39:32] := Saturate8(a[79:64])
        tmp_dst[47:40] := Saturate8(a[95:80])
        tmp_dst[55:48] := Saturate8(a[111:96])
        tmp_dst[63:56] := Saturate8(a[127:112])
        tmp_dst[71:64] := Saturate8(b[15:0])
        tmp_dst[79:72] := Saturate8(b[31:16])
        tmp_dst[87:80] := Saturate8(b[47:32])
        tmp_dst[95:88] := Saturate8(b[63:48])
        tmp_dst[103:96] := Saturate8(b[79:64])
        tmp_dst[111:104] := Saturate8(b[95:80])
        tmp_dst[119:112] := Saturate8(b[111:96])
        tmp_dst[127:120] := Saturate8(b[127:112])
        tmp_dst[135:128] := Saturate8(a[143:128])
        tmp_dst[143:136] := Saturate8(a[159:144])
        tmp_dst[151:144] := Saturate8(a[175:160])
        tmp_dst[159:152] := Saturate8(a[191:176])
        tmp_dst[167:160] := Saturate8(a[207:192])
        tmp_dst[175:168] := Saturate8(a[223:208])
        tmp_dst[183:176] := Saturate8(a[239:224])
        tmp_dst[191:184] := Saturate8(a[255:240])
        tmp_dst[199:192] := Saturate8(b[143:128])
        tmp_dst[207:200] := Saturate8(b[159:144])
        tmp_dst[215:208] := Saturate8(b[175:160])
        tmp_dst[223:216] := Saturate8(b[191:176])
        tmp_dst[231:224] := Saturate8(b[207:192])
        tmp_dst[239:232] := Saturate8(b[223:208])
        tmp_dst[247:240] := Saturate8(b[239:224])
        tmp_dst[255:248] := Saturate8(b[255:240])
        tmp_dst[263:256] := Saturate8(a[271:256])
        tmp_dst[271:264] := Saturate8(a[287:272])
        tmp_dst[279:272] := Saturate8(a[303:288])
        tmp_dst[287:280] := Saturate8(a[319:304])
        tmp_dst[295:288] := Saturate8(a[335:320])
        tmp_dst[303:296] := Saturate8(a[351:336])
        tmp_dst[311:304] := Saturate8(a[367:352])
        tmp_dst[319:312] := Saturate8(a[383:368])
        tmp_dst[327:320] := Saturate8(b[271:256])
        tmp_dst[335:328] := Saturate8(b[287:272])
        tmp_dst[343:336] := Saturate8(b[303:288])
        tmp_dst[351:344] := Saturate8(b[319:304])
        tmp_dst[359:352] := Saturate8(b[335:320])
        tmp_dst[367:360] := Saturate8(b[351:336])
        tmp_dst[375:368] := Saturate8(b[367:352])
        tmp_dst[383:376] := Saturate8(b[383:368])
        tmp_dst[391:384] := Saturate8(a[399:384])
        tmp_dst[399:392] := Saturate8(a[415:400])
        tmp_dst[407:400] := Saturate8(a[431:416])
        tmp_dst[415:408] := Saturate8(a[447:432])
        tmp_dst[423:416] := Saturate8(a[463:448])
        tmp_dst[431:424] := Saturate8(a[479:464])
        tmp_dst[439:432] := Saturate8(a[495:480])
        tmp_dst[447:440] := Saturate8(a[511:496])
        tmp_dst[455:448] := Saturate8(b[399:384])
        tmp_dst[463:456] := Saturate8(b[415:400])
        tmp_dst[471:464] := Saturate8(b[431:416])
        tmp_dst[479:472] := Saturate8(b[447:432])
        tmp_dst[487:480] := Saturate8(b[463:448])
        tmp_dst[495:488] := Saturate8(b[479:464])
        tmp_dst[503:496] := Saturate8(b[495:480])
        tmp_dst[511:504] := Saturate8(b[511:496])
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
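The 64 assignments above reduce to two nested loops followed by the writemask select. The following scalar sketch illustrates this under stated assumptions: ``saturate8`` and ``ref_mask_packs_epi16`` are hypothetical names, and vectors are modelled as arrays with element 0 in the lowest bits.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a signed 16-bit value to the signed 8-bit range. */
    static int8_t saturate8(int16_t x)
    {
        if (x > INT8_MAX) return INT8_MAX;
        if (x < INT8_MIN) return INT8_MIN;
        return (int8_t)x;
    }

    /* Hypothetical scalar model of _mm512_mask_packs_epi16.
       Not part of immintrin.h; a and b hold 32 words, dst 64 bytes. */
    void ref_mask_packs_epi16(int8_t dst[64], const int8_t src[64],
                              uint64_t k, const int16_t a[32],
                              const int16_t b[32])
    {
        int8_t tmp[64];

        /* Per 128-bit lane: 8 words of a, then 8 words of b. */
        for (int lane = 0; lane < 4; lane++) {
            for (int n = 0; n < 8; n++) {
                tmp[lane * 16 + n]     = saturate8(a[lane * 8 + n]);
                tmp[lane * 16 + 8 + n] = saturate8(b[lane * 8 + n]);
            }
        }

        /* Writemask: bit j of k selects the packed byte, else src. */
        for (int j = 0; j < 64; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
    }

The mask indexes the packed 8-bit results, so it is 64 bits wide even though each source vector holds only 32 elements.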

_mm512_maskz_packs_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_maskz_packs_epi16(__mmask64 k, __m512i a,
                                     __m512i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[7:0] := Saturate8(a[15:0])
        tmp_dst[15:8] := Saturate8(a[31:16])
        tmp_dst[23:16] := Saturate8(a[47:32])
        tmp_dst[31:24] := Saturate8(a[63:48])
        tmp_dst[39:32] := Saturate8(a[79:64])
        tmp_dst[47:40] := Saturate8(a[95:80])
        tmp_dst[55:48] := Saturate8(a[111:96])
        tmp_dst[63:56] := Saturate8(a[127:112])
        tmp_dst[71:64] := Saturate8(b[15:0])
        tmp_dst[79:72] := Saturate8(b[31:16])
        tmp_dst[87:80] := Saturate8(b[47:32])
        tmp_dst[95:88] := Saturate8(b[63:48])
        tmp_dst[103:96] := Saturate8(b[79:64])
        tmp_dst[111:104] := Saturate8(b[95:80])
        tmp_dst[119:112] := Saturate8(b[111:96])
        tmp_dst[127:120] := Saturate8(b[127:112])
        tmp_dst[135:128] := Saturate8(a[143:128])
        tmp_dst[143:136] := Saturate8(a[159:144])
        tmp_dst[151:144] := Saturate8(a[175:160])
        tmp_dst[159:152] := Saturate8(a[191:176])
        tmp_dst[167:160] := Saturate8(a[207:192])
        tmp_dst[175:168] := Saturate8(a[223:208])
        tmp_dst[183:176] := Saturate8(a[239:224])
        tmp_dst[191:184] := Saturate8(a[255:240])
        tmp_dst[199:192] := Saturate8(b[143:128])
        tmp_dst[207:200] := Saturate8(b[159:144])
        tmp_dst[215:208] := Saturate8(b[175:160])
        tmp_dst[223:216] := Saturate8(b[191:176])
        tmp_dst[231:224] := Saturate8(b[207:192])
        tmp_dst[239:232] := Saturate8(b[223:208])
        tmp_dst[247:240] := Saturate8(b[239:224])
        tmp_dst[255:248] := Saturate8(b[255:240])
        tmp_dst[263:256] := Saturate8(a[271:256])
        tmp_dst[271:264] := Saturate8(a[287:272])
        tmp_dst[279:272] := Saturate8(a[303:288])
        tmp_dst[287:280] := Saturate8(a[319:304])
        tmp_dst[295:288] := Saturate8(a[335:320])
        tmp_dst[303:296] := Saturate8(a[351:336])
        tmp_dst[311:304] := Saturate8(a[367:352])
        tmp_dst[319:312] := Saturate8(a[383:368])
        tmp_dst[327:320] := Saturate8(b[271:256])
        tmp_dst[335:328] := Saturate8(b[287:272])
        tmp_dst[343:336] := Saturate8(b[303:288])
        tmp_dst[351:344] := Saturate8(b[319:304])
        tmp_dst[359:352] := Saturate8(b[335:320])
        tmp_dst[367:360] := Saturate8(b[351:336])
        tmp_dst[375:368] := Saturate8(b[367:352])
        tmp_dst[383:376] := Saturate8(b[383:368])
        tmp_dst[391:384] := Saturate8(a[399:384])
        tmp_dst[399:392] := Saturate8(a[415:400])
        tmp_dst[407:400] := Saturate8(a[431:416])
        tmp_dst[415:408] := Saturate8(a[447:432])
        tmp_dst[423:416] := Saturate8(a[463:448])
        tmp_dst[431:424] := Saturate8(a[479:464])
        tmp_dst[439:432] := Saturate8(a[495:480])
        tmp_dst[447:440] := Saturate8(a[511:496])
        tmp_dst[455:448] := Saturate8(b[399:384])
        tmp_dst[463:456] := Saturate8(b[415:400])
        tmp_dst[471:464] := Saturate8(b[431:416])
        tmp_dst[479:472] := Saturate8(b[447:432])
        tmp_dst[487:480] := Saturate8(b[463:448])
        tmp_dst[495:488] := Saturate8(b[479:464])
        tmp_dst[503:496] := Saturate8(b[495:480])
        tmp_dst[511:504] := Saturate8(b[511:496])
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
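
The 64 assignments above follow one pattern per 128-bit lane: eight saturated elements from "a", then eight from "b". The sketch below folds that pattern and the zeromask into one loop. It is a scalar model in portable C under the assumption that vectors are plain arrays; the names ``saturate8`` and ``maskz_packs_epi16_model`` are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Signed saturation of a 16-bit value to the int8_t range. */
    static int8_t saturate8(int16_t x)
    {
        if (x > INT8_MAX) return INT8_MAX;
        if (x < INT8_MIN) return INT8_MIN;
        return (int8_t)x;
    }

    /* Hypothetical scalar model of _mm512_maskz_packs_epi16.
       a and b each hold 32 signed 16-bit elements; dst holds 64 bytes. */
    static void maskz_packs_epi16_model(int8_t dst[64], uint64_t k,
                                        const int16_t a[32],
                                        const int16_t b[32])
    {
        for (int j = 0; j < 64; ++j) {
            int lane = j / 16;   /* 128-bit lane of the destination */
            int pos  = j % 16;   /* byte position inside that lane  */
            const int16_t *in = (pos < 8) ? a : b;
            int8_t packed = saturate8(in[lane * 8 + pos % 8]);
            dst[j] = ((k >> j) & 1u) ? packed : 0;
        }
    }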
        	

_mm512_packs_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_packs_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := Saturate8(a[15:0])
        dst[15:8] := Saturate8(a[31:16])
        dst[23:16] := Saturate8(a[47:32])
        dst[31:24] := Saturate8(a[63:48])
        dst[39:32] := Saturate8(a[79:64])
        dst[47:40] := Saturate8(a[95:80])
        dst[55:48] := Saturate8(a[111:96])
        dst[63:56] := Saturate8(a[127:112])
        dst[71:64] := Saturate8(b[15:0])
        dst[79:72] := Saturate8(b[31:16])
        dst[87:80] := Saturate8(b[47:32])
        dst[95:88] := Saturate8(b[63:48])
        dst[103:96] := Saturate8(b[79:64])
        dst[111:104] := Saturate8(b[95:80])
        dst[119:112] := Saturate8(b[111:96])
        dst[127:120] := Saturate8(b[127:112])
        dst[135:128] := Saturate8(a[143:128])
        dst[143:136] := Saturate8(a[159:144])
        dst[151:144] := Saturate8(a[175:160])
        dst[159:152] := Saturate8(a[191:176])
        dst[167:160] := Saturate8(a[207:192])
        dst[175:168] := Saturate8(a[223:208])
        dst[183:176] := Saturate8(a[239:224])
        dst[191:184] := Saturate8(a[255:240])
        dst[199:192] := Saturate8(b[143:128])
        dst[207:200] := Saturate8(b[159:144])
        dst[215:208] := Saturate8(b[175:160])
        dst[223:216] := Saturate8(b[191:176])
        dst[231:224] := Saturate8(b[207:192])
        dst[239:232] := Saturate8(b[223:208])
        dst[247:240] := Saturate8(b[239:224])
        dst[255:248] := Saturate8(b[255:240])
        dst[263:256] := Saturate8(a[271:256])
        dst[271:264] := Saturate8(a[287:272])
        dst[279:272] := Saturate8(a[303:288])
        dst[287:280] := Saturate8(a[319:304])
        dst[295:288] := Saturate8(a[335:320])
        dst[303:296] := Saturate8(a[351:336])
        dst[311:304] := Saturate8(a[367:352])
        dst[319:312] := Saturate8(a[383:368])
        dst[327:320] := Saturate8(b[271:256])
        dst[335:328] := Saturate8(b[287:272])
        dst[343:336] := Saturate8(b[303:288])
        dst[351:344] := Saturate8(b[319:304])
        dst[359:352] := Saturate8(b[335:320])
        dst[367:360] := Saturate8(b[351:336])
        dst[375:368] := Saturate8(b[367:352])
        dst[383:376] := Saturate8(b[383:368])
        dst[391:384] := Saturate8(a[399:384])
        dst[399:392] := Saturate8(a[415:400])
        dst[407:400] := Saturate8(a[431:416])
        dst[415:408] := Saturate8(a[447:432])
        dst[423:416] := Saturate8(a[463:448])
        dst[431:424] := Saturate8(a[479:464])
        dst[439:432] := Saturate8(a[495:480])
        dst[447:440] := Saturate8(a[511:496])
        dst[455:448] := Saturate8(b[399:384])
        dst[463:456] := Saturate8(b[415:400])
        dst[471:464] := Saturate8(b[431:416])
        dst[479:472] := Saturate8(b[447:432])
        dst[487:480] := Saturate8(b[463:448])
        dst[495:488] := Saturate8(b[479:464])
        dst[503:496] := Saturate8(b[495:480])
        dst[511:504] := Saturate8(b[511:496])
        dst[MAX:512] := 0
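
Because the pack works per 128-bit lane, the result interleaves 64-bit chunks of "a" and "b" rather than placing all of "a" before all of "b". The scalar sketch below models the pack and then reorders the eight 64-bit chunks as 0, 2, 4, 6, 1, 3, 5, 7 to recover linear order. Both function names are hypothetical. On hardware, a 64-bit permute such as ``_mm512_permutexvar_epi64`` with that index vector is one possible way to do the reordering; that pairing is a suggestion, not something this entry states.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Signed saturation of a 16-bit value to the int8_t range. */
    static int8_t sat8(int16_t x)
    {
        if (x > INT8_MAX) return INT8_MAX;
        if (x < INT8_MIN) return INT8_MIN;
        return (int8_t)x;
    }

    /* Hypothetical scalar model of the lane-wise pack. */
    static void packs_epi16_model(int8_t dst[64],
                                  const int16_t a[32], const int16_t b[32])
    {
        for (int lane = 0; lane < 4; ++lane) {
            for (int e = 0; e < 8; ++e) {
                dst[lane * 16 + e]     = sat8(a[lane * 8 + e]);
                dst[lane * 16 + 8 + e] = sat8(b[lane * 8 + e]);
            }
        }
    }

    /* Reorder the eight 64-bit chunks so every byte that came from a
       precedes every byte that came from b. */
    static void linearize_packed(int8_t out[64], const int8_t packed[64])
    {
        static const int order[8] = {0, 2, 4, 6, 1, 3, 5, 7};
        for (int q = 0; q < 8; ++q)
            memcpy(out + q * 8, packed + order[q] * 8, 8);
    }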
        	

_mm512_mask_packus_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_mask_packus_epi32(__m512i src, __mmask32 k,
                                     __m512i a, __m512i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := SaturateU16(a[31:0])
        tmp_dst[31:16] := SaturateU16(a[63:32])
        tmp_dst[47:32] := SaturateU16(a[95:64])
        tmp_dst[63:48] := SaturateU16(a[127:96])
        tmp_dst[79:64] := SaturateU16(b[31:0])
        tmp_dst[95:80] := SaturateU16(b[63:32])
        tmp_dst[111:96] := SaturateU16(b[95:64])
        tmp_dst[127:112] := SaturateU16(b[127:96])
        tmp_dst[143:128] := SaturateU16(a[159:128])
        tmp_dst[159:144] := SaturateU16(a[191:160])
        tmp_dst[175:160] := SaturateU16(a[223:192])
        tmp_dst[191:176] := SaturateU16(a[255:224])
        tmp_dst[207:192] := SaturateU16(b[159:128])
        tmp_dst[223:208] := SaturateU16(b[191:160])
        tmp_dst[239:224] := SaturateU16(b[223:192])
        tmp_dst[255:240] := SaturateU16(b[255:224])
        tmp_dst[271:256] := SaturateU16(a[287:256])
        tmp_dst[287:272] := SaturateU16(a[319:288])
        tmp_dst[303:288] := SaturateU16(a[351:320])
        tmp_dst[319:304] := SaturateU16(a[383:352])
        tmp_dst[335:320] := SaturateU16(b[287:256])
        tmp_dst[351:336] := SaturateU16(b[319:288])
        tmp_dst[367:352] := SaturateU16(b[351:320])
        tmp_dst[383:368] := SaturateU16(b[383:352])
        tmp_dst[399:384] := SaturateU16(a[415:384])
        tmp_dst[415:400] := SaturateU16(a[447:416])
        tmp_dst[431:416] := SaturateU16(a[479:448])
        tmp_dst[447:432] := SaturateU16(a[511:480])
        tmp_dst[463:448] := SaturateU16(b[415:384])
        tmp_dst[479:464] := SaturateU16(b[447:416])
        tmp_dst[495:480] := SaturateU16(b[479:448])
        tmp_dst[511:496] := SaturateU16(b[511:480])
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
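
Note that the inputs are signed while the saturation is unsigned, so negative inputs clamp to 0 and inputs above 65535 clamp to 65535. A minimal scalar sketch of the whole operation, including the writemask, is given below. It assumes vectors are plain arrays, and the names ``saturate_u16`` and ``mask_packus_epi32_model`` are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Unsigned saturation of a signed 32-bit value to the uint16_t range. */
    static uint16_t saturate_u16(int32_t x)
    {
        if (x < 0) return 0;
        if (x > UINT16_MAX) return UINT16_MAX;
        return (uint16_t)x;
    }

    /* Hypothetical scalar model of _mm512_mask_packus_epi32.
       a and b each hold 16 signed 32-bit elements; dst holds 32 words. */
    static void mask_packus_epi32_model(uint16_t dst[32],
                                        const uint16_t src[32], uint32_t k,
                                        const int32_t a[16],
                                        const int32_t b[16])
    {
        for (int j = 0; j < 32; ++j) {
            int lane = j / 8;    /* 128-bit lane of the destination */
            int pos  = j % 8;    /* 16-bit slot inside that lane    */
            const int32_t *in = (pos < 4) ? a : b;
            uint16_t packed = saturate_u16(in[lane * 4 + pos % 4]);
            dst[j] = ((k >> j) & 1u) ? packed : src[j];
        }
    }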
        	

_mm512_maskz_packus_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask32 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_maskz_packus_epi32(__mmask32 k, __m512i a,
                                      __m512i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := SaturateU16(a[31:0])
        tmp_dst[31:16] := SaturateU16(a[63:32])
        tmp_dst[47:32] := SaturateU16(a[95:64])
        tmp_dst[63:48] := SaturateU16(a[127:96])
        tmp_dst[79:64] := SaturateU16(b[31:0])
        tmp_dst[95:80] := SaturateU16(b[63:32])
        tmp_dst[111:96] := SaturateU16(b[95:64])
        tmp_dst[127:112] := SaturateU16(b[127:96])
        tmp_dst[143:128] := SaturateU16(a[159:128])
        tmp_dst[159:144] := SaturateU16(a[191:160])
        tmp_dst[175:160] := SaturateU16(a[223:192])
        tmp_dst[191:176] := SaturateU16(a[255:224])
        tmp_dst[207:192] := SaturateU16(b[159:128])
        tmp_dst[223:208] := SaturateU16(b[191:160])
        tmp_dst[239:224] := SaturateU16(b[223:192])
        tmp_dst[255:240] := SaturateU16(b[255:224])
        tmp_dst[271:256] := SaturateU16(a[287:256])
        tmp_dst[287:272] := SaturateU16(a[319:288])
        tmp_dst[303:288] := SaturateU16(a[351:320])
        tmp_dst[319:304] := SaturateU16(a[383:352])
        tmp_dst[335:320] := SaturateU16(b[287:256])
        tmp_dst[351:336] := SaturateU16(b[319:288])
        tmp_dst[367:352] := SaturateU16(b[351:320])
        tmp_dst[383:368] := SaturateU16(b[383:352])
        tmp_dst[399:384] := SaturateU16(a[415:384])
        tmp_dst[415:400] := SaturateU16(a[447:416])
        tmp_dst[431:416] := SaturateU16(a[479:448])
        tmp_dst[447:432] := SaturateU16(a[511:480])
        tmp_dst[463:448] := SaturateU16(b[415:384])
        tmp_dst[479:464] := SaturateU16(b[447:416])
        tmp_dst[495:480] := SaturateU16(b[479:448])
        tmp_dst[511:496] := SaturateU16(b[511:480])
        FOR j := 0 to 31
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
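
The mask bits index destination elements, not source elements, so the lane interleaving has to be taken into account when building "k". The sketch below, with hypothetical helper names, maps a destination index back to its source and derives the zeromask that keeps only the elements fed by "a". Under the layout in the pseudo-code above, that mask is ``0x0F0F0F0F``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: for destination element j (0..31) of the
       32-bit to 16-bit pack, report which input it comes from and the
       element index inside that input. */
    static void packus_epi32_source(int j, int *from_b, int *index)
    {
        int lane = j / 8;        /* 128-bit lane */
        int pos  = j % 8;        /* 16-bit slot inside the lane */
        *from_b = (pos >= 4);
        *index  = lane * 4 + pos % 4;
    }

    /* Zeromask that keeps only the destination elements fed by a. */
    static uint32_t packus_epi32_mask_from_a(void)
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; ++j) {
            int from_b, index;
            packus_epi32_source(j, &from_b, &index);
            if (!from_b)
                k |= (uint32_t)1 << j;
        }
        return k;
    }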
        	

_mm512_packus_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m512i _mm512_packus_epi32(__m512i a, __m512i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := SaturateU16(a[31:0])
        dst[31:16] := SaturateU16(a[63:32])
        dst[47:32] := SaturateU16(a[95:64])
        dst[63:48] := SaturateU16(a[127:96])
        dst[79:64] := SaturateU16(b[31:0])
        dst[95:80] := SaturateU16(b[63:32])
        dst[111:96] := SaturateU16(b[95:64])
        dst[127:112] := SaturateU16(b[127:96])
        dst[143:128] := SaturateU16(a[159:128])
        dst[159:144] := SaturateU16(a[191:160])
        dst[175:160] := SaturateU16(a[223:192])
        dst[191:176] := SaturateU16(a[255:224])
        dst[207:192] := SaturateU16(b[159:128])
        dst[223:208] := SaturateU16(b[191:160])
        dst[239:224] := SaturateU16(b[223:192])
        dst[255:240] := SaturateU16(b[255:224])
        dst[271:256] := SaturateU16(a[287:256])
        dst[287:272] := SaturateU16(a[319:288])
        dst[303:288] := SaturateU16(a[351:320])
        dst[319:304] := SaturateU16(a[383:352])
        dst[335:320] := SaturateU16(b[287:256])
        dst[351:336] := SaturateU16(b[319:288])
        dst[367:352] := SaturateU16(b[351:320])
        dst[383:368] := SaturateU16(b[383:352])
        dst[399:384] := SaturateU16(a[415:384])
        dst[415:400] := SaturateU16(a[447:416])
        dst[431:416] := SaturateU16(a[479:448])
        dst[447:432] := SaturateU16(a[511:480])
        dst[463:448] := SaturateU16(b[415:384])
        dst[479:464] := SaturateU16(b[447:416])
        dst[495:480] := SaturateU16(b[479:448])
        dst[511:496] := SaturateU16(b[511:480])
        dst[MAX:512] := 0
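
``SaturateU16`` is not the same as dropping the upper 16 bits. The short sketch below contrasts the two on a single element; both function names are hypothetical, and ``truncate_u16`` is shown only for comparison, since it is not what this intrinsic does.

.. code-block:: C

    #include <stdint.h>

    /* Unsigned saturation, as used by the pack: clamp to [0, 65535]. */
    static uint16_t saturate_u16(int32_t x)
    {
        if (x < 0) return 0;
        if (x > UINT16_MAX) return UINT16_MAX;
        return (uint16_t)x;
    }

    /* Plain truncation, for contrast only: keep the low 16 bits. */
    static uint16_t truncate_u16(int32_t x)
    {
        return (uint16_t)((uint32_t)x & 0xFFFFu);
    }

For in-range values the two agree; they differ for negative inputs and for inputs above 65535.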
        	

_mm512_mask_packus_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_mask_packus_epi16(__m512i src, __mmask64 k,
                                     __m512i a, __m512i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[7:0] := SaturateU8(a[15:0])
        tmp_dst[15:8] := SaturateU8(a[31:16])
        tmp_dst[23:16] := SaturateU8(a[47:32])
        tmp_dst[31:24] := SaturateU8(a[63:48])
        tmp_dst[39:32] := SaturateU8(a[79:64])
        tmp_dst[47:40] := SaturateU8(a[95:80])
        tmp_dst[55:48] := SaturateU8(a[111:96])
        tmp_dst[63:56] := SaturateU8(a[127:112])
        tmp_dst[71:64] := SaturateU8(b[15:0])
        tmp_dst[79:72] := SaturateU8(b[31:16])
        tmp_dst[87:80] := SaturateU8(b[47:32])
        tmp_dst[95:88] := SaturateU8(b[63:48])
        tmp_dst[103:96] := SaturateU8(b[79:64])
        tmp_dst[111:104] := SaturateU8(b[95:80])
        tmp_dst[119:112] := SaturateU8(b[111:96])
        tmp_dst[127:120] := SaturateU8(b[127:112])
        tmp_dst[135:128] := SaturateU8(a[143:128])
        tmp_dst[143:136] := SaturateU8(a[159:144])
        tmp_dst[151:144] := SaturateU8(a[175:160])
        tmp_dst[159:152] := SaturateU8(a[191:176])
        tmp_dst[167:160] := SaturateU8(a[207:192])
        tmp_dst[175:168] := SaturateU8(a[223:208])
        tmp_dst[183:176] := SaturateU8(a[239:224])
        tmp_dst[191:184] := SaturateU8(a[255:240])
        tmp_dst[199:192] := SaturateU8(b[143:128])
        tmp_dst[207:200] := SaturateU8(b[159:144])
        tmp_dst[215:208] := SaturateU8(b[175:160])
        tmp_dst[223:216] := SaturateU8(b[191:176])
        tmp_dst[231:224] := SaturateU8(b[207:192])
        tmp_dst[239:232] := SaturateU8(b[223:208])
        tmp_dst[247:240] := SaturateU8(b[239:224])
        tmp_dst[255:248] := SaturateU8(b[255:240])
        tmp_dst[263:256] := SaturateU8(a[271:256])
        tmp_dst[271:264] := SaturateU8(a[287:272])
        tmp_dst[279:272] := SaturateU8(a[303:288])
        tmp_dst[287:280] := SaturateU8(a[319:304])
        tmp_dst[295:288] := SaturateU8(a[335:320])
        tmp_dst[303:296] := SaturateU8(a[351:336])
        tmp_dst[311:304] := SaturateU8(a[367:352])
        tmp_dst[319:312] := SaturateU8(a[383:368])
        tmp_dst[327:320] := SaturateU8(b[271:256])
        tmp_dst[335:328] := SaturateU8(b[287:272])
        tmp_dst[343:336] := SaturateU8(b[303:288])
        tmp_dst[351:344] := SaturateU8(b[319:304])
        tmp_dst[359:352] := SaturateU8(b[335:320])
        tmp_dst[367:360] := SaturateU8(b[351:336])
        tmp_dst[375:368] := SaturateU8(b[367:352])
        tmp_dst[383:376] := SaturateU8(b[383:368])
        tmp_dst[391:384] := SaturateU8(a[399:384])
        tmp_dst[399:392] := SaturateU8(a[415:400])
        tmp_dst[407:400] := SaturateU8(a[431:416])
        tmp_dst[415:408] := SaturateU8(a[447:432])
        tmp_dst[423:416] := SaturateU8(a[463:448])
        tmp_dst[431:424] := SaturateU8(a[479:464])
        tmp_dst[439:432] := SaturateU8(a[495:480])
        tmp_dst[447:440] := SaturateU8(a[511:496])
        tmp_dst[455:448] := SaturateU8(b[399:384])
        tmp_dst[463:456] := SaturateU8(b[415:400])
        tmp_dst[471:464] := SaturateU8(b[431:416])
        tmp_dst[479:472] := SaturateU8(b[447:432])
        tmp_dst[487:480] := SaturateU8(b[463:448])
        tmp_dst[495:488] := SaturateU8(b[479:464])
        tmp_dst[503:496] := SaturateU8(b[495:480])
        tmp_dst[511:504] := SaturateU8(b[511:496])
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_packus_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask64 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_maskz_packus_epi16(__mmask64 k, __m512i a,
                                      __m512i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[7:0] := SaturateU8(a[15:0])
        tmp_dst[15:8] := SaturateU8(a[31:16])
        tmp_dst[23:16] := SaturateU8(a[47:32])
        tmp_dst[31:24] := SaturateU8(a[63:48])
        tmp_dst[39:32] := SaturateU8(a[79:64])
        tmp_dst[47:40] := SaturateU8(a[95:80])
        tmp_dst[55:48] := SaturateU8(a[111:96])
        tmp_dst[63:56] := SaturateU8(a[127:112])
        tmp_dst[71:64] := SaturateU8(b[15:0])
        tmp_dst[79:72] := SaturateU8(b[31:16])
        tmp_dst[87:80] := SaturateU8(b[47:32])
        tmp_dst[95:88] := SaturateU8(b[63:48])
        tmp_dst[103:96] := SaturateU8(b[79:64])
        tmp_dst[111:104] := SaturateU8(b[95:80])
        tmp_dst[119:112] := SaturateU8(b[111:96])
        tmp_dst[127:120] := SaturateU8(b[127:112])
        tmp_dst[135:128] := SaturateU8(a[143:128])
        tmp_dst[143:136] := SaturateU8(a[159:144])
        tmp_dst[151:144] := SaturateU8(a[175:160])
        tmp_dst[159:152] := SaturateU8(a[191:176])
        tmp_dst[167:160] := SaturateU8(a[207:192])
        tmp_dst[175:168] := SaturateU8(a[223:208])
        tmp_dst[183:176] := SaturateU8(a[239:224])
        tmp_dst[191:184] := SaturateU8(a[255:240])
        tmp_dst[199:192] := SaturateU8(b[143:128])
        tmp_dst[207:200] := SaturateU8(b[159:144])
        tmp_dst[215:208] := SaturateU8(b[175:160])
        tmp_dst[223:216] := SaturateU8(b[191:176])
        tmp_dst[231:224] := SaturateU8(b[207:192])
        tmp_dst[239:232] := SaturateU8(b[223:208])
        tmp_dst[247:240] := SaturateU8(b[239:224])
        tmp_dst[255:248] := SaturateU8(b[255:240])
        tmp_dst[263:256] := SaturateU8(a[271:256])
        tmp_dst[271:264] := SaturateU8(a[287:272])
        tmp_dst[279:272] := SaturateU8(a[303:288])
        tmp_dst[287:280] := SaturateU8(a[319:304])
        tmp_dst[295:288] := SaturateU8(a[335:320])
        tmp_dst[303:296] := SaturateU8(a[351:336])
        tmp_dst[311:304] := SaturateU8(a[367:352])
        tmp_dst[319:312] := SaturateU8(a[383:368])
        tmp_dst[327:320] := SaturateU8(b[271:256])
        tmp_dst[335:328] := SaturateU8(b[287:272])
        tmp_dst[343:336] := SaturateU8(b[303:288])
        tmp_dst[351:344] := SaturateU8(b[319:304])
        tmp_dst[359:352] := SaturateU8(b[335:320])
        tmp_dst[367:360] := SaturateU8(b[351:336])
        tmp_dst[375:368] := SaturateU8(b[367:352])
        tmp_dst[383:376] := SaturateU8(b[383:368])
        tmp_dst[391:384] := SaturateU8(a[399:384])
        tmp_dst[399:392] := SaturateU8(a[415:400])
        tmp_dst[407:400] := SaturateU8(a[431:416])
        tmp_dst[415:408] := SaturateU8(a[447:432])
        tmp_dst[423:416] := SaturateU8(a[463:448])
        tmp_dst[431:424] := SaturateU8(a[479:464])
        tmp_dst[439:432] := SaturateU8(a[495:480])
        tmp_dst[447:440] := SaturateU8(a[511:496])
        tmp_dst[455:448] := SaturateU8(b[399:384])
        tmp_dst[463:456] := SaturateU8(b[415:400])
        tmp_dst[471:464] := SaturateU8(b[431:416])
        tmp_dst[479:472] := SaturateU8(b[447:432])
        tmp_dst[487:480] := SaturateU8(b[463:448])
        tmp_dst[495:488] := SaturateU8(b[479:464])
        tmp_dst[503:496] := SaturateU8(b[495:480])
        tmp_dst[511:504] := SaturateU8(b[511:496])
        FOR j := 0 to 63
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_packus_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m512i _mm512_packus_epi16(__m512i a, __m512i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := SaturateU8(a[15:0])
        dst[15:8] := SaturateU8(a[31:16])
        dst[23:16] := SaturateU8(a[47:32])
        dst[31:24] := SaturateU8(a[63:48])
        dst[39:32] := SaturateU8(a[79:64])
        dst[47:40] := SaturateU8(a[95:80])
        dst[55:48] := SaturateU8(a[111:96])
        dst[63:56] := SaturateU8(a[127:112])
        dst[71:64] := SaturateU8(b[15:0])
        dst[79:72] := SaturateU8(b[31:16])
        dst[87:80] := SaturateU8(b[47:32])
        dst[95:88] := SaturateU8(b[63:48])
        dst[103:96] := SaturateU8(b[79:64])
        dst[111:104] := SaturateU8(b[95:80])
        dst[119:112] := SaturateU8(b[111:96])
        dst[127:120] := SaturateU8(b[127:112])
        dst[135:128] := SaturateU8(a[143:128])
        dst[143:136] := SaturateU8(a[159:144])
        dst[151:144] := SaturateU8(a[175:160])
        dst[159:152] := SaturateU8(a[191:176])
        dst[167:160] := SaturateU8(a[207:192])
        dst[175:168] := SaturateU8(a[223:208])
        dst[183:176] := SaturateU8(a[239:224])
        dst[191:184] := SaturateU8(a[255:240])
        dst[199:192] := SaturateU8(b[143:128])
        dst[207:200] := SaturateU8(b[159:144])
        dst[215:208] := SaturateU8(b[175:160])
        dst[223:216] := SaturateU8(b[191:176])
        dst[231:224] := SaturateU8(b[207:192])
        dst[239:232] := SaturateU8(b[223:208])
        dst[247:240] := SaturateU8(b[239:224])
        dst[255:248] := SaturateU8(b[255:240])
        dst[263:256] := SaturateU8(a[271:256])
        dst[271:264] := SaturateU8(a[287:272])
        dst[279:272] := SaturateU8(a[303:288])
        dst[287:280] := SaturateU8(a[319:304])
        dst[295:288] := SaturateU8(a[335:320])
        dst[303:296] := SaturateU8(a[351:336])
        dst[311:304] := SaturateU8(a[367:352])
        dst[319:312] := SaturateU8(a[383:368])
        dst[327:320] := SaturateU8(b[271:256])
        dst[335:328] := SaturateU8(b[287:272])
        dst[343:336] := SaturateU8(b[303:288])
        dst[351:344] := SaturateU8(b[319:304])
        dst[359:352] := SaturateU8(b[335:320])
        dst[367:360] := SaturateU8(b[351:336])
        dst[375:368] := SaturateU8(b[367:352])
        dst[383:376] := SaturateU8(b[383:368])
        dst[391:384] := SaturateU8(a[399:384])
        dst[399:392] := SaturateU8(a[415:400])
        dst[407:400] := SaturateU8(a[431:416])
        dst[415:408] := SaturateU8(a[447:432])
        dst[423:416] := SaturateU8(a[463:448])
        dst[431:424] := SaturateU8(a[479:464])
        dst[439:432] := SaturateU8(a[495:480])
        dst[447:440] := SaturateU8(a[511:496])
        dst[455:448] := SaturateU8(b[399:384])
        dst[463:456] := SaturateU8(b[415:400])
        dst[471:464] := SaturateU8(b[431:416])
        dst[479:472] := SaturateU8(b[447:432])
        dst[487:480] := SaturateU8(b[463:448])
        dst[495:488] := SaturateU8(b[479:464])
        dst[503:496] := SaturateU8(b[495:480])
        dst[511:504] := SaturateU8(b[511:496])
        dst[MAX:512] := 0
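
A typical use of this kind of pack is narrowing 16-bit intermediate results, such as pixel values after an offset has been added, back to bytes with clamping to the range 0 to 255. The scalar sketch below models the operation with plain arrays; the names ``saturate_u8`` and ``packus_epi16_model`` are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Unsigned saturation of a signed 16-bit value to the uint8_t range. */
    static uint8_t saturate_u8(int16_t x)
    {
        if (x < 0) return 0;
        if (x > UINT8_MAX) return UINT8_MAX;
        return (uint8_t)x;
    }

    /* Hypothetical scalar model of _mm512_packus_epi16: per 128-bit lane,
       eight bytes from a are followed by eight bytes from b. */
    static void packus_epi16_model(uint8_t dst[64],
                                   const int16_t a[32], const int16_t b[32])
    {
        for (int lane = 0; lane < 4; ++lane) {
            for (int e = 0; e < 8; ++e) {
                dst[lane * 16 + e]     = saturate_u8(a[lane * 8 + e]);
                dst[lane * 16 + 8 + e] = saturate_u8(b[lane * 8 + e]);
            }
        }
    }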
        	

_mm512_broadcast_f32x2
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_broadcast_f32x2(__m128 a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 2)*32
        	dst[i+31:i] := a[n+31:n]
        ENDFOR
        dst[MAX:512] := 0
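
Only the low two floats of the 128-bit source take part; the upper two are ignored. A minimal scalar sketch, assuming vectors are plain arrays and using the hypothetical name ``broadcast_f32x2_model``:

.. code-block:: C

    #include <stddef.h>

    /* Hypothetical scalar model of _mm512_broadcast_f32x2: the low two
       floats of a repeat across all 16 destination slots. */
    static void broadcast_f32x2_model(float dst[16], const float a[4])
    {
        for (size_t j = 0; j < 16; ++j)
            dst[j] = a[j % 2];   /* a[2] and a[3] are never read */
    }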
        	

_mm512_mask_broadcast_f32x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_broadcast_f32x2(__m512 src, __mmask16 k,
                                       __m128 a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 2)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
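
Since even destination slots always receive ``a[0]`` and odd slots ``a[1]``, the mask also controls which of the two source elements ever appears. With ``k = 0xAAAA``, for example, only odd slots are written, so only ``a[1]`` is copied. A scalar sketch with the hypothetical name ``mask_broadcast_f32x2_model``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_mask_broadcast_f32x2. */
    static void mask_broadcast_f32x2_model(float dst[16],
                                           const float src[16],
                                           uint16_t k, const float a[4])
    {
        for (int j = 0; j < 16; ++j)
            dst[j] = ((k >> j) & 1) ? a[j % 2] : src[j];
    }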
        	

_mm512_maskz_broadcast_f32x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_broadcast_f32x2(__mmask16 k, __m128 a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 2)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcast_f32x8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_broadcast_f32x8(__m256 a);

.. admonition:: Intel Description

    Broadcast the 8 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 8)*32
        	dst[i+31:i] := a[n+31:n]
        ENDFOR
        dst[MAX:512] := 0
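
With eight source elements and sixteen destination slots, the index ``j % 8`` amounts to copying the 256-bit source into both halves of the result. A scalar sketch under that reading, using the hypothetical name ``broadcast_f32x8_model``:

.. code-block:: C

    #include <string.h>

    /* Hypothetical scalar model of _mm512_broadcast_f32x8: the eight
       source floats fill the lower and the upper half of dst. */
    static void broadcast_f32x8_model(float dst[16], const float a[8])
    {
        memcpy(dst,     a, 8 * sizeof(float));   /* lower 256 bits */
        memcpy(dst + 8, a, 8 * sizeof(float));   /* upper 256 bits */
    }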
        	

_mm512_mask_broadcast_f32x8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m256 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_broadcast_f32x8(__m512 src, __mmask16 k,
                                       __m256 a);

.. admonition:: Intel Description

    Broadcast the 8 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 8)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcast_f32x8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_broadcast_f32x8(__mmask16 k, __m256 a);

.. admonition:: Intel Description

    Broadcast the 8 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 8)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
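
One consequence of the zeromask form: with ``k = 0x00FF`` the lower half of the result is a copy of "a" and the upper half is zero, while ``k = 0xFF00`` places the copy in the upper half instead. The scalar sketch below illustrates this, using the hypothetical name ``maskz_broadcast_f32x8_model``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of _mm512_maskz_broadcast_f32x8. */
    static void maskz_broadcast_f32x8_model(float dst[16], uint16_t k,
                                            const float a[8])
    {
        for (int j = 0; j < 16; ++j)
            dst[j] = ((k >> j) & 1) ? a[j % 8] : 0.0f;
    }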
        	

_mm512_broadcast_f64x2
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_broadcast_f64x2(__m128d a);

.. admonition:: Intel Description

    Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	n := (j % 2)*64
        	dst[i+63:i] := a[n+63:n]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_broadcast_f64x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_broadcast_f64x2(__m512d src, __mmask8 k,
                                        __m128d a);

.. admonition:: Intel Description

    Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	n := (j % 2)*64
        	IF k[j]
        		dst[i+63:i] := a[n+63:n]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcast_f64x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_maskz_broadcast_f64x2(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	n := (j % 2)*64
        	IF k[j]
        		dst[i+63:i] := a[n+63:n]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcast_i32x2
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m512i _mm512_broadcast_i32x2(__m128i a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 2)*32
        	dst[i+31:i] := a[n+31:n]
        ENDFOR
        dst[MAX:512] := 0
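
Because "a" is a 128-bit vector holding four 32-bit integers, it may help to see that only the lower two are ever read. The sketch below is a plain-C model of the pseudo-code above; ``broadcast_i32x2_model`` is a hypothetical name and the arrays stand in for vector registers.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the 32x2 integer broadcast. */
    void broadcast_i32x2_model(uint32_t dst[16], const uint32_t a[4])
    {
        assert(dst != NULL && a != NULL);
        for (int j = 0; j < 16; j++)
            dst[j] = a[j % 2];   /* a[2] and a[3] are never read */
    }

For ``a = {10, 20, 30, 40}`` the model fills ``dst`` with the repeating pair 10, 20.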
        	

_mm512_mask_broadcast_i32x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_broadcast_i32x2(__m512i src,
                                        __mmask16 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 2)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcast_i32x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_broadcast_i32x2(__mmask16 k,
                                         __m128i a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 2)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcast_i32x8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m512i _mm512_broadcast_i32x8(__m256i a);

.. admonition:: Intel Description

    Broadcast the 8 packed 32-bit integers from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 8)*32
        	dst[i+31:i] := a[n+31:n]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_broadcast_i32x8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_mask_broadcast_i32x8(__m512i src,
                                        __mmask16 k, __m256i a);

.. admonition:: Intel Description

    Broadcast the 8 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 8)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcast_i32x8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m512i _mm512_maskz_broadcast_i32x8(__mmask16 k,
                                         __m256i a);

.. admonition:: Intel Description

    Broadcast the 8 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	n := (j % 8)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_broadcast_i64x2
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m512i _mm512_broadcast_i64x2(__m128i a);

.. admonition:: Intel Description

    Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	n := (j % 2)*64
        	dst[i+63:i] := a[n+63:n]
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_broadcast_i64x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_mask_broadcast_i64x2(__m512i src, __mmask8 k,
                                        __m128i a);

.. admonition:: Intel Description

    Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	n := (j % 2)*64
        	IF k[j]
        		dst[i+63:i] := a[n+63:n]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_broadcast_i64x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m512i _mm512_maskz_broadcast_i64x2(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	n := (j % 2)*64
        	IF k[j]
        		dst[i+63:i] := a[n+63:n]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
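
For the 64-bit variants the mask is only 8 bits wide, one bit per 64-bit lane. A minimal scalar sketch of the zeromask pseudo-code above follows; ``maskz_broadcast_i64x2_model`` is a hypothetical helper, and the arrays are stand-ins for the vector registers.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model; bit j of k controls 64-bit lane j. */
    void maskz_broadcast_i64x2_model(uint64_t dst[8], uint8_t k,
                                     const uint64_t a[2])
    {
        assert(dst != NULL && a != NULL);
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1u) ? a[j % 2] : 0;
    }

Even lanes can only ever receive ``a[0]`` and odd lanes ``a[1]``; the mask decides whether a lane is written or zeroed, not which source element it gets.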
        	

_mm512_extractf32x8_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m512 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m256 _mm512_extractf32x8_ps(__m512 a, int imm8);

.. admonition:: Intel Description

    Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: dst[255:0] := a[255:0]
        1: dst[255:0] := a[511:256]
        ESAC
        dst[MAX:256] := 0
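
The ``CASE imm8[0]`` selection above amounts to picking the lower or upper half of the source. The following plain-C sketch models that under the assumption that each array element is one 32-bit lane; ``extractf32x8_model`` is a hypothetical name, and a real compiler may additionally require "imm8" to be a compile-time constant.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical scalar model; only bit 0 of imm8 is examined. */
    void extractf32x8_model(float dst[8], const float a[16], int imm8)
    {
        assert(dst != NULL && a != NULL);
        memcpy(dst, a + 8 * (imm8 & 1), 8 * sizeof(float));
    }

In this model ``imm8 = 1`` copies lanes 8 through 15 of ``a``, and higher bits of ``imm8`` are ignored, as in the pseudo-code.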
        	

_mm512_mask_extractf32x8_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m512 a, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m256 _mm512_mask_extractf32x8_ps(__m256 src, __mmask8 k,
                                       __m512 a, int imm8);

.. admonition:: Intel Description

    Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[255:0] := a[255:0]
        1: tmp[255:0] := a[511:256]
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_extractf32x8_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m512 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m256 _mm512_maskz_extractf32x8_ps(__mmask8 k, __m512 a,
                                        int imm8);

.. admonition:: Intel Description

    Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[255:0] := a[255:0]
        1: tmp[255:0] := a[511:256]
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_extractf64x2_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m128d
:Param Types:
    __m512d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm512_extractf64x2_pd(__m512d a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[1:0] OF
        0: dst[127:0] := a[127:0]
        1: dst[127:0] := a[255:128]
        2: dst[127:0] := a[383:256]
        3: dst[127:0] := a[511:384]
        ESAC
        dst[MAX:128] := 0
        	

_mm512_mask_extractf64x2_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m512d a, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm512_mask_extractf64x2_pd(__m128d src, __mmask8 k,
                                        __m512d a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[1:0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        2: tmp[127:0] := a[383:256]
        3: tmp[127:0] := a[511:384]
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
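
The masked 128-bit extract combines two steps: ``imm8[1:0]`` picks one of four chunks, then the low two bits of "k" merge that chunk with "src". A minimal scalar sketch of the pseudo-code above, with the hypothetical name ``mask_extractf64x2_model`` and one array element per 64-bit lane:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the writemask extract. */
    void mask_extractf64x2_model(double dst[2], const double src[2], uint8_t k,
                                 const double a[8], int imm8)
    {
        assert(dst != NULL && src != NULL && a != NULL);
        const double *tmp = a + 2 * (imm8 & 3);  /* imm8[1:0] picks a chunk */
        for (int j = 0; j < 2; j++)
            dst[j] = ((k >> j) & 1u) ? tmp[j] : src[j];
    }

Only bits 0 and 1 of "k" have any effect here, because the result has just two lanes.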
        	

_mm512_maskz_extractf64x2_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm512_maskz_extractf64x2_pd(__mmask8 k, __m512d a,
                                         int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[1:0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        2: tmp[127:0] := a[383:256]
        3: tmp[127:0] := a[511:384]
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_extracti32x8_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m512i a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm512_extracti32x8_epi32(__m512i a, int imm8);

.. admonition:: Intel Description

    Extract 256 bits (composed of 8 packed 32-bit integers) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: dst[255:0] := a[255:0]
        1: dst[255:0] := a[511:256]
        ESAC
        dst[MAX:256] := 0
        	

_mm512_mask_extracti32x8_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm512_mask_extracti32x8_epi32(__m256i src,
                                           __mmask8 k,
                                           __m512i a, int imm8);

.. admonition:: Intel Description

    Extract 256 bits (composed of 8 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[255:0] := a[255:0]
        1: tmp[255:0] := a[511:256]
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_maskz_extracti32x8_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm512_maskz_extracti32x8_epi32(__mmask8 k,
                                            __m512i a,
                                            int imm8);

.. admonition:: Intel Description

    Extract 256 bits (composed of 8 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[255:0] := a[255:0]
        1: tmp[255:0] := a[511:256]
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm512_extracti64x2_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m512i a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm512_extracti64x2_epi64(__m512i a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[1:0] OF
        0: dst[127:0] := a[127:0]
        1: dst[127:0] := a[255:128]
        2: dst[127:0] := a[383:256]
        3: dst[127:0] := a[511:384]
        ESAC
        dst[MAX:128] := 0
        	

_mm512_mask_extracti64x2_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm512_mask_extracti64x2_epi64(__m128i src,
                                           __mmask8 k,
                                           __m512i a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[1:0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        2: tmp[127:0] := a[383:256]
        3: tmp[127:0] := a[511:384]
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_maskz_extracti64x2_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm512_maskz_extracti64x2_epi64(__mmask8 k,
                                            __m512i a,
                                            int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[1:0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        2: tmp[127:0] := a[383:256]
        3: tmp[127:0] := a[511:384]
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm512_fpclass_pd_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm512_fpclass_pd_mask(__m512d a, int imm8);

.. admonition:: Intel Description

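    Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".

    "imm8" can be a combination of: 0x01 (QNaN), 0x02 (positive zero), 0x04 (negative zero), 0x08 (positive infinity), 0x10 (negative infinity), 0x20 (denormal), 0x40 (negative), 0x80 (SNaN).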

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
        ENDFOR
        k[MAX:8] := 0
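
The pseudo-code above relies on ``CheckFPClass_FP64``, which is not spelled out here. The following is a rough scalar sketch of one plausible reading, assuming the category bit assignments from Intel's ``VFPCLASSPD`` instruction description (0x01 QNaN, 0x02 +0, 0x04 -0, 0x08 +Inf, 0x10 -Inf, 0x20 denormal, 0x40 finite negative, 0x80 SNaN) and ignoring the MXCSR denormals-are-zero setting. The names ``check_fp_class_fp64`` and ``fpclass_pd_mask_model`` are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Assumed category bits; not taken from any Intel header. */
    enum {
        FPCLASS_QNAN = 0x01, FPCLASS_POS_ZERO = 0x02, FPCLASS_NEG_ZERO = 0x04,
        FPCLASS_POS_INF = 0x08, FPCLASS_NEG_INF = 0x10, FPCLASS_DENORMAL = 0x20,
        FPCLASS_NEG_FINITE = 0x40, FPCLASS_SNAN = 0x80
    };

    /* Returns 1 if x falls in any category selected by imm8[7:0]. */
    int check_fp_class_fp64(double x, int imm8)
    {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);          /* inspect the IEEE-754 encoding */
        int neg = (int)(bits >> 63);
        int exp_ones = ((bits >> 52) & 0x7FF) == 0x7FF;
        int exp_zero = ((bits >> 52) & 0x7FF) == 0;
        int mant_zero = (bits & 0x000FFFFFFFFFFFFFull) == 0;
        int quiet = (int)((bits >> 51) & 1);     /* top mantissa bit */
        int cls = 0;
        if (exp_ones && !mant_zero) cls |= quiet ? FPCLASS_QNAN : FPCLASS_SNAN;
        if (exp_zero && mant_zero)  cls |= neg ? FPCLASS_NEG_ZERO : FPCLASS_POS_ZERO;
        if (exp_ones && mant_zero)  cls |= neg ? FPCLASS_NEG_INF : FPCLASS_POS_INF;
        if (exp_zero && !mant_zero) cls |= FPCLASS_DENORMAL;
        if (neg && !exp_ones && !(exp_zero && mant_zero)) cls |= FPCLASS_NEG_FINITE;
        return (cls & imm8 & 0xFF) != 0;
    }

    /* Scalar model of the unmasked loop: one result bit per 64-bit lane. */
    uint8_t fpclass_pd_mask_model(const double a[8], int imm8)
    {
        assert(a != NULL);
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            k |= (uint8_t)(check_fp_class_fp64(a[j], imm8) << j);
        return k;
    }

Categories can be OR-ed together, for example ``0x18`` to flag either infinity. In this sketch a negative denormal matches both the denormal and the finite-negative category.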
        	

_mm512_mask_fpclass_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m512d a, 
    int imm8
:Param ETypes:
    MASK k1, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm512_mask_fpclass_pd_mask(__mmask8 k1, __m512d a,
                                         int imm8);

.. admonition:: Intel Description

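    Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

    "imm8" can be a combination of: 0x01 (QNaN), 0x02 (positive zero), 0x04 (negative zero), 0x08 (positive infinity), 0x10 (negative infinity), 0x20 (denormal), 0x40 (negative), 0x80 (SNaN).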

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k1[j]
        		k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_fpclass_ps_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm512_fpclass_ps_mask(__m512 a, int imm8);

.. admonition:: Intel Description

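    Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".

    "imm8" can be a combination of: 0x01 (QNaN), 0x02 (positive zero), 0x04 (negative zero), 0x08 (positive infinity), 0x10 (negative infinity), 0x20 (denormal), 0x40 (negative), 0x80 (SNaN).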

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0])
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_mask_fpclass_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m512 a, 
    int imm8
:Param ETypes:
    MASK k1, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm512_mask_fpclass_ps_mask(__mmask16 k1,
                                          __m512 a, int imm8);

.. admonition:: Intel Description

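    Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).

    "imm8" can be a combination of: 0x01 (QNaN), 0x02 (positive zero), 0x04 (negative zero), 0x08 (positive infinity), 0x10 (negative infinity), 0x20 (denormal), 0x40 (negative), 0x80 (SNaN).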

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k1[j]
        		k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0])
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm512_insertf32x8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m256 b, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_insertf32x8(__m512 a, __m256 b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := a[511:0]
        CASE (imm8[0]) OF
        0: dst[255:0] := b[255:0]
        1: dst[511:256] := b[255:0]
        ESAC
        dst[MAX:512] := 0
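
The copy-then-overwrite order in the pseudo-code above can be illustrated with a small scalar model. This is only a sketch under the assumption that each array element is one 32-bit lane; ``insertf32x8_model`` is a hypothetical name and is not the intrinsic.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical scalar model; dst must not overlap a or b. */
    void insertf32x8_model(float dst[16], const float a[16], const float b[8],
                           int imm8)
    {
        assert(dst != NULL && a != NULL && b != NULL);
        memcpy(dst, a, 16 * sizeof(float));                  /* dst := a */
        memcpy(dst + 8 * (imm8 & 1), b, 8 * sizeof(float));  /* replace one half */
    }

With ``imm8 = 1`` the lower eight lanes still come from ``a`` and the upper eight from ``b``; with ``imm8 = 0`` the roles of the halves are swapped.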
        	

_mm512_mask_insertf32x8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m256 b, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_mask_insertf32x8(__m512 src, __mmask16 k,
                                   __m512 a, __m256 b,
                                   int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[0]) OF
        0: tmp[255:0] := b[255:0]
        1: tmp[511:256] := b[255:0]
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_insertf32x8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m256 b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_maskz_insertf32x8(__mmask16 k, __m512 a,
                                    __m256 b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[0]) OF
        0: tmp[255:0] := b[255:0]
        1: tmp[511:256] := b[255:0]
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_insertf64x2
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_insertf64x2(__m512d a, __m128d b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := a[511:0]
        CASE imm8[1:0] OF
        0: dst[127:0] := b[127:0]
        1: dst[255:128] := b[127:0]
        2: dst[383:256] := b[127:0]
        3: dst[511:384] := b[127:0]
        ESAC
        dst[MAX:512] := 0
        	

_mm512_mask_insertf64x2
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_mask_insertf64x2(__m512d src, __mmask8 k,
                                    __m512d a, __m128d b,
                                    int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[1:0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        2: tmp[383:256] := b[127:0]
        3: tmp[511:384] := b[127:0]
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
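
A point that is easy to miss in the pseudo-code is that the writemask works per 64-bit element, not per inserted 128-bit lane, so only half of the inserted lane may reach "dst". The scalar sketch below illustrates this under stated assumptions: "mask_insertf64x2_model" is a hypothetical helper, and the vector operands are modelled as plain arrays.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code, not an Intel API. */
    void mask_insertf64x2_model(double dst[8], const double src[8], uint8_t k,
                                const double a[8], const double b[2], int imm8)
    {
        double tmp[8];
        size_t lane = (size_t)(imm8 & 3);   /* only imm8[1:0] is used */

        assert(dst && src && a && b);

        for (size_t j = 0; j < 8; j++)      /* tmp := a */
            tmp[j] = a[j];
        tmp[2 * lane]     = b[0];           /* insert b into the chosen lane */
        tmp[2 * lane + 1] = b[1];

        /* One mask bit per 64-bit element: set bit takes tmp, clear bit keeps src. */
        for (size_t j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1u) ? tmp[j] : src[j];
    }

For example, with "k" equal to 0x11 and "imm8" equal to 2, element 4 of "dst" receives the first double of "b", while element 5 keeps its value from "src" because mask bit 5 is clear.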
        	

_mm512_maskz_insertf64x2
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_maskz_insertf64x2(__mmask8 k, __m512d a,
                                     __m128d b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[1:0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        2: tmp[383:256] := b[127:0]
        3: tmp[511:384] := b[127:0]
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_inserti32x8
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_inserti32x8(__m512i a, __m256i b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 256 bits (composed of 8 packed 32-bit integers) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := a[511:0]
        CASE imm8[0] OF
        0: dst[255:0] := b[255:0]
        1: dst[511:256] := b[255:0]
        ESAC
        dst[MAX:512] := 0
        	

_mm512_mask_inserti32x8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_inserti32x8(__m512i src, __mmask16 k,
                                    __m512i a, __m256i b,
                                    int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 256 bits (composed of 8 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[0]) OF
        0: tmp[255:0] := b[255:0]
        1: tmp[511:256] := b[255:0]
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_inserti32x8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_inserti32x8(__mmask16 k, __m512i a,
                                     __m256i b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 256 bits (composed of 8 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[0]) OF
        0: tmp[255:0] := b[255:0]
        1: tmp[511:256] := b[255:0]
        ESAC
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
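
The zeromask form can be sketched the same way. The code below is a scalar model of the pseudo-code above under stated assumptions: "maskz_inserti32x8_model" is a hypothetical name, and the 512-bit and 256-bit operands are represented as arrays of 16 and 8 unsigned 32-bit integers.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code, not an Intel API. */
    void maskz_inserti32x8_model(uint32_t dst[16], uint16_t k,
                                 const uint32_t a[16], const uint32_t b[8],
                                 int imm8)
    {
        uint32_t tmp[16];
        size_t half = (size_t)(imm8 & 1);   /* only imm8[0] selects the half */

        assert(dst && a && b);

        for (size_t j = 0; j < 16; j++)     /* tmp := a */
            tmp[j] = a[j];
        for (size_t j = 0; j < 8; j++)      /* insert b into the chosen half */
            tmp[8 * half + j] = b[j];

        /* One mask bit per 32-bit element: clear bits produce zero. */
        for (size_t j = 0; j < 16; j++)
            dst[j] = ((k >> j) & 1u) ? tmp[j] : 0u;
    }

Because the mask has 16 bits for 16 elements, any element whose bit is clear is zeroed regardless of whether it came from "a" or from "b".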
        	

_mm512_inserti64x2
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_inserti64x2(__m512i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[511:0] := a[511:0]
        CASE imm8[1:0] OF
        0: dst[127:0] := b[127:0]
        1: dst[255:128] := b[127:0]
        2: dst[383:256] := b[127:0]
        3: dst[511:384] := b[127:0]
        ESAC
        dst[MAX:512] := 0
        	

_mm512_mask_inserti64x2
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_inserti64x2(__m512i src, __mmask8 k,
                                    __m512i a, __m128i b,
                                    int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[1:0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        2: tmp[383:256] := b[127:0]
        3: tmp[511:384] := b[127:0]
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_inserti64x2
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_inserti64x2(__mmask8 k, __m512i a,
                                     __m128i b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[511:0] := a[511:0]
        CASE (imm8[1:0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        2: tmp[383:256] := b[127:0]
        3: tmp[511:384] := b[127:0]
        ESAC
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_movepi32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask16
:Param Types:
    __m512i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __mmask16 _mm512_movepi32_mask(__m512i a);

.. admonition:: Intel Description

    Set each bit of mask register "k" based on the most significant bit of the corresponding packed 32-bit integer in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF a[i+31]
        		k[j] := 1
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
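
The loop above collects one sign bit per element into a 16-bit mask. A minimal scalar sketch, assuming the 512-bit input is modelled as an array of 16 unsigned 32-bit integers and using the hypothetical name "movepi32_mask_model":

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code, not an Intel API. */
    uint16_t movepi32_mask_model(const uint32_t a[16])
    {
        uint16_t k = 0;

        assert(a);

        for (size_t j = 0; j < 16; j++) {
            if (a[j] >> 31)                         /* most significant bit set */
                k = (uint16_t)(k | (1u << j));      /* k[j] := 1 */
        }
        return k;
    }

Only bit 31 of each element matters: 0x7FFFFFFF contributes a 0 bit, while 0x80000000 contributes a 1 bit.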
        	

_mm512_movm_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m512i _mm512_movm_epi32(__mmask16 k);

.. admonition:: Intel Description

    Set each packed 32-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := 0xFFFFFFFF
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_movm_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m512i _mm512_movm_epi64(__mmask8 k);

.. admonition:: Intel Description

    Set each packed 64-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := 0xFFFFFFFFFFFFFFFF
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
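
The mask-to-vector expansion above can be sketched in scalar C. The name "movm_epi64_model" is hypothetical, and the 512-bit result is modelled as an array of 8 unsigned 64-bit integers.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code, not an Intel API. */
    void movm_epi64_model(uint64_t dst[8], uint8_t k)
    {
        assert(dst);

        /* Each mask bit expands to an all-ones or all-zeros 64-bit element. */
        for (size_t j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1u) ? UINT64_MAX : 0;
    }

With "k" equal to 0x05, elements 0 and 2 of "dst" are all ones and the remaining six elements are zero.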
        	

_mm512_movepi64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask8
:Param Types:
    __m512i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __mmask8 _mm512_movepi64_mask(__m512i a);

.. admonition:: Intel Description

    Set each bit of mask register "k" based on the most significant bit of the corresponding packed 64-bit integer in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*64
        	IF a[i+63]
        		k[j] := 1
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm512_mask_range_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_mask_range_pd(__m512d src, __mmask8 k,
                                 __m512d a, __m512d b,
                                 int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_range_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512d _mm512_mask_range_round_pd(__m512d src, __mmask8 k,
                                       __m512d a, __m512d b,
                                       int imm8, int sae);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_range_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_maskz_range_pd(__mmask8 k, __m512d a,
                                  __m512d b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_range_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512d _mm512_maskz_range_round_pd(__mmask8 k, __m512d a,
                                        __m512d b, int imm8,
                                        int sae);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_range_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_range_pd(__m512d a, __m512d b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
        ENDFOR
        dst[MAX:512] := 0
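
The two control fields of "imm8" interact in ways that are easier to see on a single element. The sketch below models only the RANGE pseudo-code above for one double; it deliberately does not model any NaN or signed-zero handling that the hardware instruction may apply. The name "range_f64_model" and its helpers are hypothetical.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    #define F64_SIGN UINT64_C(0x8000000000000000)

    static uint64_t f64_to_bits(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
    static double bits_to_f64(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }
    static double abs_f64(double x) { return bits_to_f64(f64_to_bits(x) & ~F64_SIGN); }

    /* Hypothetical scalar model of RANGE for one element, not an Intel API. */
    double range_f64_model(double a, double b, int imm8)
    {
        int op = imm8 & 3;                  /* imm8[1:0]: operation control */
        int sign_ctl = (imm8 >> 2) & 3;     /* imm8[3:2]: sign control */
        double tmp;
        uint64_t mag;

        switch (op) {
        case 0:  tmp = (a <= b) ? a : b; break;                     /* min */
        case 1:  tmp = (a <= b) ? b : a; break;                     /* max */
        case 2:  tmp = (abs_f64(a) <= abs_f64(b)) ? a : b; break;   /* absolute min */
        default: tmp = (abs_f64(a) <= abs_f64(b)) ? b : a; break;   /* absolute max */
        }

        mag = f64_to_bits(tmp) & ~F64_SIGN; /* tmp[62:0] */
        switch (sign_ctl) {
        case 0:  return bits_to_f64(mag | (f64_to_bits(a) & F64_SIGN)); /* sign from a */
        case 1:  return tmp;                                            /* sign from compare result */
        case 2:  return bits_to_f64(mag);                               /* clear sign bit */
        default: return bits_to_f64(mag | F64_SIGN);                    /* set sign bit */
        }
    }

With "a" equal to -3.0 and "b" equal to 2.0, an "imm8" of 0x1 (max, sign from a) yields -2.0 in this model, because the sign of "a" is applied to the selected magnitude, whereas 0x5 (max, sign from compare result) yields 2.0.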
        	

_mm512_range_round_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512d _mm512_range_round_pd(__m512d a, __m512d b,
                                  int imm8, int sae);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_range_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_mask_range_ps(__m512 src, __mmask16 k,
                                __m512 a, __m512 b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
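
The masked single-precision form combines the per-element RANGE step with a writemask merge. The following scalar sketch models the pseudo-code above under stated assumptions: vectors are plain arrays of 16 floats, the names "range_f32_model" and "mask_range_ps_model" are hypothetical, and any NaN or signed-zero handling of the hardware instruction is not modelled.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define F32_SIGN UINT32_C(0x80000000)

    static uint32_t f32_to_bits(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
    static float bits_to_f32(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

    /* Hypothetical scalar model of RANGE for one 32-bit element. */
    static float range_f32_model(float a, float b, int op, int sign_ctl)
    {
        float abs_a = bits_to_f32(f32_to_bits(a) & ~F32_SIGN);
        float abs_b = bits_to_f32(f32_to_bits(b) & ~F32_SIGN);
        float tmp;
        uint32_t mag;

        switch (op & 3) {
        case 0:  tmp = (a <= b) ? a : b; break;             /* min */
        case 1:  tmp = (a <= b) ? b : a; break;             /* max */
        case 2:  tmp = (abs_a <= abs_b) ? a : b; break;     /* absolute min */
        default: tmp = (abs_a <= abs_b) ? b : a; break;     /* absolute max */
        }

        mag = f32_to_bits(tmp) & ~F32_SIGN;                 /* tmp[30:0] */
        switch (sign_ctl & 3) {
        case 0:  return bits_to_f32(mag | (f32_to_bits(a) & F32_SIGN));
        case 1:  return tmp;
        case 2:  return bits_to_f32(mag);
        default: return bits_to_f32(mag | F32_SIGN);
        }
    }

    /* Hypothetical model of the writemask merge, not an Intel API. */
    void mask_range_ps_model(float dst[16], const float src[16], uint16_t k,
                             const float a[16], const float b[16], int imm8)
    {
        assert(dst && src && a && b);

        for (size_t j = 0; j < 16; j++)
            dst[j] = ((k >> j) & 1u)
                         ? range_f32_model(a[j], b[j], imm8 & 3, (imm8 >> 2) & 3)
                         : src[j];          /* clear bit keeps the src element */
    }

Only elements whose mask bit is set are computed from "a" and "b"; every other element of "dst" is taken unchanged from "src".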
        	

_mm512_mask_range_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512 _mm512_mask_range_round_ps(__m512 src, __mmask16 k,
                                      __m512 a, __m512 b,
                                      int imm8, int sae);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_range_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_maskz_range_ps(__mmask16 k, __m512 a,
                                 __m512 b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_range_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512 _mm512_maskz_range_round_ps(__mmask16 k, __m512 a,
                                       __m512 b, int imm8,
                                       int sae);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_range_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_range_ps(__m512 a, __m512 b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
        ENDFOR
        dst[MAX:512] := 0
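
The per-lane ``RANGE`` step in the pseudo-code above can be sketched in portable C. This is a minimal scalar model under stated assumptions, not the intrinsic itself: the names ``range_lane_f32``, ``f32_bits`` and ``bits_f32`` are hypothetical, and NaN and denormal handling are left out.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helpers: move between a float and its 32-bit pattern. */
    static uint32_t f32_bits(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
    static float bits_f32(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

    /* Scalar model of RANGE for one 32-bit lane (NaN inputs not modelled). */
    static float range_lane_f32(float src1, float src2, int imm8)
    {
        const uint32_t sign_bit = 0x80000000u;
        int op_ctl = imm8 & 0x3;          /* imm8[1:0]: min, max, abs min, abs max */
        int sign_ctl = (imm8 >> 2) & 0x3; /* imm8[3:2]: where the sign comes from */
        float abs1 = bits_f32(f32_bits(src1) & ~sign_bit);
        float abs2 = bits_f32(f32_bits(src2) & ~sign_bit);
        float tmp;

        switch (op_ctl) {
        case 0:  tmp = (src1 <= src2) ? src1 : src2; break;
        case 1:  tmp = (src1 <= src2) ? src2 : src1; break;
        case 2:  tmp = (abs1 <= abs2) ? src1 : src2; break;
        default: tmp = (abs1 <= abs2) ? src2 : src1; break;
        }

        uint32_t magnitude = f32_bits(tmp) & ~sign_bit;
        switch (sign_ctl) {
        case 0:  return bits_f32((f32_bits(src1) & sign_bit) | magnitude); /* sign from a */
        case 1:  return tmp;                                  /* sign from compare result */
        case 2:  return bits_f32(magnitude);                  /* clear sign bit */
        default: return bits_f32(sign_bit | magnitude);       /* set sign bit */
        }
    }

For example, ``range_lane_f32(3.0f, -5.0f, 0x4)`` selects the minimum and keeps its own sign, giving ``-5.0f``, while an ``imm8`` of ``0x0`` takes the sign from the first operand and gives ``5.0f``.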
        	

_mm512_range_round_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512 _mm512_range_round_ps(__m512 a, __m512 b, int imm8,
                                 int sae);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_reduce_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_mask_reduce_pd(__m512d src, __mmask8 k,
                                  __m512d a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_reduce_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512d _mm512_mask_reduce_round_pd(__m512d src, __mmask8 k,
                                        __m512d a, int imm8,
                                        int sae);

.. admonition:: Intel Description

    Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_reduce_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_maskz_reduce_pd(__mmask8 k, __m512d a,
                                   int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
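
The writemask and zeromask forms differ only in what a lane receives when its mask bit is clear. The following is a minimal scalar sketch of that selection step for eight 64-bit lanes; ``apply_writemask``, ``apply_zeromask`` and ``LANES`` are hypothetical names, and ``result`` stands for the already computed per-lane values.

.. code-block:: C

    #include <stdint.h>

    #define LANES 8 /* eight 64-bit lanes in a 512-bit vector */

    /* Writemask: a lane whose mask bit is clear keeps the value from src. */
    static void apply_writemask(double dst[LANES], const double src[LANES],
                                uint8_t k, const double result[LANES])
    {
        for (int j = 0; j < LANES; j++)
            dst[j] = ((k >> j) & 1) ? result[j] : src[j];
    }

    /* Zeromask: a lane whose mask bit is clear is set to zero. */
    static void apply_zeromask(double dst[LANES], uint8_t k,
                               const double result[LANES])
    {
        for (int j = 0; j < LANES; j++)
            dst[j] = ((k >> j) & 1) ? result[j] : 0.0;
    }

With ``k = 0x05`` only lanes 0 and 2 take the computed value; the remaining lanes come from ``src`` in the writemask form and are zero in the zeromask form.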
        	

_mm512_maskz_reduce_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512d _mm512_maskz_reduce_round_pd(__mmask8 k, __m512d a,
                                         int imm8, int sae);

.. admonition:: Intel Description

    Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_reduce_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_reduce_pd(__m512d a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
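
The arithmetic in ``ReduceArgumentPD`` can be followed with a scalar model. The sketch below is an illustration under assumptions, not the instruction's exact behaviour: ``reduce_lane_f64`` and ``round_by_mode`` are hypothetical names, only imm8[1:0] is honoured as the rounding mode (0 = nearest even, 1 = down, 2 = up, 3 = truncate), the MXCSR and exception-suppression bits are ignored, and the hand-written rounding is only valid while the scaled value fits in a ``long long``.

.. code-block:: C

    #include <float.h>

    /* Round x to an integer using the 2-bit mode from imm8[1:0]. */
    static double round_by_mode(double x, int mode)
    {
        long long whole = (long long)x;            /* C casts truncate toward zero */
        double t = (double)whole;
        double frac = x - t;                       /* in (-1, 1), same sign as x */
        double away = (frac < 0.0) ? -1.0 : 1.0;   /* direction away from zero */
        double mag = (frac < 0.0) ? -frac : frac;

        switch (mode & 0x3) {
        case 1:  return (frac < 0.0) ? t - 1.0 : t;   /* toward -infinity */
        case 2:  return (frac > 0.0) ? t + 1.0 : t;   /* toward +infinity */
        case 3:  return t;                            /* toward zero */
        default:                                      /* nearest, ties to even */
            if (mag > 0.5 || (mag == 0.5 && (whole & 1)))
                return t + away;
            return t;
        }
    }

    /* Scalar model of ReduceArgumentPD for one 64-bit lane. */
    static double reduce_lane_f64(double src1, int imm8)
    {
        int m = (imm8 >> 4) & 0xF;          /* fraction bits to preserve */
        double scale = (double)(1 << m);    /* 2^m */
        double tmp = round_by_mode(src1 * scale, imm8) / scale;
        double r = src1 - tmp;
        if (r > DBL_MAX || r < -DBL_MAX)    /* IsInf check from the pseudo-code */
            r = 0.0;
        return r;
    }

With ``imm8 = 0x13`` (one fraction bit kept, truncation), ``reduce_lane_f64(1.75, 0x13)`` rounds 1.75 to 1.5 and returns the remainder ``0.25``.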
        	

_mm512_reduce_round_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512d _mm512_reduce_round_pd(__m512d a, int imm8,
                                   int sae);

.. admonition:: Intel Description

    Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_reduce_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_mask_reduce_ps(__m512 src, __mmask16 k,
                                 __m512 a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_reduce_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512 _mm512_mask_reduce_round_ps(__m512 src, __mmask16 k,
                                       __m512 a, int imm8,
                                       int sae);

.. admonition:: Intel Description

    Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_reduce_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_maskz_reduce_ps(__mmask16 k, __m512 a,
                                  int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_reduce_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512 _mm512_maskz_reduce_round_ps(__mmask16 k, __m512 a,
                                        int imm8, int sae);

.. admonition:: Intel Description

    Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_reduce_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_reduce_ps(__m512 a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_reduce_round_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512 _mm512_reduce_round_ps(__m512 a, int imm8, int sae);

.. admonition:: Intel Description

    Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_alignr_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_alignr_epi32(__mmask16 k, __m512i a,
                                      __m512i b,
                                      const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 64 bytes (16 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[1023:512] := a[511:0]
        temp[511:0] := b[511:0]
        temp[1023:0] := temp[1023:0] >> (32*imm8[3:0])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := temp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_alignr_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_alignr_epi64(__m512i a, __m512i b,
                                const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 64 bytes (8 elements) in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[1023:512] := a[511:0]
        temp[511:0] := b[511:0]
        temp[1023:0] := temp[1023:0] >> (64*imm8[2:0])
        dst[511:0] := temp[511:0]
        dst[MAX:512] := 0
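
The concatenate-and-shift step above amounts to reading eight consecutive lanes out of the 16-lane sequence formed by "b" (low half) followed by "a" (high half). A minimal scalar sketch, assuming plain arrays in place of vector registers; ``alignr_epi64_model`` and ``LANES`` are hypothetical names, and ``dst`` must not alias ``a`` or ``b``.

.. code-block:: C

    #include <stdint.h>

    #define LANES 8 /* eight 64-bit lanes in a 512-bit vector */

    /* Scalar model of the alignr step: dst lane j is lane (j + shift) of a:b. */
    static void alignr_epi64_model(uint64_t dst[LANES], const uint64_t a[LANES],
                                   const uint64_t b[LANES], int imm8)
    {
        int shift = imm8 & 0x7; /* only imm8[2:0] is used */
        for (int j = 0; j < LANES; j++) {
            int idx = j + shift; /* index into the concatenation, b first */
            dst[j] = (idx < LANES) ? b[idx] : a[idx - LANES];
        }
    }

With a shift of 3, ``dst`` receives ``b[3]`` through ``b[7]`` followed by ``a[0]`` through ``a[2]``.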
        	

_mm512_mask_alignr_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_alignr_epi64(__m512i src, __mmask8 k,
                                     __m512i a, __m512i b,
                                     const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 64 bytes (8 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[1023:512] := a[511:0]
        temp[511:0] := b[511:0]
        temp[1023:0] := temp[1023:0] >> (64*imm8[2:0])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := temp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_alignr_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask8 k, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_maskz_alignr_epi64(__mmask8 k, __m512i a,
                                      __m512i b,
                                      const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 64 bytes (8 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[1023:512] := a[511:0]
        temp[511:0] := b[511:0]
        temp[1023:0] := temp[1023:0] >> (64*imm8[2:0])
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := temp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fixupimm_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512i c, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_fixupimm_pd(__m512d a, __m512d b, __m512i c,
                               int imm8);

.. admonition:: Intel Description

    Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction or flagging. For example, make a number NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
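
The two lookups in ``FIXUPIMMPD`` (classify the "b" element into a token, then read a 4-bit response from the "c" element) can be sketched for one lane in portable C. This is a simplified model under assumptions: ``fixup_token``, ``fixupimm_lane_f64``, ``f64_bits`` and ``bits_f64`` are hypothetical names, MXCSR.DAZ is taken to be 0, ``ONE_VALUE_TOKEN`` is taken to mean exactly +1.0, and the "imm8" exception reporting is not modelled.

.. code-block:: C

    #include <float.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helpers: move between a double and its 64-bit pattern. */
    static uint64_t f64_bits(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
    static double bits_f64(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

    /* Classify a value into one of the eight TOKEN_TYPE codes (DAZ assumed off). */
    static int fixup_token(double x)
    {
        uint64_t u = f64_bits(x);
        uint64_t exp = (u >> 52) & 0x7FF;
        uint64_t frac = u & 0x000FFFFFFFFFFFFFull;
        int negative = (int)(u >> 63);

        if (exp == 0x7FF && frac != 0)
            return (frac >> 51) ? 0 : 1;           /* QNAN_TOKEN : SNAN_TOKEN */
        if (exp == 0 && frac == 0) return 2;       /* ZERO_VALUE_TOKEN (+0 or -0) */
        if (u == 0x3FF0000000000000ull) return 3;  /* ONE_VALUE_TOKEN (+1.0) */
        if (exp == 0x7FF) return negative ? 4 : 5; /* NEG_INF_TOKEN : POS_INF_TOKEN */
        return negative ? 6 : 7;                   /* NEG_VALUE_TOKEN : POS_VALUE_TOKEN */
    }

    /* Scalar model of the value selection in FIXUPIMMPD for one 64-bit lane. */
    static double fixupimm_lane_f64(double src1, double src2, uint64_t src3)
    {
        const uint64_t pos_inf = 0x7FF0000000000000ull;
        const uint64_t sign = 0x8000000000000000ull;
        int j = fixup_token(src2);
        unsigned response = (unsigned)((src3 >> (4 * j)) & 0xF);

        switch (response) {
        case 0:  return src1;
        case 1:  return src2;
        case 2:  return bits_f64(f64_bits(src2) | 0x0008000000000000ull); /* quiet a NaN */
        case 3:  return bits_f64(0xFFF8000000000000ull);  /* QNaN indefinite */
        case 4:  return bits_f64(pos_inf | sign);         /* -INF */
        case 5:  return bits_f64(pos_inf);                /* +INF */
        case 6:  return bits_f64(pos_inf | (f64_bits(src2) & sign));
        case 7:  return -0.0;
        case 8:  return 0.0;
        case 9:  return -1.0;
        case 10: return 1.0;
        case 11: return 0.5;
        case 12: return 90.0;
        case 13: return 1.5707963267948966;               /* pi/2 */
        case 14: return DBL_MAX;
        default: return -DBL_MAX;
        }
    }

A table value of ``0x500`` in "c" puts response 5 in the slot for ``ZERO_VALUE_TOKEN``, so a zero in "b" produces +INF while every other class keeps the element from "a".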
        	

_mm512_fixupimm_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    __m512i c, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    UI64 c, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512d _mm512_fixupimm_round_pd(__m512d a, __m512d b,
                                     __m512i c, int imm8,
                                     int sae);

.. admonition:: Intel Description

    Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting. [sae_note]

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction or flagging. For example, make a number NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
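
.. admonition:: Community Note [Token Classification Sketch]

    The first ``CASE`` in the pseudo-code above sorts each element of "b" into one of eight token classes. The following is a minimal scalar sketch of that step in portable C, not Intel's implementation. The names ``fp64_bits`` and ``classify_fp64`` are hypothetical, the ``daz`` argument stands in for ``MXCSR.DAZ``, and the sketch assumes that ``ONE_VALUE_TOKEN`` matches +1.0 only.

    .. code-block:: C

        #include <assert.h>
        #include <stdint.h>
        #include <string.h>

        /* Token numbering copied from the TOKEN_TYPE enum in the pseudo-code. */
        enum token_type {
            QNAN_TOKEN = 0, SNAN_TOKEN = 1, ZERO_VALUE_TOKEN = 2,
            ONE_VALUE_TOKEN = 3, NEG_INF_TOKEN = 4, POS_INF_TOKEN = 5,
            NEG_VALUE_TOKEN = 6, POS_VALUE_TOKEN = 7
        };

        /* Hypothetical helper: raw IEEE-754 binary64 bits of a double. */
        uint64_t fp64_bits(double x)
        {
            uint64_t u;
            assert(sizeof u == sizeof x);
            memcpy(&u, &x, sizeof u);
            return u;
        }

        /* Hypothetical helper: classify one binary64 bit pattern.
           daz != 0 models MXCSR.DAZ == 1, where denormals count as zero. */
        enum token_type classify_fp64(uint64_t bits, int daz)
        {
            uint64_t sign = bits >> 63;
            uint64_t exponent = (bits >> 52) & 0x7FF;          /* bits 62:52 */
            uint64_t fraction = bits & 0x000FFFFFFFFFFFFFULL;  /* bits 51:0 */

            if (exponent == 0x7FF) {
                if (fraction == 0)
                    return sign ? NEG_INF_TOKEN : POS_INF_TOKEN;
                /* Bit 51 is the quiet bit of a binary64 NaN. */
                return (fraction >> 51) ? QNAN_TOKEN : SNAN_TOKEN;
            }
            if (exponent == 0 && (fraction == 0 || daz))
                return ZERO_VALUE_TOKEN;
            if (bits == 0x3FF0000000000000ULL)  /* assumption: only +1.0 matches */
                return ONE_VALUE_TOKEN;
            return sign ? NEG_VALUE_TOKEN : POS_VALUE_TOKEN;
        }

    Working on raw bit patterns rather than ``double`` values keeps signaling NaNs intact, since passing them through floating-point operations may quiet them.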
        	

_mm512_mask_fixupimm_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512i c, 
    int imm8
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_mask_fixupimm_pd(__m512d a, __mmask8 k,
                                    __m512d b, __m512i c,
                                    int imm8);

.. admonition:: Intel Description

    Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    In this context, 'fix up' means applying your chosen method of error detection, correction, or flagging to special values. For example, you can replace a number with NaN when it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A stackoverflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
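
.. admonition:: Community Note [Writemask Sketch]

    The final loop of the pseudo-code above selects, lane by lane, between the fixed-up value and the original element of "a". A minimal scalar sketch of that selection follows. ``merge_writemask_fp64`` is a hypothetical name, and the ``fixed`` array is an assumption that stands in for the per-lane ``FIXUPIMMPD`` results.

    .. code-block:: C

        #include <assert.h>
        #include <stddef.h>
        #include <stdint.h>

        /* Hypothetical helper modelling the writemask loop for 8 FP64 lanes.
           fixed[] stands in for the per-lane FIXUPIMMPD results. */
        void merge_writemask_fp64(double dst[8], const double a[8],
                                  const double fixed[8], uint8_t k)
        {
            assert(dst != NULL && a != NULL && fixed != NULL);
            for (int j = 0; j < 8; j++) {
                /* Bit j of k set: take the fixed-up lane; clear: copy from a. */
                dst[j] = ((k >> j) & 1) ? fixed[j] : a[j];
            }
        }

    With ``k = 0x05``, for example, only lanes 0 and 2 receive fixed-up values and every other lane keeps its value from "a".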
        	

_mm512_mask_fixupimm_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __mmask8 k, 
    __m512d b, 
    __m512i c, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    UI64 c, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512d _mm512_mask_fixupimm_round_pd(__m512d a, __mmask8 k,
                                          __m512d b, __m512i c,
                                          int imm8, int sae);

.. admonition:: Intel Description

    Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Community Note [Fix up Notes]

    In this context, 'fix up' means applying your chosen method of error detection, correction, or flagging to special values. For example, you can replace a number with NaN when it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A stackoverflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
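
.. admonition:: Community Note [Lookup Table Sketch]

    In the pseudo-code above, ``token_response[3:0] := src3[3+4*j:4*j]`` reads a 4-bit response code for token ``j`` from the low 32 bits of each element of "c". A minimal sketch of how such a table word could be packed and read back follows. The helper names ``pack_fixup_table`` and ``token_response`` are hypothetical.

    .. code-block:: C

        #include <assert.h>
        #include <stdint.h>

        /* Hypothetical helper: pack eight 4-bit response codes, indexed by
           token number (0 = QNAN_TOKEN ... 7 = POS_VALUE_TOKEN), into one
           table word. */
        uint32_t pack_fixup_table(const uint8_t response[8])
        {
            uint32_t table = 0;
            for (int j = 0; j < 8; j++) {
                assert(response[j] < 16);  /* each code is 4 bits wide */
                table |= (uint32_t)response[j] << (4 * j);
            }
            return table;
        }

        /* Mirrors token_response[3:0] := src3[3+4*j:4*j] from the pseudo-code. */
        unsigned token_response(uint32_t table, unsigned token)
        {
            assert(token < 8);
            return (table >> (4 * token)) & 0xFu;
        }

    For instance, the responses ``{8, 8, 1, 1, 1, 1, 1, 1}`` map both NaN tokens to +0 and pass every other class through unchanged. In real code the packed word would typically be broadcast to every lane of "c", for example with ``_mm512_set1_epi64``.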
        	

_mm512_maskz_fixupimm_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512i c, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_maskz_fixupimm_pd(__mmask8 k, __m512d a,
                                     __m512d b, __m512i c,
                                     int imm8);

.. admonition:: Intel Description

    Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    In this context, 'fix up' means applying your chosen method of error detection, correction, or flagging to special values. For example, you can replace a number with NaN when it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A stackoverflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
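
.. admonition:: Community Note [Response Code Sketch]

    The second ``CASE`` in the pseudo-code above turns a 4-bit response code into the value written to the destination lane. A minimal scalar sketch of that mapping for double precision follows, expressed on raw bit patterns. ``bits_of`` and ``response_bits_fp64`` are hypothetical names. The sketch assumes that ``QNaN(tsrc)`` sets the quiet bit of an input that is already a NaN, and that ``MAX_FLOAT`` means the largest finite double.

    .. code-block:: C

        #include <assert.h>
        #include <float.h>
        #include <stdint.h>
        #include <string.h>

        #define FP64_POS_INF   0x7FF0000000000000ULL
        #define FP64_NEG_INF   0xFFF0000000000000ULL
        #define FP64_QNAN_IND  0xFFF8000000000000ULL  /* x86 QNaN indefinite */
        #define FP64_QUIET_BIT 0x0008000000000000ULL  /* bit 51 */

        /* Hypothetical helper: raw IEEE-754 binary64 bits of a double. */
        uint64_t bits_of(double x)
        {
            uint64_t u;
            memcpy(&u, &x, sizeof u);
            return u;
        }

        /* Hypothetical helper: destination bits for one response code.
           src1 is the lane of "a"; tsrc is the lane of "b" after DAZ. */
        uint64_t response_bits_fp64(unsigned code, uint64_t src1, uint64_t tsrc)
        {
            assert(code < 16);
            switch (code) {
            case 0:  return src1;
            case 1:  return tsrc;
            case 2:  return tsrc | FP64_QUIET_BIT;  /* assumes tsrc is a NaN */
            case 3:  return FP64_QNAN_IND;
            case 4:  return FP64_NEG_INF;
            case 5:  return FP64_POS_INF;
            case 6:  return (tsrc >> 63) ? FP64_NEG_INF : FP64_POS_INF;
            case 7:  return bits_of(-0.0);
            case 8:  return bits_of(0.0);
            case 9:  return bits_of(-1.0);
            case 10: return bits_of(1.0);
            case 11: return bits_of(0.5);
            case 12: return bits_of(90.0);
            case 13: return bits_of(1.5707963267948966);  /* PI/2 */
            case 14: return bits_of(DBL_MAX);
            default: return bits_of(-DBL_MAX);            /* code 15 */
            }
        }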
        	

_mm512_maskz_fixupimm_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    __m512i c, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    UI64 c, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512d _mm512_maskz_fixupimm_round_pd(__mmask8 k,
                                           __m512d a, __m512d b,
                                           __m512i c, int imm8,
                                           int sae);

.. admonition:: Intel Description

    Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Community Note [Fix up Notes]

    In this context, 'fix up' means applying your chosen method of error detection, correction, or flagging to special values. For example, you can replace a number with NaN when it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A stackoverflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
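
.. admonition:: Community Note [Flag Reporting Sketch]

    The last ``CASE`` in the pseudo-code above uses the bits of "imm8" to decide whether a given token class raises ``#ZE`` or ``#IE``. A minimal scalar sketch of that decision follows. ``fixup_flags``, ``FLAG_ZE`` and ``FLAG_IE`` are hypothetical names, and the sketch reads the two ``ZERO_VALUE_TOKEN`` and two ``ONE_VALUE_TOKEN`` labels as both applying to the same token, which is an assumption about the intended meaning.

    .. code-block:: C

        #include <assert.h>

        /* Hypothetical bit values standing in for the #ZE and #IE flags. */
        enum { FLAG_ZE = 1, FLAG_IE = 2 };

        /* Hypothetical helper: flags the pseudo-code would report for a
           token number (0..7, as in TOKEN_TYPE) and an imm8 control byte. */
        unsigned fixup_flags(unsigned token, unsigned imm8)
        {
            unsigned flags = 0;
            assert(token < 8 && imm8 < 256);
            switch (token) {
            case 2:  /* ZERO_VALUE_TOKEN */
                if (imm8 & 0x01) flags |= FLAG_ZE;
                if (imm8 & 0x02) flags |= FLAG_IE;
                break;
            case 3:  /* ONE_VALUE_TOKEN */
                if (imm8 & 0x04) flags |= FLAG_ZE;
                if (imm8 & 0x08) flags |= FLAG_IE;
                break;
            case 1: if (imm8 & 0x10) flags |= FLAG_IE; break;  /* SNAN_TOKEN */
            case 4: if (imm8 & 0x20) flags |= FLAG_IE; break;  /* NEG_INF_TOKEN */
            case 6: if (imm8 & 0x40) flags |= FLAG_IE; break;  /* NEG_VALUE_TOKEN */
            case 5: if (imm8 & 0x80) flags |= FLAG_IE; break;  /* POS_INF_TOKEN */
            default: break;  /* QNAN_TOKEN and POS_VALUE_TOKEN report nothing */
            }
            return flags;
        }

    Passing ``imm8 = 0`` therefore reports no flags for any input class.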
        	

_mm512_fixupimm_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512i c, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_fixupimm_ps(__m512 a, __m512 b, __m512i c,
                              int imm8);

.. admonition:: Intel Description

    Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    In this context, 'fix up' means applying your chosen method of error detection, correction, or flagging to special values. For example, you can replace a number with NaN when it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A stackoverflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
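
.. admonition:: Community Note [Single-Precision Classification Sketch]

    For single precision, the pseudo-code above tests the exponent field ``src2[30:23]`` and works on 32-bit lanes. A minimal scalar sketch of the token classification at that width follows. ``classify_fp32`` is a hypothetical name, ``daz`` stands in for ``MXCSR.DAZ``, and the sketch assumes that ``ONE_VALUE_TOKEN`` matches +1.0 only.

    .. code-block:: C

        #include <assert.h>
        #include <stdint.h>

        /* Hypothetical helper: token number (0..7, as in TOKEN_TYPE) for one
           binary32 bit pattern. daz != 0 models MXCSR.DAZ == 1. */
        int classify_fp32(uint32_t bits, int daz)
        {
            uint32_t sign = bits >> 31;
            uint32_t exponent = (bits >> 23) & 0xFFu;  /* bits 30:23 */
            uint32_t fraction = bits & 0x007FFFFFu;    /* bits 22:0 */

            assert(daz == 0 || daz == 1);
            if (exponent == 0xFFu) {
                if (fraction == 0)
                    return sign ? 4 : 5;          /* NEG_INF / POS_INF */
                return (fraction >> 22) ? 0 : 1;  /* QNAN / SNAN; bit 22 is quiet */
            }
            if (exponent == 0 && (fraction == 0 || daz))
                return 2;                         /* ZERO_VALUE_TOKEN */
            if (bits == 0x3F800000u)
                return 3;                         /* ONE_VALUE_TOKEN (+1.0f, assumed) */
            return sign ? 6 : 7;                  /* NEG_VALUE / POS_VALUE */
        }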
        	

_mm512_fixupimm_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    __m512i c, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    UI32 c, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512 _mm512_fixupimm_round_ps(__m512 a, __m512 b,
                                    __m512i c, int imm8,
                                    int sae);

.. admonition:: Intel Description

    Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Community Note [Fix up Notes]

    In this context, 'fix up' means applying your chosen method of error detection, correction, or flagging to special values. For example, you can replace a number with NaN when it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A stackoverflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fixupimm_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512i c, 
    int imm8
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_mask_fixupimm_ps(__m512 a, __mmask16 k,
                                   __m512 b, __m512i c,
                                   int imm8);

.. admonition:: Intel Description

    Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    In this context, 'fix up' means applying your chosen method of error detection, correction, or flagging to special values. For example, you can replace a number with NaN when it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A stackoverflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_fixupimm_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __mmask16 k, 
    __m512 b, 
    __m512i c, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    UI32 c, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512 _mm512_mask_fixupimm_round_ps(__m512 a, __mmask16 k,
                                         __m512 b, __m512i c,
                                         int imm8, int sae);

.. admonition:: Intel Description

    Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Community Note [Fix up Notes]

    In this context, 'fix up' means applying your chosen method of error detection, correction, or flagging to special values. For example, you can replace a number with NaN when it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A stackoverflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fixupimm_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512i c, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_maskz_fixupimm_ps(__mmask16 k, __m512 a,
                                    __m512 b, __m512i c,
                                    int imm8);

.. admonition:: Intel Description

    Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    In this context, 'fix up' means applying your chosen method of error detection, correction, or flagging to special values. For example, you can replace a number with NaN when it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A stackoverflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_fixupimm_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    __m512i c, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    UI32 c, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512 _mm512_maskz_fixupimm_round_ps(__mmask16 k, __m512 a,
                                          __m512 b, __m512i c,
                                          int imm8, int sae);

.. admonition:: Intel Description

    Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" selects which exception flags are reported. [sae_note]

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means applying your desired method of error detection, correction, or flagging. For example, make a number NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
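.. admonition:: Community Note [Scalar Sketch]

    The token lookup in the pseudo-code above can be modeled for a single lane in portable C. This is only a sketch under stated assumptions, not Intel's implementation: ``classify_fp32`` and ``token_response`` are hypothetical helper names, and DAZ is assumed to be off, so denormal inputs are not flushed to zero.

    .. code-block:: C

        #include <stdint.h>
        #include <string.h>

        /* Token numbering copied from the pseudo-code above. */
        enum token_type {
            QNAN_TOKEN = 0, SNAN_TOKEN = 1, ZERO_VALUE_TOKEN = 2,
            ONE_VALUE_TOKEN = 3, NEG_INF_TOKEN = 4, POS_INF_TOKEN = 5,
            NEG_VALUE_TOKEN = 6, POS_VALUE_TOKEN = 7
        };

        /* Hypothetical helper: classify one FP32 lane of "b" (DAZ assumed off). */
        static enum token_type classify_fp32(float x)
        {
            uint32_t bits;
            memcpy(&bits, &x, sizeof bits);          /* reinterpret the float's bits */
            uint32_t sign = bits >> 31;
            uint32_t exp  = (bits >> 23) & 0xFFu;    /* 8-bit exponent field */
            uint32_t frac = bits & 0x7FFFFFu;        /* 23-bit fraction field */

            if (exp == 0xFFu && frac != 0)           /* NaN: top fraction bit marks quiet */
                return (frac & 0x400000u) ? QNAN_TOKEN : SNAN_TOKEN;
            if (exp == 0xFFu)
                return sign ? NEG_INF_TOKEN : POS_INF_TOKEN;
            if (exp == 0 && frac == 0)               /* +0 and -0 share one token */
                return ZERO_VALUE_TOKEN;
            if (bits == 0x3F800000u)                 /* exactly +1.0f */
                return ONE_VALUE_TOKEN;
            return sign ? NEG_VALUE_TOKEN : POS_VALUE_TOKEN;
        }

        /* Hypothetical helper: the 4-bit response that table lane "c" holds for token j. */
        static unsigned token_response(uint32_t c, enum token_type j)
        {
            return (c >> (4u * (unsigned)j)) & 0xFu;
        }

    For example, a table lane "c" of ``0x00000800`` holds response 8 in the nibble for ``ZERO_VALUE_TOKEN``, so a zero in "b" would be replaced by ``+0``, while every other token maps to response 0 and keeps the value from "a".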

_mm512_maskz_getexp_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_maskz_getexp_pd(__mmask8 k, __m512d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
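.. admonition:: Community Note [Scalar Sketch]

    The zero-masked exponent extraction above can be approximated lane by lane in portable C. This is a sketch, not the hardware behavior: ``getexp_fp64_model`` and ``maskz_getexp_pd_model`` are hypothetical names, and only normal, finite inputs are handled (zero, denormals, infinity, and NaN follow special rules in the instruction that are not modeled).

    .. code-block:: C

        #include <stdint.h>
        #include <string.h>

        /* Hypothetical helper: unbiased exponent of a normal, finite double,
           which equals floor(log2(|x|)). Special values are not modeled. */
        static double getexp_fp64_model(double x)
        {
            uint64_t bits;
            memcpy(&bits, &x, sizeof bits);              /* reinterpret the double's bits */
            int biased = (int)((bits >> 52) & 0x7FFu);   /* 11-bit exponent field */
            return (double)(biased - 1023);              /* remove the FP64 bias */
        }

        /* Zero-masking over 8 lanes: lanes whose mask bit is clear become 0.0. */
        static void maskz_getexp_pd_model(uint8_t k, const double a[8], double dst[8])
        {
            for (int j = 0; j < 8; j++)
                dst[j] = ((k >> j) & 1u) ? getexp_fp64_model(a[j]) : 0.0;
        }

    With ``k = 0x0B`` only lanes 0, 1, and 3 are computed, so inputs ``8.0``, ``0.75``, and ``-3.0`` in those lanes give ``3.0``, ``-1.0``, and ``1.0``, and every other lane is written as ``0.0``.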

_mm512_maskz_getexp_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM sae

.. code-block:: C

    __m512d _mm512_maskz_getexp_round_pd(__mmask8 k, __m512d a,
                                         int sae);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_getexp_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_maskz_getexp_ps(__mmask16 k, __m512 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_getexp_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512 _mm512_maskz_getexp_round_ps(__mmask16 k, __m512 a,
                                        int sae);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_getmant_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m512d _mm512_maskz_getmant_pd(
        __mmask8 k, __m512d a, _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
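.. admonition:: Community Note [Scalar Sketch]

    For one common configuration, the interval [1, 2) with the sign either taken from the source or cleared, the mantissa normalization can be sketched in portable C by overwriting the exponent field. ``getmant_1_2_model`` and its ``keep_sign`` flag are hypothetical names for illustration, and only normal, finite inputs are handled; this is not the general ``GetNormalizedMantissa`` routine.

    .. code-block:: C

        #include <stdint.h>
        #include <string.h>

        /* Hypothetical helper: mantissa of a normal, finite double scaled into [1, 2).
           keep_sign != 0 keeps the source sign; keep_sign == 0 forces a positive result. */
        static double getmant_1_2_model(double x, int keep_sign)
        {
            uint64_t bits;
            memcpy(&bits, &x, sizeof bits);                   /* reinterpret the double's bits */
            uint64_t sign = keep_sign ? (bits & 0x8000000000000000ull) : 0;
            uint64_t frac = bits & 0x000FFFFFFFFFFFFFull;     /* 52-bit fraction field */
            bits = sign | (1023ull << 52) | frac;             /* biased exponent 1023 means 2^0 */
            memcpy(&x, &bits, sizeof x);
            return x;
        }

    Since ``12.0`` is ``1.5 * 2^3``, the model returns ``1.5`` for it; ``-0.375`` gives ``-1.5`` when the sign is kept and ``1.5`` when it is cleared.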

_mm512_maskz_getmant_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM interv, 
    IMM sc, 
    IMM sae

.. code-block:: C

    __m512d _mm512_maskz_getmant_round_pd(
        __mmask8 k, __m512d a, _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc, int sae);

.. admonition:: Intel Description

    Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_getmant_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m512 _mm512_maskz_getmant_ps(
        __mmask16 k, __m512 a, _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_getmant_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM interv, 
    IMM sc, 
    IMM sae

.. code-block:: C

    __m512 _mm512_maskz_getmant_round_ps(
        __mmask16 k, __m512 a, _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc, int sae);

.. admonition:: Intel Description

    Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_rorv_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __mmask16 k, 
    __m512i a, 
    __m512i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m512i _mm512_maskz_rorv_epi32(__mmask16 k, __m512i a,
                                    __m512i b);

.. admonition:: Intel Description

    Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
        	count := count_src % 32
        	RETURN (src >> count) OR (src << (32 - count))
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
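.. admonition:: Community Note [Scalar Sketch]

    A literal C translation of ``RIGHT_ROTATE_DWORDS`` would shift by 32 when the count is 0, which is undefined behavior in C. The sketch below shows one way to model a single lane safely; ``rotr32`` is a hypothetical helper name, not part of any Intel header.

    .. code-block:: C

        #include <stdint.h>

        /* Hypothetical helper: rotate a 32-bit lane right by count modulo 32. */
        static uint32_t rotr32(uint32_t src, uint32_t count)
        {
            count &= 31u;                                    /* same as count % 32 */
            /* Masking the left-shift amount keeps it in 0..31 even when count == 0. */
            return (src >> count) | (src << ((32u - count) & 31u));
        }

    Because the count is reduced modulo 32, rotating by 36 gives the same result as rotating by 4, and rotating by 32 leaves the lane unchanged.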

_mm512_mask_roundscale_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_mask_roundscale_pd(__m512d src, __mmask8 k,
                                      __m512d a, int imm8);

.. admonition:: Intel Description

    Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_roundscale_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512d _mm512_mask_roundscale_round_pd(__m512d src,
                                            __mmask8 k,
                                            __m512d a, int imm8,
                                            int sae);

.. admonition:: Intel Description

    Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_roundscale_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_maskz_roundscale_pd(__mmask8 k, __m512d a,
                                       int imm8);

.. admonition:: Intel Description

    Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_roundscale_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512d _mm512_maskz_roundscale_round_pd(__mmask8 k,
                                             __m512d a,
                                             int imm8, int sae);

.. admonition:: Intel Description

    Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_roundscale_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m512d _mm512_roundscale_pd(__m512d a, int imm8);

.. admonition:: Intel Description

    Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
        	
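.. admonition:: Community Note [Scalar Sketch]

    The ``RoundScaleFP64`` formula above splits "imm8" into a fraction-bit count (bits 7:4) and a rounding control (bits 3:0). The sketch below models one lane for the round-toward-zero case only, using a C integer cast in place of ``ROUND``. ``roundscale_imm8`` and ``roundscale_trunc_model`` are hypothetical names, and the model assumes the scaled value fits in a ``long long``.

    .. code-block:: C

        /* Hypothetical helper: pack m fraction bits and a rounding control into imm8. */
        static int roundscale_imm8(unsigned m, unsigned rc)
        {
            return (int)(((m & 0xFu) << 4) | (rc & 0xFu));
        }

        /* Hypothetical helper: keep m fraction bits of x, truncating toward zero.
           Assumes 0 <= m <= 15 and that |x * 2^m| fits in a long long. */
        static double roundscale_trunc_model(double x, unsigned m)
        {
            double scale = (double)(1u << m);                /* 2^m */
            double truncated = (double)(long long)(x * scale); /* the cast drops the fraction */
            return truncated / scale;                        /* multiply back by 2^-m */
        }

    With two fraction bits the result is a multiple of 0.25, so ``2.71828`` becomes ``2.5``. The value 3 used for truncation corresponds to ``_MM_FROUND_TO_ZERO``, giving ``roundscale_imm8(2, 3) == 0x23``.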

_mm512_roundscale_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512d _mm512_roundscale_round_pd(__m512d a, int imm8,
                                       int sae);

.. admonition:: Intel Description

    Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_roundscale_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_mask_roundscale_ps(__m512 src, __mmask16 k,
                                     __m512 a, int imm8);

.. admonition:: Intel Description

    Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	
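.. admonition:: Community Note [Scalar Sketch]

    The writemask form differs from the zeromask form only in what an unselected lane receives: it is copied from "src" instead of being zeroed. The sketch below models that merge over 16 FP32 lanes, with truncation toward zero standing in for the rounding step. ``mask_roundscale_ps_model`` is a hypothetical name, and the model assumes each scaled value fits in an ``int``.

    .. code-block:: C

        #include <stdint.h>

        /* Hypothetical helper: merge-masked model keeping m fraction bits (0 <= m <= 15).
           Lanes with a clear mask bit are copied from src unchanged. */
        static void mask_roundscale_ps_model(const float src[16], uint16_t k,
                                             const float a[16], unsigned m,
                                             float dst[16])
        {
            float scale = (float)(1u << m);                  /* 2^m */
            for (int j = 0; j < 16; j++) {
                if ((k >> j) & 1u)
                    dst[j] = (float)(int)(a[j] * scale) / scale; /* truncate, then rescale */
                else
                    dst[j] = src[j];                         /* writemask: keep src lane */
            }
        }

    Passing the same array for ``src`` and ``dst`` updates only the selected lanes in place, which is a common way to use merge masking.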

_mm512_mask_roundscale_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512 _mm512_mask_roundscale_round_ps(__m512 src,
                                           __mmask16 k,
                                           __m512 a, int imm8,
                                           int sae);

.. admonition:: Intel Description

    Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_roundscale_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_maskz_roundscale_ps(__mmask16 k, __m512 a,
                                      int imm8);

.. admonition:: Intel Description

    Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_roundscale_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512 _mm512_maskz_roundscale_round_ps(__mmask16 k,
                                            __m512 a, int imm8,
                                            int sae);

.. admonition:: Intel Description

    Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_roundscale_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m512 _mm512_roundscale_ps(__m512 a, int imm8);

.. admonition:: Intel Description

    Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_roundscale_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512 _mm512_roundscale_round_ps(__m512 a, int imm8,
                                      int sae);

.. admonition:: Intel Description

    Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_scalef_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_mask_scalef_pd(__m512d src, __mmask8 k,
                                  __m512d a, __m512d b);

.. admonition:: Intel Description

    Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_scalef_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_mask_scalef_round_pd(__m512d src, __mmask8 k,
                                        __m512d a, __m512d b,
                                        int rounding);

.. admonition:: Intel Description

    Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_scalef_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_maskz_scalef_pd(__mmask8 k, __m512d a,
                                   __m512d b);

.. admonition:: Intel Description

    Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
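
The writemask form (``_mm512_mask_scalef_pd``) and the zeromask form (``_mm512_maskz_scalef_pd``) differ only in what a lane receives when its bit in ``k`` is clear. A minimal scalar sketch of that difference, assuming the per-lane results have already been computed into an array; the names ``merge_mask``, ``zero_mask``, and ``LANES`` are hypothetical and not part of any Intel header.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    enum { LANES = 8 };  /* eight 64-bit lanes in a 512-bit register */

    /* Writemask form: a lane whose mask bit is clear keeps the value from src. */
    void merge_mask(double dst[LANES], const double src[LANES],
                    uint8_t k, const double computed[LANES])
    {
        assert(dst && src && computed);
        for (int j = 0; j < LANES; ++j)
            dst[j] = ((k >> j) & 1) ? computed[j] : src[j];
    }

    /* Zeromask form: a lane whose mask bit is clear is set to zero. */
    void zero_mask(double dst[LANES], uint8_t k,
                   const double computed[LANES])
    {
        assert(dst && computed);
        for (int j = 0; j < LANES; ++j)
            dst[j] = ((k >> j) & 1) ? computed[j] : 0.0;
    }

With ``k = 0x05`` only lanes 0 and 2 take the computed value; the remaining lanes come from ``src`` in the first function and are zero in the second.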
        	

_mm512_maskz_scalef_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __mmask8 k, 
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_maskz_scalef_round_pd(__mmask8 k, __m512d a,
                                         __m512d b,
                                         int rounding);

.. admonition:: Intel Description

    Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_scalef_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m512d _mm512_scalef_pd(__m512d a, __m512d b);

.. admonition:: Intel Description

    Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
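
For finite, normal inputs the ``SCALE`` helper reduces to ``a * 2^floor(b)``. The following is a minimal scalar sketch of that core step only, written without ``<math.h>`` so it has no library dependencies. The name ``scalef_lane`` is hypothetical, the NaN, infinity, and DAZ branches of the pseudocode are not modeled, and ``b`` is assumed to lie in a small range so the loops stay short.

.. code-block:: C

    #include <assert.h>

    /* Hypothetical scalar model of SCALE for finite, normal inputs:
     * returns a * 2^floor(b). NaN, infinity, and DAZ handling from the
     * pseudocode are deliberately left out. */
    double scalef_lane(double a, double b)
    {
        long n;
        double r = a;

        assert(b > -1000.0 && b < 1000.0);  /* keeps the cast and loops safe */
        n = (long)b;                        /* truncates toward zero */
        if ((double)n > b)
            n -= 1;                         /* turn truncation into FLOOR */
        for (; n > 0; --n)
            r *= 2.0;                       /* apply 2^n one factor at a time */
        for (; n < 0; ++n)
            r *= 0.5;
        return r;
    }

``scalef_lane(3.0, 2.5)`` returns ``12.0`` because ``floor(2.5)`` is 2, and ``scalef_lane(3.0, -1.5)`` returns ``0.75`` because ``floor(-1.5)`` is -2.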
        	

_mm512_scalef_round_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    __m512d b, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m512d _mm512_scalef_round_pd(__m512d a, __m512d b,
                                   int rounding);

.. admonition:: Intel Description

    Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_scalef_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_mask_scalef_ps(__m512 src, __mmask16 k,
                                 __m512 a, __m512 b);

.. admonition:: Intel Description

    Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_scalef_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_mask_scalef_round_ps(__m512 src, __mmask16 k,
                                       __m512 a, __m512 b,
                                       int rounding);

.. admonition:: Intel Description

    Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_scalef_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_maskz_scalef_ps(__mmask16 k, __m512 a,
                                  __m512 b);

.. admonition:: Intel Description

    Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_scalef_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __mmask16 k, 
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_maskz_scalef_round_ps(__mmask16 k, __m512 a,
                                        __m512 b, int rounding);

.. admonition:: Intel Description

    Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_scalef_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m512 _mm512_scalef_ps(__m512 a, __m512 b);

.. admonition:: Intel Description

    Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_scalef_round_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    __m512 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m512 _mm512_scalef_round_ps(__m512 a, __m512 b,
                                  int rounding);

.. admonition:: Intel Description

    Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_alignr_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_alignr_epi32(__m512i a, __m512i b,
                                const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 64 bytes (16 elements) in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[1023:512] := a[511:0]
        temp[511:0] := b[511:0]
        temp[1023:0] := temp[1023:0] >> (32*imm8[3:0])
        dst[511:0] := temp[511:0]
        dst[MAX:512] := 0
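
The concatenate-and-shift step can be viewed element by element: output element ``j`` is element ``j + imm8[3:0]`` of the 32-element sequence formed by "b" (low half) followed by "a" (high half). A minimal scalar sketch under that reading; the name ``alignr_epi32_model`` is hypothetical and plain arrays stand in for the 512-bit registers.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the alignr pseudocode: dst[j] is element
     * (j + shift) of the 32-element concatenation a:b, where b holds the low
     * 16 elements and a the high 16. */
    void alignr_epi32_model(uint32_t dst[16], const uint32_t a[16],
                            const uint32_t b[16], int imm8)
    {
        int shift = imm8 & 0xF;      /* only imm8[3:0] is used */

        assert(dst != a);            /* dst must not overwrite a while reading */
        for (int j = 0; j < 16; ++j) {
            int idx = j + shift;     /* index into a:b, in the range 0..30 */
            dst[j] = (idx < 16) ? b[idx] : a[idx - 16];
        }
    }

With a shift of 3, the low 13 output elements come from ``b[3..15]`` and the top 3 come from ``a[0..2]``. Because only four bits of the immediate are used, a value of 16 behaves like 0 and simply copies "b".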
        	

_mm512_mask_alignr_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512i
:Param Types:
    __m512i src, 
    __mmask16 k, 
    __m512i a, 
    __m512i b, 
    const int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m512i _mm512_mask_alignr_epi32(__m512i src, __mmask16 k,
                                     __m512i a, __m512i b,
                                     const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 64 bytes (16 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[1023:512] := a[511:0]
        temp[511:0] := b[511:0]
        temp[1023:0] := temp[1023:0] >> (32*imm8[3:0])
        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := temp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_getexp_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m512d _mm512_getexp_pd(__m512d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
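
The pseudocode leaves ``ConvertExpFP64`` undefined. For normal, finite, nonzero inputs, the "floor(log2(x))" reading from the description amounts to extracting the unbiased IEEE 754 exponent. A minimal scalar sketch under that assumption; the name ``getexp_lane`` is hypothetical, and zero, denormal, infinity, and NaN inputs are rejected rather than modeled.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of ConvertExpFP64 for normal, finite, nonzero
     * inputs: returns the unbiased exponent, floor(log2(|x|)), as a double. */
    double getexp_lane(double x)
    {
        uint64_t bits;
        int biased;

        memcpy(&bits, &x, sizeof bits);       /* well-defined type pun */
        biased = (int)((bits >> 52) & 0x7FF); /* 11-bit exponent field */
        assert(biased != 0 && biased != 0x7FF); /* no zero/denormal/Inf/NaN */
        return (double)(biased - 1023);       /* remove the IEEE 754 bias */
    }

For ``x = 10.0`` the result is ``3.0`` since 8 <= 10 < 16, and the sign of the input is ignored, so ``-10.0`` also gives ``3.0``.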
        	

_mm512_getexp_round_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM sae

.. code-block:: C

    __m512d _mm512_getexp_round_pd(__m512d a, int sae);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
    [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_getexp_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m512d _mm512_mask_getexp_pd(__m512d src, __mmask8 k,
                                  __m512d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_getexp_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM sae

.. code-block:: C

    __m512d _mm512_mask_getexp_round_pd(__m512d src, __mmask8 k,
                                        __m512d a, int sae);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
    [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_getexp_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m512 _mm512_getexp_ps(__m512 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_getexp_round_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512 _mm512_getexp_round_ps(__m512 a, int sae);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
    [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_getexp_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m512 _mm512_mask_getexp_ps(__m512 src, __mmask16 k,
                                 __m512 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_getexp_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM sae

.. code-block:: C

    __m512 _mm512_mask_getexp_round_ps(__m512 src, __mmask16 k,
                                       __m512 a, int sae);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
    [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_getmant_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP64 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m512d _mm512_getmant_pd(__m512d a,
                              _MM_MANTISSA_NORM_ENUM interv,
                              _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
        ENDFOR
        dst[MAX:512] := 0
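
``GetNormalizedMantissa`` is not expanded in the pseudocode. For the common case of the [1, 2) interval with the source sign kept (assumed here to correspond to ``_MM_MANT_NORM_1_2`` and ``_MM_MANT_SIGN_src``), normalizing a normal, finite value amounts to overwriting its exponent field with the bias. A minimal scalar sketch under those assumptions; the name ``getmant_1_2_lane`` is hypothetical and other intervals, sign controls, and special inputs are not modeled.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of GetNormalizedMantissa for the [1, 2)
     * interval with the source sign preserved. Normal finite inputs only. */
    double getmant_1_2_lane(double x)
    {
        const uint64_t exp_mask = (uint64_t)0x7FF << 52;  /* exponent field */
        uint64_t bits;

        memcpy(&bits, &x, sizeof bits);
        /* Reject zero, denormal, infinity, and NaN encodings. */
        assert((bits & exp_mask) != 0 && (bits & exp_mask) != exp_mask);
        /* Keep sign and fraction, force the unbiased exponent to 0. */
        bits = (bits & ~exp_mask) | ((uint64_t)0x3FF << 52);
        memcpy(&x, &bits, sizeof x);
        return x;
    }

Since 10.0 equals 1.25 * 2^3, ``getmant_1_2_lane(10.0)`` returns ``1.25`` and ``getmant_1_2_lane(-10.0)`` returns ``-1.25``.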
        	

_mm512_getmant_round_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc, 
    int sae
:Param ETypes:
    FP64 a, 
    IMM interv, 
    IMM sc, 
    IMM sae

.. code-block:: C

    __m512d _mm512_getmant_round_pd(
        __m512d a, _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc, int sae);

.. admonition:: Intel Description

    Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    	[getmant_note][sae_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_getmant_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m512d _mm512_mask_getmant_pd(
        __m512d src, __mmask8 k, __m512d a,
        _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
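
The writemask merge in the pseudo-code above can be modelled lane by lane in plain C. The sketch below is a hedged illustration only: ``mant_1_2`` and ``ref_mask_getmant_pd`` are hypothetical names, the per-lane operation is fixed to the [1, 2) interval with the source sign, and active lanes are assumed to hold finite, nonzero values.

.. code-block:: C

    #include <assert.h>
    #include <math.h>
    #include <stdint.h>

    /* Mantissa in [1, 2) with the source sign; finite nonzero x only. */
    static double mant_1_2(double x)
    {
        int e;

        assert(isfinite(x) && x != 0.0);
        return copysign(2.0 * frexp(fabs(x), &e), x);
    }

    /* Scalar model of the merge: lane j is computed from a[j] when
       bit j of k is set, otherwise it is copied from src[j]. */
    static void ref_mask_getmant_pd(double dst[8], const double src[8],
                                    uint8_t k, const double a[8])
    {
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1) ? mant_1_2(a[j]) : src[j];
    }

With ``k = 0x05`` only lanes 0 and 2 are recomputed; every other lane keeps the value it had in ``src``.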
        	

_mm512_mask_getmant_round_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512d
:Param Types:
    __m512d src, 
    __mmask8 k, 
    __m512d a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM interv, 
    IMM sc, 
    IMM sae

.. code-block:: C

    __m512d _mm512_mask_getmant_round_pd(
        __m512d src, __mmask8 k, __m512d a,
        _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc, int sae);

.. admonition:: Intel Description

    Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_getmant_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP32 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m512 _mm512_getmant_ps(__m512 a,
                             _MM_MANTISSA_NORM_ENUM interv,
                             _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
        ENDFOR
        dst[MAX:512] := 0
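
Both enum arguments are compile-time constants that end up in one immediate byte of the underlying instruction. The sketch below assumes the layout used by common compiler headers (interval in bits 1:0, sign control in bits 3:2) and mirrors the enum values locally under hypothetical ``REF_`` names. The numeric values are an assumption not stated on this page, so check your compiler's ``immintrin.h`` before relying on them.

.. code-block:: C

    #include <assert.h>

    /* Local mirrors of the enum values; assumed, not taken from this page. */
    enum { REF_MANT_NORM_1_2 = 0, REF_MANT_NORM_p5_2 = 1,
           REF_MANT_NORM_p5_1 = 2, REF_MANT_NORM_p75_1p5 = 3 };
    enum { REF_MANT_SIGN_src = 0, REF_MANT_SIGN_zero = 1,
           REF_MANT_SIGN_nan = 2 };

    /* Pack the interval selector and the sign control into one byte. */
    static int ref_getmant_imm8(int interv, int sc)
    {
        assert(interv >= 0 && interv <= 3);
        assert(sc >= 0 && sc <= 2);
        return (sc << 2) | interv;
    }

Because the result must be an immediate, "interv" and "sc" have to be constants at the call site; passing run-time variables to the real intrinsic will generally not compile.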
        	

_mm512_getmant_round_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc, 
    int sae
:Param ETypes:
    FP32 a, 
    IMM interv, 
    IMM sc, 
    IMM sae

.. code-block:: C

    __m512 _mm512_getmant_round_ps(
        __m512 a, _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc, int sae);

.. admonition:: Intel Description

    Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_getmant_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m512 _mm512_mask_getmant_ps(__m512 src, __mmask16 k,
                                  __m512 a,
                                  _MM_MANTISSA_NORM_ENUM interv,
                                  _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_getmant_round_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512
:Param Types:
    __m512 src, 
    __mmask16 k, 
    __m512 a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM interv, 
    IMM sc, 
    IMM sae

.. code-block:: C

    __m512 _mm512_mask_getmant_round_ps(
        __m512 src, __mmask16 k, __m512 a,
        _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc, int sae);

.. admonition:: Intel Description

    Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 15
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_roundscale_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    int imm8
:Param ETypes:
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m512h _mm512_roundscale_ph(__m512h a, int imm8);

.. admonition:: Intel Description

    Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        FOR i := 0 to 31
        	dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
        ENDFOR
        dst[MAX:512] := 0
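
The ``RoundScaleFP16`` helper above can be approximated per lane in portable C. This is a rough sketch under stated assumptions, not the hardware behaviour: it works on ``float`` instead of FP16, the hypothetical ``ref_roundscale`` reads the rounding mode from bits 1:0 of "imm8" (0 nearest-even, 1 down, 2 up, 3 truncate), it ignores bit 2 (use the MXCSR mode) and bit 3 (exception suppression), and ``nearbyintf`` is assumed to run under the default round-to-nearest-even mode.

.. code-block:: C

    #include <assert.h>
    #include <math.h>

    /* Scalar model of one lane: 2^-m * ROUND(2^m * x), where m is the
       number of fraction bits to keep, taken from imm8[7:4]. */
    static float ref_roundscale(float x, int imm8)
    {
        int m = (imm8 >> 4) & 0xF;
        int mode = imm8 & 0x3;        /* assumed encoding, see lead-in */
        float scaled, r;

        assert(isfinite(x));
        scaled = ldexpf(x, m);        /* x * 2^m */
        switch (mode) {
        case 0:  r = nearbyintf(scaled); break;  /* nearest, ties to even */
        case 1:  r = floorf(scaled);     break;  /* toward -infinity */
        case 2:  r = ceilf(scaled);      break;  /* toward +infinity */
        default: r = truncf(scaled);     break;  /* toward zero */
        }
        return ldexpf(r, -m);         /* r * 2^-m */
    }

For instance, ``ref_roundscale(1.3f, 0x22)`` keeps two fraction bits and rounds up, so 1.3 becomes 1.5, while ``ref_roundscale(2.5f, 0x00)`` rounds to the nearest even integer and gives 2.0.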
        	

_mm512_roundscale_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    int imm8, 
    const int sae
:Param ETypes:
    FP16 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512h _mm512_roundscale_round_ph(__m512h a, int imm8,
                                       const int sae);

.. admonition:: Intel Description

    Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        FOR i := 0 to 31
        	dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_roundscale_ph
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    int imm8
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m512h _mm512_mask_roundscale_ph(__m512h src, __mmask32 k,
                                      __m512h a, int imm8);

.. admonition:: Intel Description

    Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_roundscale_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    int imm8, 
    const int sae
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512h _mm512_mask_roundscale_round_ph(__m512h src,
                                            __mmask32 k,
                                            __m512h a, int imm8,
                                            const int sae);

.. admonition:: Intel Description

    Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_roundscale_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m512h _mm512_maskz_roundscale_ph(__mmask32 k, __m512h a,
                                       int imm8);

.. admonition:: Intel Description

    Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
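
Zero-masking differs from write-masking only in what an inactive lane receives. The following sketch models that over 32 lanes in plain C; ``ref_maskz_round_ph`` is a hypothetical name, lanes are ``float`` rather than FP16, and the per-lane operation is simplified to rounding to the nearest integer (the "imm8" = 0 case, assuming the default rounding mode).

.. code-block:: C

    #include <assert.h>
    #include <math.h>
    #include <stdint.h>

    /* Scalar model of the zeromask: lane i is computed from a[i] when
       bit i of k is set, otherwise it is set to zero. */
    static void ref_maskz_round_ph(float dst[32], uint32_t k,
                                   const float a[32])
    {
        for (int i = 0; i < 32; ++i)
            dst[i] = ((k >> i) & 1u) ? nearbyintf(a[i]) : 0.0f;
    }

Unlike the "mask" variants there is no "src" operand, so nothing from a previous result survives in the inactive lanes.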
        	

_mm512_maskz_roundscale_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    int imm8, 
    const int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512h _mm512_maskz_roundscale_round_ph(__mmask32 k,
                                             __m512h a,
                                             int imm8,
                                             const int sae);

.. admonition:: Intel Description

    Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_getexp_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m512h _mm512_getexp_ph(__m512h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 31
        	dst.fp16[i] := ConvertExpFP16(a.fp16[i])
        ENDFOR
        dst[MAX:512] := 0
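
The "floor(log2(x))" description can be checked against the C library, whose ``logbf`` returns the unbiased exponent of a value as a floating-point number. The sketch below is an approximation under assumptions: ``ref_getexp`` and ``ref_getexp_n`` are hypothetical names, lanes are ``float`` rather than FP16, and the exponent is taken from the magnitude of the input, so negative inputs are treated like their absolute value.

.. code-block:: C

    #include <assert.h>
    #include <math.h>
    #include <stddef.h>

    /* One lane: floor(log2(|x|)) for finite nonzero x. logbf also
       returns -inf for zero, +inf for infinities and NaN for NaN. */
    static float ref_getexp(float x)
    {
        return logbf(x);
    }

    /* Apply the lane model to n elements. */
    static void ref_getexp_n(float *dst, const float *a, int n)
    {
        assert(dst != NULL && a != NULL && n >= 0);
        for (int i = 0; i < n; ++i)
            dst[i] = ref_getexp(a[i]);
    }

For example, ``ref_getexp(10.0f)`` is ``3.0f`` because 8 <= 10 < 16, and ``ref_getexp(-0.3f)`` is ``-2.0f`` because 0.25 <= 0.3 < 0.5.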
        	

_mm512_getexp_round_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    const int sae
:Param ETypes:
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512h _mm512_getexp_round_ph(__m512h a, const int sae);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 31
        	dst.fp16[i] := ConvertExpFP16(a.fp16[i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_getexp_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_mask_getexp_ph(__m512h src, __mmask32 k,
                                  __m512h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := ConvertExpFP16(a.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_getexp_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    const int sae
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512h _mm512_mask_getexp_round_ph(__m512h src,
                                        __mmask32 k, __m512h a,
                                        const int sae);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := ConvertExpFP16(a.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_getexp_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m512h _mm512_maskz_getexp_ph(__mmask32 k, __m512h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := ConvertExpFP16(a.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_getexp_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    const int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM sae

.. code-block:: C

    __m512h _mm512_maskz_getexp_round_ph(__mmask32 k, __m512h a,
                                         const int sae);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := ConvertExpFP16(a.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_getmant_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign
:Param ETypes:
    FP16 a, 
    IMM norm, 
    IMM sign

.. code-block:: C

    __m512h _mm512_getmant_ph(__m512h a,
                              _MM_MANTISSA_NORM_ENUM norm,
                              _MM_MANTISSA_SIGN_ENUM sign);

.. admonition:: Intel Description

    Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 TO 31
        	dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
        ENDFOR
        dst[MAX:512] := 0
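
Read together with the getexp family, the normalized mantissa gives a mantissa/exponent split of each element. The sketch below illustrates this for one lane under several assumptions: it uses ``float`` rather than FP16, models only the [1, 2) interval with the source sign, accepts only finite nonzero inputs, and ``ref_decompose_roundtrips`` is a hypothetical helper name.

.. code-block:: C

    #include <assert.h>
    #include <math.h>

    /* Returns 1 when mant * 2^expo reproduces x exactly, where mant
       models getmant ([1, 2), source sign) and expo models getexp. */
    static int ref_decompose_roundtrips(float x)
    {
        int e;
        float mant, expo;

        assert(isfinite(x) && x != 0.0f);
        mant = copysignf(2.0f * frexpf(fabsf(x), &e), x);
        expo = logbf(x);              /* floor(log2(|x|)) */
        return ldexpf(mant, (int)expo) == x;
    }

Scaling by a power of two is exact as long as the result is representable, which is why the comparison can use ``==`` here.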
        	

_mm512_getmant_round_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign, 
    const int sae
:Param ETypes:
    FP16 a, 
    IMM norm, 
    IMM sign, 
    IMM sae

.. code-block:: C

    __m512h _mm512_getmant_round_ph(__m512h a,
                                    _MM_MANTISSA_NORM_ENUM norm,
                                    _MM_MANTISSA_SIGN_ENUM sign,
                                    const int sae);

.. admonition:: Intel Description

    Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign. [getmant_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 TO 31
        	dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_getmant_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM norm, 
    IMM sign

.. code-block:: C

    __m512h _mm512_mask_getmant_ph(__m512h src, __mmask32 k,
                                   __m512h a,
                                   _MM_MANTISSA_NORM_ENUM norm,
                                   _MM_MANTISSA_SIGN_ENUM sign);

.. admonition:: Intel Description

    Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 TO 31
        	IF k[i]
        		dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_getmant_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign, 
    const int sae
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM norm, 
    IMM sign, 
    IMM sae

.. code-block:: C

    __m512h _mm512_mask_getmant_round_ph(
        __m512h src, __mmask32 k, __m512h a,
        _MM_MANTISSA_NORM_ENUM norm,
        _MM_MANTISSA_SIGN_ENUM sign, const int sae);

.. admonition:: Intel Description

    Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign. [getmant_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 TO 31
        	IF k[i]
        		dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_getmant_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM norm, 
    IMM sign

.. code-block:: C

    __m512h _mm512_maskz_getmant_ph(
        __mmask32 k, __m512h a, _MM_MANTISSA_NORM_ENUM norm,
        _MM_MANTISSA_SIGN_ENUM sign);

.. admonition:: Intel Description

    Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 TO 31
        	IF k[i]
        		dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_getmant_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign, 
    const int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM norm, 
    IMM sign, 
    IMM sae

.. code-block:: C

    __m512h _mm512_maskz_getmant_round_ph(
        __mmask32 k, __m512h a, _MM_MANTISSA_NORM_ENUM norm,
        _MM_MANTISSA_SIGN_ENUM sign, const int sae);

.. admonition:: Intel Description

    Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign. [getmant_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 TO 31
        	IF k[i]
        		dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_reduce_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    int imm8
:Param ETypes:
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m512h _mm512_reduce_ph(__m512h a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        FOR i := 0 to 31
        	dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
        ENDFOR
        dst[MAX:512] := 0
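
The ``ReduceArgumentFP16`` helper above returns what is left after rounding to a given number of fraction bits. A per-lane approximation in portable C might look as follows. The assumptions are: ``float`` lanes instead of FP16, finite inputs only (so the ``IsInf`` branch is not modelled), a hypothetical function name ``ref_reduce``, rounding mode read from bits 1:0 of "imm8" (0 nearest-even, 1 down, 2 up, 3 truncate), and ``nearbyintf`` running under the default rounding mode.

.. code-block:: C

    #include <assert.h>
    #include <math.h>

    /* Scalar model of one lane: x - 2^-m * ROUND(2^m * x), where m is
       the number of fraction bits to keep, taken from imm8[7:4]. */
    static float ref_reduce(float x, int imm8)
    {
        int m = (imm8 >> 4) & 0xF;
        int mode = imm8 & 0x3;        /* assumed encoding, see lead-in */
        float scaled, r;

        assert(isfinite(x));
        scaled = ldexpf(x, m);
        switch (mode) {
        case 0:  r = nearbyintf(scaled); break;  /* nearest, ties to even */
        case 1:  r = floorf(scaled);     break;  /* toward -infinity */
        case 2:  r = ceilf(scaled);      break;  /* toward +infinity */
        default: r = truncf(scaled);     break;  /* toward zero */
        }
        return x - ldexpf(r, -m);     /* residual after rounding */
    }

With rounding down and ``m = 0`` this is the fractional part, so ``ref_reduce(2.75f, 0x01)`` gives ``0.75f``; with round-to-nearest the residual can be negative, as in ``ref_reduce(2.75f, 0x00)``, which gives ``-0.25f``.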
        	

_mm512_reduce_round_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    int imm8, 
    const int sae
:Param ETypes:
    FP16 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512h _mm512_reduce_round_ph(__m512h a, int imm8,
                                   const int sae);

.. admonition:: Intel Description

    Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        FOR i := 0 to 31
        	dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_reduce_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    int imm8
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m512h _mm512_mask_reduce_ph(__m512h src, __mmask32 k,
                                  __m512h a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_reduce_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    int imm8, 
    const int sae
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512h _mm512_mask_reduce_round_ph(__m512h src,
                                        __mmask32 k, __m512h a,
                                        int imm8,
                                        const int sae);

.. admonition:: Intel Description

    Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_reduce_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m512h _mm512_maskz_reduce_ph(__mmask32 k, __m512h a,
                                   int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_reduce_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    int imm8, 
    const int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m512h _mm512_maskz_reduce_round_ph(__mmask32 k, __m512h a,
                                         int imm8,
                                         const int sae);

.. admonition:: Intel Description

    Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_scalef_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_scalef_ph(__m512h a, __m512h b);

.. admonition:: Intel Description

    Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
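
The scaling described above amounts to ``a * 2^floor(b)`` per element. The scalar C sketch below illustrates that for one lane. It is an assumption-laden model rather than Intel's implementation: ``scalef_model`` is a hypothetical name, ``float`` stands in for FP16, denormals-are-zero handling and special values (NaN, infinity) are not modeled, and "b" is assumed finite with a small magnitude.

.. code-block:: C

    /* Hypothetical scalar model of one scalef lane: a * 2^floor(b). */
    float scalef_model(float a, float b)
    {
        int e = (int)b;        /* truncates toward zero */
        if ((float)e > b)      /* adjust so that e == floor(b) for negative b */
            e -= 1;
        float r = a;
        for (; e > 0; --e)     /* multiply by 2^e using exact power-of-two steps */
            r *= 2.0f;
        for (; e < 0; ++e)
            r *= 0.5f;
        return r;
    }

For instance, ``scalef_model(3.0f, 2.5f)`` gives ``12.0f`` because ``floor(2.5)`` is 2.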

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        FOR i := 0 to 31
        	dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_scalef_round_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_scalef_round_ph(__m512h a, __m512h b,
                                   const int rounding);

.. admonition:: Intel Description

    Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        FOR i := 0 to 31
        	dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_scalef_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mask_scalef_ph(__m512h src, __mmask32 k,
                                  __m512h a, __m512h b);

.. admonition:: Intel Description

    Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_mask_scalef_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h src, 
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_mask_scalef_round_ph(__m512h src,
                                        __mmask32 k, __m512h a,
                                        __m512h b,
                                        const int rounding);

.. admonition:: Intel Description

    Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_scalef_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_maskz_scalef_ph(__mmask32 k, __m512h a,
                                   __m512h b);

.. admonition:: Intel Description

    Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_maskz_scalef_round_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m512h _mm512_maskz_scalef_round_ph(__mmask32 k, __m512h a,
                                         __m512h b,
                                         const int rounding);

.. admonition:: Intel Description

    Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        FOR i := 0 to 31
        	IF k[i]
        		dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:512] := 0
        	

_mm512_fpclass_ph_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __m512h a, 
    int imm8
:Param ETypes:
    FP16 a, 
    IMM imm8

.. code-block:: C

    __mmask32 _mm512_fpclass_ph_mask(__m512h a, int imm8);

.. admonition:: Intel Description

    Test packed half-precision (16-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k". [fpclass_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 31
        	k[i] := CheckFPClass_FP16(a.fp16[i], imm8[7:0])
        ENDFOR
        k[MAX:32] := 0
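
The pseudo-code above leaves ``CheckFPClass_FP16`` undefined, so the scalar C sketch below shows one plausible reading of it on a raw IEEE binary16 bit pattern. The category bit layout of "imm8" (bit 0 QNaN, bit 1 +0, bit 2 -0, bit 3 +Inf, bit 4 -Inf, bit 5 denormal, bit 6 negative finite, bit 7 SNaN) is an assumption, since the referenced note is not expanded here. ``fpclass_fp16_model`` is a hypothetical helper, and denormals-are-zero handling is not modeled.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the per-element class test.
       Returns 1 when the value falls into any category selected by imm8. */
    int fpclass_fp16_model(uint16_t bits, uint8_t imm8)
    {
        int neg      = (bits >> 15) & 1;
        int exp_ones = ((bits >> 10) & 0x1F) == 0x1F;
        int exp_zero = ((bits >> 10) & 0x1F) == 0;
        int man_zero = (bits & 0x3FF) == 0;
        int quiet    = (bits >> 9) & 1;   /* top fraction bit */

        int qnan   = exp_ones && !man_zero && quiet;
        int snan   = exp_ones && !man_zero && !quiet;
        int pzero  = !neg && exp_zero && man_zero;
        int nzero  = neg && exp_zero && man_zero;
        int pinf   = !neg && exp_ones && man_zero;
        int ninf   = neg && exp_ones && man_zero;
        int denorm = exp_zero && !man_zero;
        int finneg = neg && !exp_ones && !(exp_zero && man_zero);

        return ((imm8 & 0x01) && qnan)   || ((imm8 & 0x02) && pzero)  ||
               ((imm8 & 0x04) && nzero)  || ((imm8 & 0x08) && pinf)   ||
               ((imm8 & 0x10) && ninf)   || ((imm8 & 0x20) && denorm) ||
               ((imm8 & 0x40) && finneg) || ((imm8 & 0x80) && snan);
    }

Under these assumptions, an "imm8" of ``0x81`` would flag any NaN and ``0x18`` would flag either infinity.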
        	

_mm512_mask_fpclass_ph_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __mmask32
:Param Types:
    __mmask32 k1, 
    __m512h a, 
    int imm8
:Param ETypes:
    MASK k1, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __mmask32 _mm512_mask_fpclass_ph_mask(__mmask32 k1,
                                          __m512h a, int imm8);

.. admonition:: Intel Description

    Test packed half-precision (16-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [fpclass_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 31
        	IF k1[i]
        		k[i] := CheckFPClass_FP16(a.fp16[i], imm8[7:0])
        	ELSE
        		k[i] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm512_permutex2var_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512h a, 
    __m512i idx, 
    __m512h b
:Param ETypes:
    FP16 a, 
    UI16 idx, 
    FP16 b

.. code-block:: C

    __m512h _mm512_permutex2var_ph(__m512h a, __m512i idx,
                                   __m512h b);

.. admonition:: Intel Description

    Shuffle half-precision (16-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	off := idx[i+4:i]
        	dst.fp16[j] := idx[i+5] ? b.fp16[off] : a.fp16[off]
        ENDFOR
        dst[MAX:512] := 0
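
The two-source selection above can be sketched in scalar C as follows. This is only a model of the pseudo-code: ``permutex2var_ph_model`` is a hypothetical name, the vectors are represented as arrays of 32 raw FP16 bit patterns, and ``dst`` is assumed not to overlap ``a`` or ``b``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: a, b and dst hold 32 raw FP16 bit patterns. */
    void permutex2var_ph_model(uint16_t dst[32], const uint16_t a[32],
                               const uint16_t idx[32], const uint16_t b[32])
    {
        for (int j = 0; j < 32; ++j) {
            unsigned off = idx[j] & 0x1Fu;         /* idx[i+4:i]: element offset */
            unsigned from_b = (idx[j] >> 5) & 1u;  /* idx[i+5]: source selector */
            dst[j] = from_b ? b[off] : a[off];
        }
    }

Only the low six bits of each index element take part: an index of 37 selects element 5 of "b", while 71 selects element 7 of "a".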
        	

_mm512_mask_blend_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __mmask32 k, 
    __m512h a, 
    __m512h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m512h _mm512_mask_blend_ph(__mmask32 k, __m512h a,
                                 __m512h b);

.. admonition:: Intel Description

    Blend packed half-precision (16-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	IF k[j]
        		dst.fp16[j] := b.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:512] := 0
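
As a small illustration of the blend loop above, the scalar C sketch below picks each element from "b" when the matching mask bit is set and from "a" otherwise. ``mask_blend_ph_model`` is a hypothetical name and the vectors are modeled as arrays of 32 raw FP16 bit patterns.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of a 32-element blend under mask k. */
    void mask_blend_ph_model(uint16_t dst[32], uint32_t k,
                             const uint16_t a[32], const uint16_t b[32])
    {
        for (int j = 0; j < 32; ++j)
            dst[j] = ((k >> j) & 1u) ? b[j] : a[j];  /* bit set: take b */
    }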
        	

_mm512_permutexvar_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-ZMM
:Register: ZMM 512 bit
:Return Type: __m512h
:Param Types:
    __m512i idx, 
    __m512h a
:Param ETypes:
    UI16 idx, 
    FP16 a

.. code-block:: C

    __m512h _mm512_permutexvar_ph(__m512i idx, __m512h a);

.. admonition:: Intel Description

    Shuffle half-precision (16-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*16
        	id := idx[i+4:i]
        	dst.fp16[j] := a.fp16[id]
        ENDFOR
        dst[MAX:512] := 0
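
The single-source shuffle above can be modeled in scalar C as below. ``permutexvar_ph_model`` is a hypothetical name, the vectors are arrays of 32 raw FP16 bit patterns, and ``dst`` is assumed not to overlap ``a``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: only the low 5 bits of each index are used. */
    void permutexvar_ph_model(uint16_t dst[32], const uint16_t idx[32],
                              const uint16_t a[32])
    {
        for (int j = 0; j < 32; ++j)
            dst[j] = a[idx[j] & 0x1Fu];
    }

Filling ``idx[j]`` with ``31 - j`` reverses the element order in this model.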
        	

YMM
~~~
_mm256_dbsad_epu8
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_dbsad_epu8(__m256i a, __m256i b, int imm8);

.. admonition:: Intel Description

    Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst".
    Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 1
        	tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
        	tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
        	tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
        	tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
        ENDFOR
        FOR j := 0 to 3
        	i := j*64
        	dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
        	               ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
        	
        	dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
        	                  ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
        	
        	dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
        	                  ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
        	
        	dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
        	                  ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
        ENDFOR
        dst[MAX:256] := 0
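
The bit-range arithmetic above is easier to follow at byte granularity. The scalar C sketch below restates it that way, as a model under stated assumptions rather than Intel's implementation: ``dbsad_epu8_model`` and ``sad4`` are hypothetical names, and the 256-bit vectors are represented as little-endian byte arrays.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Sum of absolute differences over four consecutive bytes. */
    static uint16_t sad4(const uint8_t *p, const uint8_t *q)
    {
        uint16_t sum = 0;
        for (int n = 0; n < 4; ++n)
            sum += (uint16_t)(p[n] > q[n] ? p[n] - q[n] : q[n] - p[n]);
        return sum;
    }

    /* Hypothetical scalar model of the 256-bit double-block SAD. */
    void dbsad_epu8_model(uint16_t dst[16], const uint8_t a[32],
                          const uint8_t b[32], uint8_t imm8)
    {
        uint8_t tmp[32];
        /* Select dwords of b within each 128-bit lane according to imm8. */
        for (int lane = 0; lane < 2; ++lane)
            for (int d = 0; d < 4; ++d) {
                int sel = (imm8 >> (2 * d)) & 3;
                memcpy(&tmp[16 * lane + 4 * d], &b[16 * lane + 4 * sel], 4);
            }
        /* Four SADs per 64-bit lane, with tmp read at byte offsets 0..3. */
        for (int j = 0; j < 4; ++j) {
            const uint8_t *aq = a + 8 * j;
            const uint8_t *tq = tmp + 8 * j;
            dst[4 * j + 0] = sad4(aq, tq);
            dst[4 * j + 1] = sad4(aq, tq + 1);
            dst[4 * j + 2] = sad4(aq + 4, tq + 2);
            dst[4 * j + 3] = sad4(aq + 4, tq + 3);
        }
    }

An "imm8" of ``0xE4`` leaves the dwords of "b" in their original order, which makes it a convenient starting point when checking results by hand.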
        	

_mm256_mask_dbsad_epu8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_dbsad_epu8(__m256i src, __mmask16 k,
                                   __m256i a, __m256i b,
                                   int imm8);

.. admonition:: Intel Description

    Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 1
        	tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
        	tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
        	tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
        	tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
        ENDFOR
        FOR j := 0 to 3
        	i := j*64
        	tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
        	                   ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
        	
        	tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
        	                      ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
        	
        	tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
        	                      ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
        	
        	tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
        	                      ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
        ENDFOR
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
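
The final loop above is the masking step that the writemask and zeromask variants share. A minimal scalar C sketch of that step for sixteen 16-bit results follows. ``apply_mask16_model`` is a hypothetical helper, and passing ``NULL`` for ``src`` to request zeromask behavior is a convention of this sketch, not of the intrinsics.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: merges 16 computed 16-bit results under mask k.
       A non-NULL src models a writemask; src == NULL models a zeromask. */
    void apply_mask16_model(uint16_t dst[16], const uint16_t res[16],
                            const uint16_t *src, uint16_t k)
    {
        for (int j = 0; j < 16; ++j) {
            if ((k >> j) & 1u)
                dst[j] = res[j];             /* mask bit set: keep the result */
            else
                dst[j] = src ? src[j] : 0;   /* otherwise copy src or zero */
        }
    }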
        	

_mm256_maskz_dbsad_epu8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_dbsad_epu8(__mmask16 k, __m256i a,
                                    __m256i b, int imm8);

.. admonition:: Intel Description

    Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR i := 0 to 1
        	tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
        	tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
        	tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
        	tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
        ENDFOR
        FOR j := 0 to 3
        	i := j*64
        	tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
        	                   ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
        	
        	tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
        	                      ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
        	
        	tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
        	                      ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
        	
        	tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
        	                      ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
        ENDFOR
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_alignr_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_alignr_epi8(__m256i src, __mmask32 k,
                                    __m256i a, __m256i b,
                                    const int imm8);

.. admonition:: Intel Description

    Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*128
        	tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8)
        	tmp_dst[i+127:i] := tmp[127:0]
        ENDFOR
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
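
The per-lane concatenate-and-shift above can be restated at byte granularity. The scalar C sketch below does so for the writemask form. ``mask_alignr_epi8_model`` is a hypothetical name, the vectors are modeled as little-endian byte arrays, and ``dst`` is assumed not to overlap ``a`` or ``b``.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of the masked, per-128-bit-lane byte align. */
    void mask_alignr_epi8_model(uint8_t dst[32], const uint8_t src[32],
                                uint32_t k, const uint8_t a[32],
                                const uint8_t b[32], unsigned imm8)
    {
        for (int lane = 0; lane < 2; ++lane) {
            uint8_t cat[32];
            memcpy(cat, b + 16 * lane, 16);       /* low 16 bytes come from b */
            memcpy(cat + 16, a + 16 * lane, 16);  /* high 16 bytes come from a */
            for (int n = 0; n < 16; ++n) {
                unsigned pos = (unsigned)n + imm8;           /* source byte after the shift */
                uint8_t shifted = pos < 32 ? cat[pos] : 0;   /* zeros shift in from the top */
                int j = 16 * lane + n;
                dst[j] = ((k >> j) & 1u) ? shifted : src[j];
            }
        }
    }

In this model a shift of 32 bytes or more yields zero in every position whose mask bit is set.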
        	

_mm256_maskz_alignr_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_alignr_epi8(__mmask32 k, __m256i a,
                                     __m256i b, const int imm8);

.. admonition:: Intel Description

    Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*128
        	tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8)
        	tmp_dst[i+127:i] := tmp[127:0]
        ENDFOR
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_blend_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_blend_epi8(__mmask32 k, __m256i a,
                                   __m256i b);

.. admonition:: Intel Description

    Blend packed 8-bit integers from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := b[i+7:i]
        	ELSE
        		dst[i+7:i] := a[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_blend_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_blend_epi16(__mmask16 k, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Blend packed 16-bit integers from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := b[i+15:i]
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_broadcastb_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_mask_broadcastb_epi8(__m256i src,
                                        __mmask32 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[7:0]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_broadcastb_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m256i _mm256_maskz_broadcastb_epi8(__mmask32 k,
                                         __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[7:0]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
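The writemask and zeromask forms of the byte broadcast differ only in what an unselected lane receives. The sketch below models both in portable scalar C under the assumption that the vector is an array of 32 ``uint8_t`` lanes and that ``a0`` stands for ``a[7:0]``; the helper names are hypothetical and this is not the intrinsic implementation.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Writemask form (hypothetical helper): unselected lanes keep src[j]. */
    static void mask_broadcastb_ref(uint8_t dst[32], const uint8_t src[32],
                                    uint32_t k, uint8_t a0)
    {
        for (int j = 0; j < 32; j++) {
            dst[j] = ((k >> j) & 1u) ? a0 : src[j];
        }
    }

    /* Zeromask form (hypothetical helper): unselected lanes become 0. */
    static void maskz_broadcastb_ref(uint8_t dst[32], uint32_t k, uint8_t a0)
    {
        for (int j = 0; j < 32; j++) {
            dst[j] = ((k >> j) & 1u) ? a0 : 0;
        }
    }

    /* Self-check: bits 0 and 31 of k are set, all other lanes are unselected. */
    static void broadcastb_example(void)
    {
        uint8_t src[32], m[32], z[32];
        for (int j = 0; j < 32; j++) {
            src[j] = (uint8_t)(j + 1);
        }
        mask_broadcastb_ref(m, src, 0x80000001u, 0xAB);
        maskz_broadcastb_ref(z, 0x80000001u, 0xAB);
        assert(m[0] == 0xAB && m[1] == 2 && m[31] == 0xAB);
        assert(z[0] == 0xAB && z[1] == 0 && z[31] == 0xAB);
    }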

_mm256_mask_broadcastw_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_mask_broadcastw_epi16(__m256i src,
                                         __mmask16 k,
                                         __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[15:0]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_broadcastw_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m256i _mm256_maskz_broadcastw_epi16(__mmask16 k,
                                          __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[15:0]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask2_permutex2var_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i idx, 
    __mmask16 k, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 idx, 
    MASK k, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask2_permutex2var_epi16(__m256i a,
                                            __m256i idx,
                                            __mmask16 k,
                                            __m256i b);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		off := 16*idx[i+3:i]
        		dst[i+15:i] := idx[i+4] ? b[off+15:off] : a[off+15:off]
        	ELSE
        		dst[i+15:i] := idx[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_permutex2var_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask16 k, 
    __m256i idx, 
    __m256i b
:Param ETypes:
    UI16 a, 
    MASK k, 
    UI16 idx, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_permutex2var_epi16(__m256i a,
                                           __mmask16 k,
                                           __m256i idx,
                                           __m256i b);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		off := 16*idx[i+3:i]
        		dst[i+15:i] := idx[i+4] ? b[off+15:off] : a[off+15:off]
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
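The ``mask`` and ``mask2`` forms of the two-source shuffle share the same selection logic and differ only in the fallback for unselected lanes (``a`` versus ``idx``). The following scalar sketch in portable C illustrates that, assuming 16 ``uint16_t`` lanes per vector; the helper names and the ``fallback_is_idx`` flag are hypothetical and exist only for this illustration.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model (hypothetical helper). Bits 3:0 of each idx lane pick the
       element, bit 4 picks the source (0 = a, 1 = b). Unselected lanes fall
       back to a[j] (mask form) or to idx[j] (mask2 form). */
    static void masked_permutex2var_epi16_ref(uint16_t dst[16],
                                              const uint16_t a[16], uint16_t k,
                                              const uint16_t idx[16],
                                              const uint16_t b[16],
                                              int fallback_is_idx)
    {
        for (int j = 0; j < 16; j++) {
            unsigned off = idx[j] & 0x0Fu;
            unsigned use_b = (idx[j] >> 4) & 1u;
            if ((k >> j) & 1u) {
                dst[j] = use_b ? b[off] : a[off];
            } else {
                dst[j] = fallback_is_idx ? idx[j] : a[j];
            }
        }
    }

    /* Self-check: every idx lane selects b[j]; only lane 0 is enabled. */
    static void permutex2var_example(void)
    {
        uint16_t a[16], b[16], idx[16], dst[16];
        for (int j = 0; j < 16; j++) {
            a[j] = (uint16_t)j;
            b[j] = (uint16_t)(100 + j);
            idx[j] = (uint16_t)(16 + j);  /* bit 4 set: take b[j] */
        }
        masked_permutex2var_epi16_ref(dst, a, 0x0001, idx, b, 0);
        assert(dst[0] == 100 && dst[1] == 1);   /* lane 1 falls back to a */
        masked_permutex2var_epi16_ref(dst, a, 0x0001, idx, b, 1);
        assert(dst[0] == 100 && dst[1] == 17);  /* lane 1 falls back to idx */
    }

In this model ``dst`` must not alias ``a``, ``b``, or ``idx``, since lanes are written while the sources are still being read.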

_mm256_maskz_permutex2var_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i idx, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 idx, 
    UI16 b

.. code-block:: C

    __m256i _mm256_maskz_permutex2var_epi16(__mmask16 k,
                                            __m256i a,
                                            __m256i idx,
                                            __m256i b);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		off := 16*idx[i+3:i]
        		dst[i+15:i] := idx[i+4] ? b[off+15:off] : a[off+15:off]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_permutex2var_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i idx, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 idx, 
    UI16 b

.. code-block:: C

    __m256i _mm256_permutex2var_epi16(__m256i a, __m256i idx,
                                      __m256i b);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	off := 16*idx[i+3:i]
        	dst[i+15:i] := idx[i+4] ? b[off+15:off] : a[off+15:off]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_permutexvar_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i idx, 
    __m256i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 idx, 
    UI16 a

.. code-block:: C

    __m256i _mm256_mask_permutexvar_epi16(__m256i src,
                                          __mmask16 k,
                                          __m256i idx,
                                          __m256i a);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	id := idx[i+3:i]*16
        	IF k[j]
        		dst[i+15:i] := a[id+15:id]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_permutexvar_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i idx, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI16 idx, 
    UI16 a

.. code-block:: C

    __m256i _mm256_maskz_permutexvar_epi16(__mmask16 k,
                                           __m256i idx,
                                           __m256i a);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	id := idx[i+3:i]*16
        	IF k[j]
        		dst[i+15:i] := a[id+15:id]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_permutexvar_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i idx, 
    __m256i a
:Param ETypes:
    UI16 idx, 
    UI16 a

.. code-block:: C

    __m256i _mm256_permutexvar_epi16(__m256i idx, __m256i a);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	id := idx[i+3:i]*16
        	dst[i+15:i] := a[id+15:id]
        ENDFOR
        dst[MAX:256] := 0
        	
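Only the low four bits of each index lane take part in the single-source shuffle, so any 16-entry permutation can be expressed. As an illustration, the scalar sketch below reverses the lane order; it is written in portable C with the vector modeled as 16 ``uint16_t`` lanes, and the helper names are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model (hypothetical helper): dst[j] = a[idx[j] mod 16]. */
    static void permutexvar_epi16_ref(uint16_t dst[16], const uint16_t idx[16],
                                      const uint16_t a[16])
    {
        for (int j = 0; j < 16; j++) {
            dst[j] = a[idx[j] & 0x0Fu];
        }
    }

    /* Self-check: idx[j] = 15 - j reverses the lane order. */
    static void permutexvar_example(void)
    {
        uint16_t idx[16], a[16], dst[16];
        for (int j = 0; j < 16; j++) {
            idx[j] = (uint16_t)(15 - j);
            a[j] = (uint16_t)(10 * j);
        }
        permutexvar_epi16_ref(dst, idx, a);
        assert(dst[0] == 150 && dst[7] == 80 && dst[15] == 0);
    }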

_mm256_movepi8_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __mmask32
:Param Types:
    __m256i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __mmask32 _mm256_movepi8_mask(__m256i a);

.. admonition:: Intel Description

    Set each bit of mask register "k" based on the most significant bit of the corresponding packed 8-bit integer in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF a[i+7]
        		k[j] := 1
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:32] := 0
        	

_mm256_movm_epi8
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m256i _mm256_movm_epi8(__mmask32 k);

.. admonition:: Intel Description

    Set each packed 8-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := 0xFF
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
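Going by the pseudo-code of ``_mm256_movepi8_mask`` and ``_mm256_movm_epi8``, expanding a mask to bytes and then collecting the sign bits again should return the original mask. The scalar sketch below checks that round trip in portable C, assuming 32 ``uint8_t`` lanes; the helper names are hypothetical stand-ins for the intrinsics.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model (hypothetical helper): bit j of the result is the most
       significant bit of byte j. */
    static uint32_t movepi8_mask_ref(const uint8_t a[32])
    {
        uint32_t k = 0;
        for (int j = 0; j < 32; j++) {
            k |= (uint32_t)(a[j] >> 7) << j;
        }
        return k;
    }

    /* Scalar model (hypothetical helper): byte j is 0xFF when bit j of k is
       set, otherwise 0x00. */
    static void movm_epi8_ref(uint8_t dst[32], uint32_t k)
    {
        for (int j = 0; j < 32; j++) {
            dst[j] = ((k >> j) & 1u) ? 0xFF : 0x00;
        }
    }

    /* Self-check: mask -> bytes -> mask is the identity. */
    static void mov_mask_example(void)
    {
        uint8_t v[32];
        movm_epi8_ref(v, 0x80000001u);
        assert(v[0] == 0xFF && v[1] == 0x00 && v[31] == 0xFF);
        assert(movepi8_mask_ref(v) == 0x80000001u);
    }

The reverse round trip (bytes to mask to bytes) is not an identity in general, because only the top bit of each byte survives.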

_mm256_movm_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m256i _mm256_movm_epi16(__mmask16 k);

.. admonition:: Intel Description

    Set each packed 16-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := 0xFFFF
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_movepi16_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __mmask16 _mm256_movepi16_mask(__m256i a);

.. admonition:: Intel Description

    Set each bit of mask register "k" based on the most significant bit of the corresponding packed 16-bit integer in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF a[i+15]
        		k[j] := 1
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_mask_shufflehi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_shufflehi_epi16(__m256i src,
                                        __mmask16 k, __m256i a,
                                        int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[63:0] := a[63:0]
        tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
        tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
        tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
        tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
        tmp_dst[191:128] := a[191:128]
        tmp_dst[207:192] := (a >> (imm8[1:0] * 16))[207:192]
        tmp_dst[223:208] := (a >> (imm8[3:2] * 16))[207:192]
        tmp_dst[239:224] := (a >> (imm8[5:4] * 16))[207:192]
        tmp_dst[255:240] := (a >> (imm8[7:6] * 16))[207:192]
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
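The shifted bit-slice notation in the pseudo-code amounts to picking one of the four high words of the same 128-bit lane for each destination word. A minimal scalar sketch in portable C follows, assuming 16 ``uint16_t`` lanes (two 128-bit lanes of eight words each); the helper names are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model (hypothetical helper) of the writemask high-word shuffle. */
    static void mask_shufflehi_epi16_ref(uint16_t dst[16],
                                         const uint16_t src[16], uint16_t k,
                                         const uint16_t a[16], unsigned imm8)
    {
        uint16_t tmp[16];
        for (int lane = 0; lane < 2; lane++) {
            int base = lane * 8;
            for (int e = 0; e < 4; e++) {
                unsigned sel = (imm8 >> (2 * e)) & 3u;  /* 2-bit selector */
                tmp[base + e] = a[base + e];            /* low words copied */
                tmp[base + 4 + e] = a[base + 4 + sel];  /* high words shuffled */
            }
        }
        for (int j = 0; j < 16; j++) {
            dst[j] = ((k >> j) & 1u) ? tmp[j] : src[j];
        }
    }

    /* Self-check: imm8 = 0x1B reverses the four high words; k = 0x00F0
       enables only the high words of the first 128-bit lane. */
    static void mask_shufflehi_epi16_example(void)
    {
        uint16_t a[16], src[16], dst[16];
        for (int j = 0; j < 16; j++) {
            a[j] = (uint16_t)j;
            src[j] = 0xFFFF;
        }
        mask_shufflehi_epi16_ref(dst, src, 0x00F0, a, 0x1B);
        assert(dst[3] == 0xFFFF && dst[12] == 0xFFFF);
        assert(dst[4] == 7 && dst[5] == 6 && dst[6] == 5 && dst[7] == 4);
    }

The same ``imm8`` is applied to both 128-bit lanes, which is why the selector is computed once per word position rather than once per lane.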

_mm256_maskz_shufflehi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_shufflehi_epi16(__mmask16 k, __m256i a,
                                         int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[63:0] := a[63:0]
        tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
        tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
        tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
        tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
        tmp_dst[191:128] := a[191:128]
        tmp_dst[207:192] := (a >> (imm8[1:0] * 16))[207:192]
        tmp_dst[223:208] := (a >> (imm8[3:2] * 16))[207:192]
        tmp_dst[239:224] := (a >> (imm8[5:4] * 16))[207:192]
        tmp_dst[255:240] := (a >> (imm8[7:6] * 16))[207:192]
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shufflelo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_shufflelo_epi16(__m256i src,
                                        __mmask16 k, __m256i a,
                                        int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
        tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
        tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
        tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
        tmp_dst[127:64] := a[127:64]
        tmp_dst[143:128] := (a >> (imm8[1:0] * 16))[143:128]
        tmp_dst[159:144] := (a >> (imm8[3:2] * 16))[143:128]
        tmp_dst[175:160] := (a >> (imm8[5:4] * 16))[143:128]
        tmp_dst[191:176] := (a >> (imm8[7:6] * 16))[143:128]
        tmp_dst[255:192] := a[255:192]
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_shufflelo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_shufflelo_epi16(__mmask16 k, __m256i a,
                                         int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
        tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
        tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
        tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
        tmp_dst[127:64] := a[127:64]
        tmp_dst[143:128] := (a >> (imm8[1:0] * 16))[143:128]
        tmp_dst[159:144] := (a >> (imm8[3:2] * 16))[143:128]
        tmp_dst[175:160] := (a >> (imm8[5:4] * 16))[143:128]
        tmp_dst[191:176] := (a >> (imm8[7:6] * 16))[143:128]
        tmp_dst[255:192] := a[255:192]
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
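For the low-word shuffle, the roles of the two lane halves are swapped: the four low words of each 128-bit lane are selected by ``imm8`` and the four high words pass through. The scalar sketch below models the zeromask form in portable C, assuming 16 ``uint16_t`` lanes; the helper names are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model (hypothetical helper) of the zeromask low-word shuffle. */
    static void maskz_shufflelo_epi16_ref(uint16_t dst[16], uint16_t k,
                                          const uint16_t a[16], unsigned imm8)
    {
        uint16_t tmp[16];
        for (int lane = 0; lane < 2; lane++) {
            int base = lane * 8;
            for (int e = 0; e < 4; e++) {
                unsigned sel = (imm8 >> (2 * e)) & 3u;  /* 2-bit selector */
                tmp[base + e] = a[base + sel];          /* low words shuffled */
                tmp[base + 4 + e] = a[base + 4 + e];    /* high words copied */
            }
        }
        for (int j = 0; j < 16; j++) {
            dst[j] = ((k >> j) & 1u) ? tmp[j] : 0;
        }
    }

    /* Self-check: imm8 = 0x1B reverses the four low words; k = 0x0F0F
       zeroes the high words of both 128-bit lanes. */
    static void maskz_shufflelo_epi16_example(void)
    {
        uint16_t a[16], dst[16];
        for (int j = 0; j < 16; j++) {
            a[j] = (uint16_t)(j + 1);
        }
        maskz_shufflelo_epi16_ref(dst, 0x0F0F, a, 0x1B);
        assert(dst[0] == 4 && dst[1] == 3 && dst[2] == 2 && dst[3] == 1);
        assert(dst[4] == 0 && dst[8] == 12 && dst[11] == 9 && dst[12] == 0);
    }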

_mm256_mask_unpackhi_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_unpackhi_epi8(__m256i src, __mmask32 k,
                                      __m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[71:64] 
        	dst[15:8] := src2[71:64] 
        	dst[23:16] := src1[79:72] 
        	dst[31:24] := src2[79:72] 
        	dst[39:32] := src1[87:80] 
        	dst[47:40] := src2[87:80] 
        	dst[55:48] := src1[95:88] 
        	dst[63:56] := src2[95:88] 
        	dst[71:64] := src1[103:96] 
        	dst[79:72] := src2[103:96] 
        	dst[87:80] := src1[111:104] 
        	dst[95:88] := src2[111:104] 
        	dst[103:96] := src1[119:112] 
        	dst[111:104] := src2[119:112] 
        	dst[119:112] := src1[127:120] 
        	dst[127:120] := src2[127:120] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128])
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
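``INTERLEAVE_HIGH_BYTES`` is spelled out byte by byte above, but it follows a regular pattern: output byte ``2*e`` of a lane comes from byte ``8+e`` of ``src1`` and output byte ``2*e+1`` from byte ``8+e`` of ``src2``. The following scalar sketch expresses that as a loop in portable C, assuming 32 ``uint8_t`` lanes; the helper names are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model (hypothetical helper) of the writemask high-byte unpack. */
    static void mask_unpackhi_epi8_ref(uint8_t dst[32], const uint8_t src[32],
                                       uint32_t k, const uint8_t a[32],
                                       const uint8_t b[32])
    {
        uint8_t tmp[32];
        for (int lane = 0; lane < 2; lane++) {
            int base = lane * 16;
            for (int e = 0; e < 8; e++) {
                tmp[base + 2 * e] = a[base + 8 + e];      /* even bytes from a */
                tmp[base + 2 * e + 1] = b[base + 8 + e];  /* odd bytes from b */
            }
        }
        for (int j = 0; j < 32; j++) {
            dst[j] = ((k >> j) & 1u) ? tmp[j] : src[j];
        }
    }

    /* Self-check with all mask bits set. */
    static void mask_unpackhi_epi8_example(void)
    {
        uint8_t a[32], b[32], src[32], dst[32];
        for (int j = 0; j < 32; j++) {
            a[j] = (uint8_t)j;
            b[j] = (uint8_t)(100 + j);
            src[j] = 0;
        }
        mask_unpackhi_epi8_ref(dst, src, 0xFFFFFFFFu, a, b);
        assert(dst[0] == 8 && dst[1] == 108 && dst[2] == 9 && dst[15] == 115);
        assert(dst[16] == 24 && dst[17] == 124 && dst[31] == 131);
    }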

_mm256_maskz_unpackhi_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_unpackhi_epi8(__mmask32 k, __m256i a,
                                       __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[71:64] 
        	dst[15:8] := src2[71:64] 
        	dst[23:16] := src1[79:72] 
        	dst[31:24] := src2[79:72] 
        	dst[39:32] := src1[87:80] 
        	dst[47:40] := src2[87:80] 
        	dst[55:48] := src1[95:88] 
        	dst[63:56] := src2[95:88] 
        	dst[71:64] := src1[103:96] 
        	dst[79:72] := src2[103:96] 
        	dst[87:80] := src1[111:104] 
        	dst[95:88] := src2[111:104] 
        	dst[103:96] := src1[119:112] 
        	dst[111:104] := src2[119:112] 
        	dst[119:112] := src1[127:120] 
        	dst[127:120] := src2[127:120] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128])
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_unpackhi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_unpackhi_epi16(__m256i src, __mmask16 k,
                                       __m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[79:64]
        	dst[31:16] := src2[79:64] 
        	dst[47:32] := src1[95:80] 
        	dst[63:48] := src2[95:80] 
        	dst[79:64] := src1[111:96] 
        	dst[95:80] := src2[111:96] 
        	dst[111:96] := src1[127:112] 
        	dst[127:112] := src2[127:112] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128])
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_unpackhi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_maskz_unpackhi_epi16(__mmask16 k, __m256i a,
                                        __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[79:64]
        	dst[31:16] := src2[79:64] 
        	dst[47:32] := src1[95:80] 
        	dst[63:48] := src2[95:80] 
        	dst[79:64] := src1[111:96] 
        	dst[95:80] := src2[111:96] 
        	dst[111:96] := src1[127:112] 
        	dst[127:112] := src2[127:112] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128])
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_unpacklo_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_mask_unpacklo_epi8(__m256i src, __mmask32 k,
                                      __m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[7:0] 
        	dst[15:8] := src2[7:0] 
        	dst[23:16] := src1[15:8] 
        	dst[31:24] := src2[15:8] 
        	dst[39:32] := src1[23:16] 
        	dst[47:40] := src2[23:16] 
        	dst[55:48] := src1[31:24] 
        	dst[63:56] := src2[31:24] 
        	dst[71:64] := src1[39:32]
        	dst[79:72] := src2[39:32] 
        	dst[87:80] := src1[47:40] 
        	dst[95:88] := src2[47:40] 
        	dst[103:96] := src1[55:48] 
        	dst[111:104] := src2[55:48] 
        	dst[119:112] := src1[63:56] 
        	dst[127:120] := src2[63:56] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128])
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_unpacklo_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_maskz_unpacklo_epi8(__mmask32 k, __m256i a,
                                       __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[7:0] 
        	dst[15:8] := src2[7:0] 
        	dst[23:16] := src1[15:8] 
        	dst[31:24] := src2[15:8] 
        	dst[39:32] := src1[23:16] 
        	dst[47:40] := src2[23:16] 
        	dst[55:48] := src1[31:24] 
        	dst[63:56] := src2[31:24] 
        	dst[71:64] := src1[39:32]
        	dst[79:72] := src2[39:32] 
        	dst[87:80] := src1[47:40] 
        	dst[95:88] := src2[47:40] 
        	dst[103:96] := src1[55:48] 
        	dst[111:104] := src2[55:48] 
        	dst[119:112] := src1[63:56] 
        	dst[127:120] := src2[63:56] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128])
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_unpacklo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mask_unpacklo_epi16(__m256i src, __mmask16 k,
                                       __m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[15:0] 
        	dst[31:16] := src2[15:0] 
        	dst[47:32] := src1[31:16] 
        	dst[63:48] := src2[31:16] 
        	dst[79:64] := src1[47:32] 
        	dst[95:80] := src2[47:32] 
        	dst[111:96] := src1[63:48] 
        	dst[127:112] := src2[63:48] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
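The masked interleave above can be checked against a scalar model. The sketch below is a plain C reference for the pseudo-code, not the intrinsic itself; the helper name ``ref_mask_unpacklo_epi16`` and the array layout (element 0 holds bits 15:0) are assumptions made for illustration.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar reference: a, b, src and dst each hold the 16
       16-bit elements of a 256-bit vector, element 0 in bits 15:0. */
    void ref_mask_unpacklo_epi16(uint16_t dst[16], const uint16_t src[16],
                                 uint16_t k, const uint16_t a[16],
                                 const uint16_t b[16])
    {
        uint16_t tmp[16];

        /* INTERLEAVE_WORDS applied independently to each 128-bit lane:
           only the low four words of each lane of a and b are used. */
        for (int lane = 0; lane < 2; lane++) {
            for (int p = 0; p < 4; p++) {
                tmp[lane * 8 + 2 * p]     = a[lane * 8 + p];
                tmp[lane * 8 + 2 * p + 1] = b[lane * 8 + p];
            }
        }

        /* Writemask: keep src where the mask bit is clear. */
        for (int j = 0; j < 16; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
    }

With ``k = 0x0F0F``, elements 0 to 3 and 8 to 11 take interleaved values and the rest are copied from ``src``.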

_mm256_maskz_unpacklo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_maskz_unpacklo_epi16(__mmask16 k, __m256i a,
                                        __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[15:0] 
        	dst[31:16] := src2[15:0] 
        	dst[47:32] := src1[31:16] 
        	dst[63:48] := src2[31:16] 
        	dst[79:64] := src1[47:32] 
        	dst[95:80] := src2[47:32] 
        	dst[111:96] := src1[63:48] 
        	dst[127:112] := src2[63:48] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_packs_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_mask_packs_epi32(__m256i src, __mmask16 k,
                                    __m256i a, __m256i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := Saturate16(a[31:0])
        tmp_dst[31:16] := Saturate16(a[63:32])
        tmp_dst[47:32] := Saturate16(a[95:64])
        tmp_dst[63:48] := Saturate16(a[127:96])
        tmp_dst[79:64] := Saturate16(b[31:0])
        tmp_dst[95:80] := Saturate16(b[63:32])
        tmp_dst[111:96] := Saturate16(b[95:64])
        tmp_dst[127:112] := Saturate16(b[127:96])
        tmp_dst[143:128] := Saturate16(a[159:128])
        tmp_dst[159:144] := Saturate16(a[191:160])
        tmp_dst[175:160] := Saturate16(a[223:192])
        tmp_dst[191:176] := Saturate16(a[255:224])
        tmp_dst[207:192] := Saturate16(b[159:128])
        tmp_dst[223:208] := Saturate16(b[191:160])
        tmp_dst[239:224] := Saturate16(b[223:192])
        tmp_dst[255:240] := Saturate16(b[255:224])
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
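The lane ordering of the pack is easy to get wrong: within each 128-bit lane, the four results from "a" come first, followed by the four from "b". The following scalar sketch mirrors the pseudo-code above under that reading; ``saturate16`` and ``ref_mask_packs_epi32`` are hypothetical helper names, and arrays are assumed to store element 0 in the lowest bits.

.. code-block:: C

    #include <stdint.h>

    /* Saturate16: clamp a signed 32-bit value to the int16_t range. */
    static int16_t saturate16(int32_t x)
    {
        if (x > INT16_MAX) return INT16_MAX;
        if (x < INT16_MIN) return INT16_MIN;
        return (int16_t)x;
    }

    /* Hypothetical scalar reference for the masked signed pack. */
    void ref_mask_packs_epi32(int16_t dst[16], const int16_t src[16],
                              uint16_t k, const int32_t a[8],
                              const int32_t b[8])
    {
        int16_t tmp[16];

        for (int lane = 0; lane < 2; lane++) {
            for (int p = 0; p < 4; p++) {
                /* Low half of the output lane comes from a, high half from b. */
                tmp[lane * 8 + p]     = saturate16(a[lane * 4 + p]);
                tmp[lane * 8 + 4 + p] = saturate16(b[lane * 4 + p]);
            }
        }

        /* Writemask: keep src where the mask bit is clear. */
        for (int j = 0; j < 16; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
    }

Because the interleaving is per lane, packing two vectors does not produce all of "a" followed by all of "b".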

_mm256_maskz_packs_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_maskz_packs_epi32(__mmask16 k, __m256i a,
                                     __m256i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := Saturate16(a[31:0])
        tmp_dst[31:16] := Saturate16(a[63:32])
        tmp_dst[47:32] := Saturate16(a[95:64])
        tmp_dst[63:48] := Saturate16(a[127:96])
        tmp_dst[79:64] := Saturate16(b[31:0])
        tmp_dst[95:80] := Saturate16(b[63:32])
        tmp_dst[111:96] := Saturate16(b[95:64])
        tmp_dst[127:112] := Saturate16(b[127:96])
        tmp_dst[143:128] := Saturate16(a[159:128])
        tmp_dst[159:144] := Saturate16(a[191:160])
        tmp_dst[175:160] := Saturate16(a[223:192])
        tmp_dst[191:176] := Saturate16(a[255:224])
        tmp_dst[207:192] := Saturate16(b[159:128])
        tmp_dst[223:208] := Saturate16(b[191:160])
        tmp_dst[239:224] := Saturate16(b[223:192])
        tmp_dst[255:240] := Saturate16(b[255:224])
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_packs_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mask_packs_epi16(__m256i src, __mmask32 k,
                                    __m256i a, __m256i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[7:0] := Saturate8(a[15:0])
        tmp_dst[15:8] := Saturate8(a[31:16])
        tmp_dst[23:16] := Saturate8(a[47:32])
        tmp_dst[31:24] := Saturate8(a[63:48])
        tmp_dst[39:32] := Saturate8(a[79:64])
        tmp_dst[47:40] := Saturate8(a[95:80])
        tmp_dst[55:48] := Saturate8(a[111:96])
        tmp_dst[63:56] := Saturate8(a[127:112])
        tmp_dst[71:64] := Saturate8(b[15:0])
        tmp_dst[79:72] := Saturate8(b[31:16])
        tmp_dst[87:80] := Saturate8(b[47:32])
        tmp_dst[95:88] := Saturate8(b[63:48])
        tmp_dst[103:96] := Saturate8(b[79:64])
        tmp_dst[111:104] := Saturate8(b[95:80])
        tmp_dst[119:112] := Saturate8(b[111:96])
        tmp_dst[127:120] := Saturate8(b[127:112])
        tmp_dst[135:128] := Saturate8(a[143:128])
        tmp_dst[143:136] := Saturate8(a[159:144])
        tmp_dst[151:144] := Saturate8(a[175:160])
        tmp_dst[159:152] := Saturate8(a[191:176])
        tmp_dst[167:160] := Saturate8(a[207:192])
        tmp_dst[175:168] := Saturate8(a[223:208])
        tmp_dst[183:176] := Saturate8(a[239:224])
        tmp_dst[191:184] := Saturate8(a[255:240])
        tmp_dst[199:192] := Saturate8(b[143:128])
        tmp_dst[207:200] := Saturate8(b[159:144])
        tmp_dst[215:208] := Saturate8(b[175:160])
        tmp_dst[223:216] := Saturate8(b[191:176])
        tmp_dst[231:224] := Saturate8(b[207:192])
        tmp_dst[239:232] := Saturate8(b[223:208])
        tmp_dst[247:240] := Saturate8(b[239:224])
        tmp_dst[255:248] := Saturate8(b[255:240])
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_packs_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_maskz_packs_epi16(__mmask32 k, __m256i a,
                                     __m256i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[7:0] := Saturate8(a[15:0])
        tmp_dst[15:8] := Saturate8(a[31:16])
        tmp_dst[23:16] := Saturate8(a[47:32])
        tmp_dst[31:24] := Saturate8(a[63:48])
        tmp_dst[39:32] := Saturate8(a[79:64])
        tmp_dst[47:40] := Saturate8(a[95:80])
        tmp_dst[55:48] := Saturate8(a[111:96])
        tmp_dst[63:56] := Saturate8(a[127:112])
        tmp_dst[71:64] := Saturate8(b[15:0])
        tmp_dst[79:72] := Saturate8(b[31:16])
        tmp_dst[87:80] := Saturate8(b[47:32])
        tmp_dst[95:88] := Saturate8(b[63:48])
        tmp_dst[103:96] := Saturate8(b[79:64])
        tmp_dst[111:104] := Saturate8(b[95:80])
        tmp_dst[119:112] := Saturate8(b[111:96])
        tmp_dst[127:120] := Saturate8(b[127:112])
        tmp_dst[135:128] := Saturate8(a[143:128])
        tmp_dst[143:136] := Saturate8(a[159:144])
        tmp_dst[151:144] := Saturate8(a[175:160])
        tmp_dst[159:152] := Saturate8(a[191:176])
        tmp_dst[167:160] := Saturate8(a[207:192])
        tmp_dst[175:168] := Saturate8(a[223:208])
        tmp_dst[183:176] := Saturate8(a[239:224])
        tmp_dst[191:184] := Saturate8(a[255:240])
        tmp_dst[199:192] := Saturate8(b[143:128])
        tmp_dst[207:200] := Saturate8(b[159:144])
        tmp_dst[215:208] := Saturate8(b[175:160])
        tmp_dst[223:216] := Saturate8(b[191:176])
        tmp_dst[231:224] := Saturate8(b[207:192])
        tmp_dst[239:232] := Saturate8(b[223:208])
        tmp_dst[247:240] := Saturate8(b[239:224])
        tmp_dst[255:248] := Saturate8(b[255:240])
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_packus_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_mask_packus_epi32(__m256i src, __mmask16 k,
                                     __m256i a, __m256i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := SaturateU16(a[31:0])
        tmp_dst[31:16] := SaturateU16(a[63:32])
        tmp_dst[47:32] := SaturateU16(a[95:64])
        tmp_dst[63:48] := SaturateU16(a[127:96])
        tmp_dst[79:64] := SaturateU16(b[31:0])
        tmp_dst[95:80] := SaturateU16(b[63:32])
        tmp_dst[111:96] := SaturateU16(b[95:64])
        tmp_dst[127:112] := SaturateU16(b[127:96])
        tmp_dst[143:128] := SaturateU16(a[159:128])
        tmp_dst[159:144] := SaturateU16(a[191:160])
        tmp_dst[175:160] := SaturateU16(a[223:192])
        tmp_dst[191:176] := SaturateU16(a[255:224])
        tmp_dst[207:192] := SaturateU16(b[159:128])
        tmp_dst[223:208] := SaturateU16(b[191:160])
        tmp_dst[239:224] := SaturateU16(b[223:192])
        tmp_dst[255:240] := SaturateU16(b[255:224])
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_packus_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_maskz_packus_epi32(__mmask16 k, __m256i a,
                                      __m256i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := SaturateU16(a[31:0])
        tmp_dst[31:16] := SaturateU16(a[63:32])
        tmp_dst[47:32] := SaturateU16(a[95:64])
        tmp_dst[63:48] := SaturateU16(a[127:96])
        tmp_dst[79:64] := SaturateU16(b[31:0])
        tmp_dst[95:80] := SaturateU16(b[63:32])
        tmp_dst[111:96] := SaturateU16(b[95:64])
        tmp_dst[127:112] := SaturateU16(b[127:96])
        tmp_dst[143:128] := SaturateU16(a[159:128])
        tmp_dst[159:144] := SaturateU16(a[191:160])
        tmp_dst[175:160] := SaturateU16(a[223:192])
        tmp_dst[191:176] := SaturateU16(a[255:224])
        tmp_dst[207:192] := SaturateU16(b[159:128])
        tmp_dst[223:208] := SaturateU16(b[191:160])
        tmp_dst[239:224] := SaturateU16(b[223:192])
        tmp_dst[255:240] := SaturateU16(b[255:224])
        FOR j := 0 to 15
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_packus_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mask_packus_epi16(__m256i src, __mmask32 k,
                                     __m256i a, __m256i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[7:0] := SaturateU8(a[15:0])
        tmp_dst[15:8] := SaturateU8(a[31:16])
        tmp_dst[23:16] := SaturateU8(a[47:32])
        tmp_dst[31:24] := SaturateU8(a[63:48])
        tmp_dst[39:32] := SaturateU8(a[79:64])
        tmp_dst[47:40] := SaturateU8(a[95:80])
        tmp_dst[55:48] := SaturateU8(a[111:96])
        tmp_dst[63:56] := SaturateU8(a[127:112])
        tmp_dst[71:64] := SaturateU8(b[15:0])
        tmp_dst[79:72] := SaturateU8(b[31:16])
        tmp_dst[87:80] := SaturateU8(b[47:32])
        tmp_dst[95:88] := SaturateU8(b[63:48])
        tmp_dst[103:96] := SaturateU8(b[79:64])
        tmp_dst[111:104] := SaturateU8(b[95:80])
        tmp_dst[119:112] := SaturateU8(b[111:96])
        tmp_dst[127:120] := SaturateU8(b[127:112])
        tmp_dst[135:128] := SaturateU8(a[143:128])
        tmp_dst[143:136] := SaturateU8(a[159:144])
        tmp_dst[151:144] := SaturateU8(a[175:160])
        tmp_dst[159:152] := SaturateU8(a[191:176])
        tmp_dst[167:160] := SaturateU8(a[207:192])
        tmp_dst[175:168] := SaturateU8(a[223:208])
        tmp_dst[183:176] := SaturateU8(a[239:224])
        tmp_dst[191:184] := SaturateU8(a[255:240])
        tmp_dst[199:192] := SaturateU8(b[143:128])
        tmp_dst[207:200] := SaturateU8(b[159:144])
        tmp_dst[215:208] := SaturateU8(b[175:160])
        tmp_dst[223:216] := SaturateU8(b[191:176])
        tmp_dst[231:224] := SaturateU8(b[207:192])
        tmp_dst[239:232] := SaturateU8(b[223:208])
        tmp_dst[247:240] := SaturateU8(b[239:224])
        tmp_dst[255:248] := SaturateU8(b[255:240])
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_packus_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask32 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_maskz_packus_epi16(__mmask32 k, __m256i a,
                                      __m256i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[7:0] := SaturateU8(a[15:0])
        tmp_dst[15:8] := SaturateU8(a[31:16])
        tmp_dst[23:16] := SaturateU8(a[47:32])
        tmp_dst[31:24] := SaturateU8(a[63:48])
        tmp_dst[39:32] := SaturateU8(a[79:64])
        tmp_dst[47:40] := SaturateU8(a[95:80])
        tmp_dst[55:48] := SaturateU8(a[111:96])
        tmp_dst[63:56] := SaturateU8(a[127:112])
        tmp_dst[71:64] := SaturateU8(b[15:0])
        tmp_dst[79:72] := SaturateU8(b[31:16])
        tmp_dst[87:80] := SaturateU8(b[47:32])
        tmp_dst[95:88] := SaturateU8(b[63:48])
        tmp_dst[103:96] := SaturateU8(b[79:64])
        tmp_dst[111:104] := SaturateU8(b[95:80])
        tmp_dst[119:112] := SaturateU8(b[111:96])
        tmp_dst[127:120] := SaturateU8(b[127:112])
        tmp_dst[135:128] := SaturateU8(a[143:128])
        tmp_dst[143:136] := SaturateU8(a[159:144])
        tmp_dst[151:144] := SaturateU8(a[175:160])
        tmp_dst[159:152] := SaturateU8(a[191:176])
        tmp_dst[167:160] := SaturateU8(a[207:192])
        tmp_dst[175:168] := SaturateU8(a[223:208])
        tmp_dst[183:176] := SaturateU8(a[239:224])
        tmp_dst[191:184] := SaturateU8(a[255:240])
        tmp_dst[199:192] := SaturateU8(b[143:128])
        tmp_dst[207:200] := SaturateU8(b[159:144])
        tmp_dst[215:208] := SaturateU8(b[175:160])
        tmp_dst[223:216] := SaturateU8(b[191:176])
        tmp_dst[231:224] := SaturateU8(b[207:192])
        tmp_dst[239:232] := SaturateU8(b[223:208])
        tmp_dst[247:240] := SaturateU8(b[239:224])
        tmp_dst[255:248] := SaturateU8(b[255:240])
        FOR j := 0 to 31
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
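Unsigned saturation here means the signed 16-bit inputs are clamped to the range 0 to 255, so negative inputs become 0 rather than wrapping. A minimal scalar sketch of the pseudo-code above follows; ``saturate_u8`` and ``ref_maskz_packus_epi16`` are hypothetical names, and element 0 is assumed to occupy the lowest bits.

.. code-block:: C

    #include <stdint.h>

    /* SaturateU8: clamp a signed 16-bit value to the range 0..255. */
    static uint8_t saturate_u8(int16_t x)
    {
        if (x > 255) return 255;
        if (x < 0) return 0;
        return (uint8_t)x;
    }

    /* Hypothetical scalar reference for the zero-masked unsigned pack. */
    void ref_maskz_packus_epi16(uint8_t dst[32], uint32_t k,
                                const int16_t a[16], const int16_t b[16])
    {
        for (int j = 0; j < 32; j++) {
            int lane = j / 16;                      /* which 128-bit lane */
            int pos = j % 16;                       /* byte within that lane */
            const int16_t *in = (pos < 8) ? a : b;  /* low 8 bytes from a, high 8 from b */
            uint8_t packed = saturate_u8(in[lane * 8 + (pos % 8)]);

            /* Zeromask: clear the byte where the mask bit is not set. */
            dst[j] = ((k >> j) & 1u) ? packed : 0;
        }
    }

Note that the mask is 32 bits wide because it selects output bytes, not input words.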

_mm256_broadcastmb_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m256i _mm256_broadcastmb_epi64(__mmask8 k);

.. admonition:: Intel Description

    Broadcast the low 8 bits from input mask "k" to all 64-bit elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ZeroExtend64(k[7:0])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_broadcastmw_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask16 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m256i _mm256_broadcastmw_epi32(__mmask16 k);

.. admonition:: Intel Description

    Broadcast the low 16 bits from input mask "k" to all 32-bit elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ZeroExtend32(k[15:0])
        ENDFOR
        dst[MAX:256] := 0
        	
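In both mask broadcasts the mask is treated as an integer value, not as a per-element selector: every element of the result receives the same zero-extended copy of "k". The scalar sketch below illustrates this for the two variants; the ``ref_`` function names are hypothetical and element 0 is assumed to hold the lowest bits.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar reference: four 64-bit elements, each ZeroExtend64(k[7:0]). */
    void ref_broadcastmb_epi64(uint64_t dst[4], uint8_t k)
    {
        for (int j = 0; j < 4; j++)
            dst[j] = (uint64_t)k;
    }

    /* Hypothetical scalar reference: eight 32-bit elements, each ZeroExtend32(k[15:0]). */
    void ref_broadcastmw_epi32(uint32_t dst[8], uint16_t k)
    {
        for (int j = 0; j < 8; j++)
            dst[j] = (uint32_t)k;
    }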

_mm256_broadcast_f32x2
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_broadcast_f32x2(__m128 a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := (j % 2)*32
        	dst[i+31:i] := a[n+31:n]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_broadcast_f32x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_broadcast_f32x2(__m256 src, __mmask8 k,
                                       __m128 a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := (j % 2)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
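The index ``n := (j % 2)*32`` in the pseudo-code means even-numbered destination elements take element 0 of "a" and odd-numbered ones take element 1; the upper two elements of "a" are ignored. A scalar sketch of that behaviour, using the hypothetical name ``ref_mask_broadcast_f32x2`` and assuming element 0 sits in the lowest bits:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar reference for the masked 2-element float broadcast. */
    void ref_mask_broadcast_f32x2(float dst[8], const float src[8],
                                  uint8_t k, const float a[4])
    {
        for (int j = 0; j < 8; j++) {
            /* a[j % 2] repeats the pair (a[0], a[1]) across the vector;
               src is kept where the mask bit is clear. */
            dst[j] = ((k >> j) & 1) ? a[j % 2] : src[j];
        }
    }

The ``f64x2`` and ``i64x2`` variants follow the same pattern with four 64-bit elements, where the pair fills a whole 128-bit lane.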

_mm256_maskz_broadcast_f32x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_broadcast_f32x2(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := (j % 2)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_broadcast_f64x2
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_broadcast_f64x2(__m128d a);

.. admonition:: Intel Description

    Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	n := (j % 2)*64
        	dst[i+63:i] := a[n+63:n]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_broadcast_f64x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_mask_broadcast_f64x2(__m256d src, __mmask8 k,
                                        __m128d a);

.. admonition:: Intel Description

    Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	n := (j % 2)*64
        	IF k[j]
        		dst[i+63:i] := a[n+63:n]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_broadcast_f64x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_maskz_broadcast_f64x2(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	n := (j % 2)*64
        	IF k[j]
        		dst[i+63:i] := a[n+63:n]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_broadcast_i32x2
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256i _mm256_broadcast_i32x2(__m128i a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := (j % 2)*32
        	dst[i+31:i] := a[n+31:n]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_broadcast_i32x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_mask_broadcast_i32x2(__m256i src, __mmask8 k,
                                        __m128i a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := (j % 2)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_broadcast_i32x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_maskz_broadcast_i32x2(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := (j % 2)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_broadcast_i64x2
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m256i _mm256_broadcast_i64x2(__m128i a);

.. admonition:: Intel Description

    Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	n := (j % 2)*64
        	dst[i+63:i] := a[n+63:n]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_broadcast_i64x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_mask_broadcast_i64x2(__m256i src, __mmask8 k,
                                        __m128i a);

.. admonition:: Intel Description

    Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	n := (j % 2)*64
        	IF k[j]
        		dst[i+63:i] := a[n+63:n]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_broadcast_i64x2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_maskz_broadcast_i64x2(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	n := (j % 2)*64
        	IF k[j]
        		dst[i+63:i] := a[n+63:n]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_extractf64x2_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m128d
:Param Types:
    __m256d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm256_extractf64x2_pd(__m256d a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: dst[127:0] := a[127:0]
        1: dst[127:0] := a[255:128]
        ESAC
        dst[MAX:128] := 0
        	

_mm256_mask_extractf64x2_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m256d a, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm256_mask_extractf64x2_pd(__m128d src, __mmask8 k,
                                        __m256d a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
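
The two steps of the pseudo-code above (select a 128-bit half, then apply the writemask) can be sketched in scalar C. The function name ``model_mask_extractf64x2_pd`` and the array-of-lanes layout are assumptions for illustration only; this is not the intrinsic itself.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: a has four FP64 lanes, dst and src have two.
     * Only bit 0 of imm8 is consulted, as in the pseudo-code. */
    static void model_mask_extractf64x2_pd(double dst[2], const double src[2],
                                           uint8_t k, const double a[4],
                                           int imm8)
    {
        const double *tmp = a + 2 * (imm8 & 1);  /* lower or upper half of a */
        for (int j = 0; j < 2; j++) {
            /* Keep the extracted lane when mask bit j is set, else copy src. */
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }
    }

For ``a = {1.0, 2.0, 3.0, 4.0}``, ``src = {-1.0, -2.0}``, ``k = 0x1`` and ``imm8 = 1``, the model returns ``{3.0, -2.0}``.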
        	

_mm256_maskz_extractf64x2_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm256_maskz_extractf64x2_pd(__mmask8 k, __m256d a,
                                         int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_extracti64x2_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_extracti64x2_epi64(__m256i a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: dst[127:0] := a[127:0]
        1: dst[127:0] := a[255:128]
        ESAC
        dst[MAX:128] := 0
        	

_mm256_mask_extracti64x2_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_mask_extracti64x2_epi64(__m128i src,
                                           __mmask8 k,
                                           __m256i a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_extracti64x2_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_maskz_extracti64x2_epi64(__mmask8 k,
                                            __m256i a,
                                            int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
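
A scalar reading of the zeromasked integer extract, given as a sketch under stated assumptions: ``model_maskz_extracti64x2_epi64`` is a hypothetical name, and vectors are modelled as arrays of 64-bit lanes. It also makes visible that only bit 0 of "imm8" affects the result.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: a has four 64-bit lanes, dst has two. */
    static void model_maskz_extracti64x2_epi64(uint64_t dst[2], uint8_t k,
                                               const uint64_t a[4], int imm8)
    {
        const uint64_t *tmp = a + 2 * (imm8 & 1);  /* imm8[0] picks the half */
        for (int j = 0; j < 2; j++) {
            /* Lanes whose mask bit is clear are zeroed. */
            dst[j] = ((k >> j) & 1) ? tmp[j] : 0;
        }
    }

In this model ``imm8 = 1`` and ``imm8 = 3`` select the same upper half, since the higher bits of the immediate are ignored.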
        	

_mm256_fpclass_pd_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_fpclass_pd_mask(__m256d a, int imm8);

.. admonition:: Intel Description

    Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
    	[fpclass_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
        ENDFOR
        k[MAX:4] := 0
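
The pseudo-code above relies on ``CheckFPClass_FP64``, which is not defined in this section. The sketch below substitutes a partial stand-in so that the per-lane loop can be followed end to end. The category bit values are an assumption based on the usual VFPCLASSPD immediate encoding and should be checked against Intel's fpclass note; the NaN and "negative" categories are deliberately left out, and all function and constant names are hypothetical.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Assumed category bits (subset only); verify before relying on them. */
    enum {
        CAT_POS_ZERO = 0x02,
        CAT_NEG_ZERO = 0x04,
        CAT_POS_INF  = 0x08,
        CAT_NEG_INF  = 0x10,
        CAT_DENORMAL = 0x20
    };

    /* Partial stand-in for CheckFPClass_FP64: returns 1 when x falls into
     * any modelled category that is selected in imm8. */
    static int check_fp_class_fp64(double x, int imm8)
    {
        int neg = signbit(x) != 0;
        int cls = fpclassify(x);
        int hit = 0;
        if (cls == FP_ZERO)      hit |= neg ? CAT_NEG_ZERO : CAT_POS_ZERO;
        if (cls == FP_INFINITE)  hit |= neg ? CAT_NEG_INF : CAT_POS_INF;
        if (cls == FP_SUBNORMAL) hit |= CAT_DENORMAL;
        return (hit & imm8) != 0;
    }

    /* Hypothetical scalar model of the four-lane loop. */
    static uint8_t model_fpclass_pd_mask(const double a[4], int imm8)
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; j++) {
            if (check_fp_class_fp64(a[j], imm8))
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

Under these assumptions, testing ``{0.0, -0.0, INFINITY, 1.5}`` with ``imm8 = 0x06`` (either zero) sets mask bits 0 and 1.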
        	

_mm256_mask_fpclass_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256d a, 
    int imm8
:Param ETypes:
    MASK k1, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_mask_fpclass_pd_mask(__mmask8 k1, __m256d a,
                                         int imm8);

.. admonition:: Intel Description

    Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
    	[fpclass_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	IF k1[j]
        		k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm256_fpclass_ps_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_fpclass_ps_mask(__m256 a, int imm8);

.. admonition:: Intel Description

    Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
    	[fpclass_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0])
        ENDFOR
        k[MAX:8] := 0
        	

_mm256_mask_fpclass_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m256 a, 
    int imm8
:Param ETypes:
    MASK k1, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm256_mask_fpclass_ps_mask(__mmask8 k1, __m256 a,
                                         int imm8);

.. admonition:: Intel Description

    Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
    	[fpclass_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	IF k1[j]
        		k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0])
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
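
In the masked form, a result bit can only be set where the corresponding bit of "k1" is set. The sketch below shows that relationship for one fixed test, "is the lane an infinity of either sign". The helper ``lane_is_inf`` stands in for ``CheckFPClass_FP32`` with both infinity categories selected; that correspondence, and the function names, are assumptions made for illustration.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>

    /* Hypothetical stand-in for the class check: infinity of either sign. */
    static int lane_is_inf(float x)
    {
        return isinf(x) != 0;
    }

    /* Hypothetical scalar model of the eight-lane masked loop. */
    static uint8_t model_mask_fpclass_ps_inf(uint8_t k1, const float a[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++) {
            /* A lane contributes only if k1[j] is set and the test passes. */
            if (((k1 >> j) & 1) && lane_is_inf(a[j]))
                k |= (uint8_t)(1u << j);
        }
        return k;
    }

The result is always a subset of "k1": lanes excluded by the mask report 0 even when they hold an infinity.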
        	

_mm256_insertf64x2
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_insertf64x2(__m256d a, __m128d b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := a[255:0]
        CASE imm8[0] OF
        0: dst[127:0] := b[127:0]
        1: dst[255:128] := b[127:0]
        ESAC
        dst[MAX:256] := 0
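
A possible usage sketch follows. It calls the intrinsic only when the compiler reports AVX512DQ and AVX512VL support through the ``__AVX512DQ__`` and ``__AVX512VL__`` macros (as GCC and Clang do when those extensions are enabled) and otherwise falls back to plain C that mirrors the pseudo-code for ``imm8[0] = 1``. The wrapper name ``insert_upper_pd`` is hypothetical.

.. code-block:: C

    #if defined(__AVX512DQ__) && defined(__AVX512VL__)
    #include <immintrin.h>
    #endif

    /* Replace the upper 128-bit half of a (lanes 2 and 3) with b. */
    static void insert_upper_pd(double dst[4], const double a[4],
                                const double b[2])
    {
    #if defined(__AVX512DQ__) && defined(__AVX512VL__)
        __m256d va = _mm256_loadu_pd(a);
        __m128d vb = _mm_loadu_pd(b);
        /* imm8 must be a compile-time constant; 1 selects bits 255:128. */
        _mm256_storeu_pd(dst, _mm256_insertf64x2(va, vb, 1));
    #else
        /* Portable fallback: copy a, then overwrite the upper two lanes. */
        for (int j = 0; j < 4; j++)
            dst[j] = a[j];
        dst[2] = b[0];
        dst[3] = b[1];
    #endif
    }

With ``a = {1, 2, 3, 4}`` and ``b = {9, 8}`` both paths produce ``{1, 2, 9, 8}``.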
        	

_mm256_mask_insertf64x2
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_mask_insertf64x2(__m256d src, __mmask8 k,
                                    __m256d a, __m128d b,
                                    int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
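
The masked insert builds a temporary first and only then blends it with "src". A scalar sketch of that order of operations is given below; ``model_mask_insertf64x2`` is a hypothetical name and the vectors are modelled as arrays of FP64 lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: a, src and dst have four lanes, b has two. */
    static void model_mask_insertf64x2(double dst[4], const double src[4],
                                       uint8_t k, const double a[4],
                                       const double b[2], int imm8)
    {
        double tmp[4];
        int base = 2 * (imm8 & 1);       /* lane offset of the replaced half */

        for (int j = 0; j < 4; j++)
            tmp[j] = a[j];               /* tmp := a */
        tmp[base]     = b[0];            /* insert b into the selected half */
        tmp[base + 1] = b[1];

        for (int j = 0; j < 4; j++) {
            /* Writemask: take tmp where k[j] is set, src elsewhere. */
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }
    }

For ``a = {1, 2, 3, 4}``, ``b = {9, 8}``, ``src = {-1, -2, -3, -4}``, ``k = 0x6`` and ``imm8 = 1``, the temporary is ``{1, 2, 9, 8}`` and the result is ``{-1, 2, 9, -4}``.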
        	

_mm256_maskz_insertf64x2
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_maskz_insertf64x2(__mmask8 k, __m256d a,
                                     __m128d b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_inserti64x2
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_inserti64x2(__m256i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := a[255:0]
        CASE imm8[0] OF
        0: dst[127:0] := b[127:0]
        1: dst[255:128] := b[127:0]
        ESAC
        dst[MAX:256] := 0
        	

_mm256_mask_inserti64x2
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_inserti64x2(__m256i src, __mmask8 k,
                                    __m256i a, __m128i b,
                                    int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_inserti64x2
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_inserti64x2(__mmask8 k, __m256i a,
                                     __m128i b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_movepi32_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __mmask8 _mm256_movepi32_mask(__m256i a);

.. admonition:: Intel Description

    Set each bit of mask register "k" based on the most significant bit of the corresponding packed 32-bit integer in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF a[i+31]
        		k[j] := 1
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
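
The loop above collects the sign bit of each 32-bit lane into one mask bit. A scalar sketch, with the hypothetical name ``model_movepi32_mask`` and lanes modelled as an array:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: bit j of the result is bit 31 of a[j]. */
    static uint8_t model_movepi32_mask(const uint32_t a[8])
    {
        uint8_t k = 0;
        for (int j = 0; j < 8; j++)
            k |= (uint8_t)((a[j] >> 31) << j);
        return k;
    }

Only the most significant bit of each lane matters: ``0x80000000`` and ``0xFFFFFFFF`` both set their mask bit, while ``0x7FFFFFFF`` does not.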
        	

_mm256_movm_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m256i _mm256_movm_epi32(__mmask8 k);

.. admonition:: Intel Description

    Set each packed 32-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := 0xFFFFFFFF
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
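
The expansion from mask bits to full lanes can be sketched in scalar C as follows. ``model_movm_epi32`` is a hypothetical name; the array stands for the eight 32-bit lanes of "dst".

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: each mask bit becomes an all-ones or
     * all-zeros 32-bit lane. */
    static void model_movm_epi32(uint32_t dst[8], uint8_t k)
    {
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? 0xFFFFFFFFu : 0u;
    }

With ``k = 0x81`` lanes 0 and 7 become ``0xFFFFFFFF`` and the remaining lanes are zero, which is the form typically needed when a mask has to be used as a vector operand for bitwise operations.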
        	

_mm256_movm_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m256i _mm256_movm_epi64(__mmask8 k);

.. admonition:: Intel Description

    Set each packed 64-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := 0xFFFFFFFFFFFFFFFF
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_movepi64_mask
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __mmask8
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __mmask8 _mm256_movepi64_mask(__m256i a);

.. admonition:: Intel Description

    Set each bit of mask register "k" based on the most significant bit of the corresponding packed 64-bit integer in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF a[i+63]
        		k[j] := 1
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
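
Read together with the pseudo-code of ``_mm256_movm_epi64``, this operation is its inverse on the low four mask bits. The scalar sketch below models both directions so the round trip can be checked; the function names are hypothetical and the lanes are modelled as arrays.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of the mask-to-vector direction (4 x 64-bit). */
    static void model_movm_epi64(uint64_t dst[4], uint8_t k)
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? 0xFFFFFFFFFFFFFFFFull : 0ull;
    }

    /* Hypothetical model of the vector-to-mask direction: bit j of the
     * result is bit 63 of lane j. */
    static uint8_t model_movepi64_mask(const uint64_t a[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; j++)
            k |= (uint8_t)((a[j] >> 63) << j);
        return k;
    }

Expanding any mask value below 16 and collapsing it again returns the original value; mask bits 4 to 7 have no corresponding lane and are dropped.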
        	

_mm256_mask_range_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_mask_range_pd(__m256d src, __mmask8 k,
                                 __m256d a, __m256d b,
                                 int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_range_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_maskz_range_pd(__mmask8 k, __m256d a,
                                  __m256d b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_range_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_range_pd(__m256d a, __m256d b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
        ENDFOR
        dst[MAX:256] := 0
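
The ``RANGE`` helper above translates almost line by line into scalar C. The sketch below follows the pseudo-code as written and, like it, does not model NaN or signed-zero special cases; the names ``range_fp64`` and ``model_range_pd`` are hypothetical, and the sign is manipulated through the IEEE 754 bit pattern.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    static uint64_t bits_of(double x)   { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
    static double   from_bits(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

    #define SIGN64 (1ull << 63)

    static double abs_fp64(double x) { return from_bits(bits_of(x) & ~SIGN64); }

    /* Hypothetical scalar RANGE for one FP64 lane. */
    static double range_fp64(double s1, double s2, int op, int sign)
    {
        double tmp;
        switch (op & 3) {                       /* imm8[1:0] */
        case 0:  tmp = (s1 <= s2) ? s1 : s2; break;                      /* min */
        case 1:  tmp = (s1 <= s2) ? s2 : s1; break;                      /* max */
        case 2:  tmp = (abs_fp64(s1) <= abs_fp64(s2)) ? s1 : s2; break;  /* abs min */
        default: tmp = (abs_fp64(s1) <= abs_fp64(s2)) ? s2 : s1; break;  /* abs max */
        }
        uint64_t t = bits_of(tmp);
        switch (sign & 3) {                     /* imm8[3:2] */
        case 0:  t = (t & ~SIGN64) | (bits_of(s1) & SIGN64); break; /* sign of a */
        case 1:  break;                                             /* keep result */
        case 2:  t &= ~SIGN64; break;                               /* clear sign */
        default: t |= SIGN64; break;                                /* set sign */
        }
        return from_bits(t);
    }

    /* Hypothetical four-lane model of the unmasked form. */
    static void model_range_pd(double dst[4], const double a[4],
                               const double b[4], int imm8)
    {
        for (int j = 0; j < 4; j++)
            dst[j] = range_fp64(a[j], b[j], imm8 & 3, (imm8 >> 2) & 3);
    }

Note the effect of sign control 00: with ``a = 3`` and ``b = -3``, ``imm8 = 0x00`` selects ``-3`` as the minimum but then takes the sign from "a", giving ``3``. Use ``imm8 = 0x04`` (min) or ``0x05`` (max) to keep the sign of the compare result.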
        	

_mm256_mask_range_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_mask_range_ps(__m256 src, __mmask8 k,
                                __m256 a, __m256 b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_range_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_maskz_range_ps(__mmask8 k, __m256 a, __m256 b,
                                 int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_range_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_range_ps(__m256 a, __m256 b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
        ENDFOR
        dst[MAX:256] := 0
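
One combination of the two control fields is worth spelling out: operation 11 (absolute max) with sign control 10 (clear sign bit), that is ``imm8 = 0x0B``, yields the larger magnitude of each pair of lanes. The scalar sketch below models only that setting; the function names are hypothetical, and NaN inputs are not modelled.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Clear the IEEE 754 sign bit of a single-precision value. */
    static float clear_sign_fp32(float x)
    {
        uint32_t u;
        memcpy(&u, &x, sizeof u);
        u &= 0x7FFFFFFFu;
        memcpy(&x, &u, sizeof x);
        return x;
    }

    /* Hypothetical eight-lane model of the range operation for imm8 = 0x0B:
     * pick the operand with the larger magnitude, then clear its sign. */
    static void model_range_ps_absmax(float dst[8], const float a[8],
                                      const float b[8])
    {
        for (int j = 0; j < 8; j++) {
            float tmp = (clear_sign_fp32(a[j]) <= clear_sign_fp32(b[j]))
                            ? b[j] : a[j];
            dst[j] = clear_sign_fp32(tmp);
        }
    }

For the pair ``(-5, 4)`` the model returns ``5``, and for ``(3, -3)`` it returns ``3``.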
        	

_mm256_mask_reduce_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_mask_reduce_pd(__m256d src, __mmask8 k,
                                  __m256d a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_reduce_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_maskz_reduce_pd(__mmask8 k, __m256d a,
                                   int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Bits [7:4] of "imm8" give the number of fraction bits to preserve, and bits [3:0] are the rounding control passed to ROUND.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_reduce_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_reduce_pd(__m256d a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". Bits [7:4] of "imm8" give the number of fraction bits to preserve, and bits [3:0] are the rounding control passed to ROUND.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:256] := 0
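
The ReduceArgumentPD helper above subtracts from each input its value rounded to ``imm8[7:4]`` fraction bits. A minimal scalar sketch for finite inputs follows. It assumes that ``imm8[1:0]`` encodes the rounding mode as 0 = nearest-even, 1 = toward negative infinity, 2 = toward positive infinity, 3 = toward zero, and it ignores ``imm8[3:2]``. The IsInf branch is omitted because it cannot trigger for the finite, moderately sized inputs this sketch accepts. All function names are hypothetical and not part of ``immintrin.h``.

```c
/* Round v to an integer value. mode: 0 nearest-even, 1 toward -inf,
   2 toward +inf, 3 toward zero (assumed encoding of imm8[1:0]).
   Assumes |v| < 2^62 so the cast to long long is well defined. */
static double round_to_int(double v, unsigned mode)
{
    double t  = (double)(long long)v;      /* truncate toward zero */
    double lo = (t > v) ? t - 1.0 : t;     /* floor */
    double hi = (t < v) ? t + 1.0 : t;     /* ceiling */

    switch (mode & 3u) {
    case 1:  return lo;
    case 2:  return hi;
    case 3:  return t;
    default:
        if (v - lo > 0.5) return hi;
        if (v - lo < 0.5) return lo;
        return ((long long)lo & 1) ? hi : lo;  /* tie: choose the even neighbour */
    }
}

/* Hypothetical scalar model of ReduceArgumentPD for finite inputs. */
static double reduce_lane_f64(double x, int imm8)
{
    unsigned m = ((unsigned)imm8 >> 4) & 0xFu;   /* fraction bits to preserve */
    double scale = (double)(1u << m);            /* 2^m */
    double rounded = round_to_int(x * scale, (unsigned)imm8 & 3u) / scale;
    return x - rounded;                          /* the reduced argument */
}

/* Writemask form over 4 lanes: lanes with a clear bit in k keep src. */
static void mask_reduce_pd_model(double dst[4], const double src[4], unsigned k,
                                 const double a[4], int imm8)
{
    for (int j = 0; j < 4; j++)
        dst[j] = ((k >> j) & 1u) ? reduce_lane_f64(a[j], imm8) : src[j];
}
```

For example, under these assumptions ``reduce_lane_f64(1.75, 0x13)`` keeps one fraction bit with truncation, so the rounded value is 1.5 and the reduced argument is 0.25.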
        	

_mm256_mask_reduce_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_mask_reduce_ps(__m256 src, __mmask8 k,
                                 __m256 a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Bits [7:4] of "imm8" give the number of fraction bits to preserve, and bits [3:0] are the rounding control passed to ROUND.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_reduce_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_maskz_reduce_ps(__mmask8 k, __m256 a,
                                  int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Bits [7:4] of "imm8" give the number of fraction bits to preserve, and bits [3:0] are the rounding control passed to ROUND.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_reduce_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_reduce_ps(__m256 a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". Bits [7:4] of "imm8" give the number of fraction bits to preserve, and bits [3:0] are the rounding control passed to ROUND.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_alignr_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_alignr_epi32(__m256i a, __m256i b,
                                const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 32 bytes (8 elements) in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[511:256] := a[255:0]
        temp[255:0] := b[255:0]
        temp[511:0] := temp[511:0] >> (32*imm8[2:0])
        dst[255:0] := temp[255:0]
        dst[MAX:256] := 0
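
The pseudo-code above places ``b`` in the low half and ``a`` in the high half of a 16-element window, then reads 8 elements starting at offset ``imm8[2:0]``. A minimal scalar sketch of that indexing is given below. The name ``alignr_epi32_model`` is hypothetical and not part of ``immintrin.h``.

```c
#include <stdint.h>

/* Hypothetical scalar model of the 256-bit, 32-bit-element align-right. */
static void alignr_epi32_model(uint32_t dst[8], const uint32_t a[8],
                               const uint32_t b[8], int imm8)
{
    uint32_t temp[16];

    /* temp[255:0] := b, temp[511:256] := a */
    for (int j = 0; j < 8; j++) {
        temp[j] = b[j];
        temp[8 + j] = a[j];
    }

    /* Only the low 3 bits of imm8 are used, so the offset is 0..7. */
    unsigned shift = (unsigned)imm8 & 7u;
    for (int j = 0; j < 8; j++)
        dst[j] = temp[j + shift];
}
```

With ``imm8 = 3``, the result holds the top five elements of ``b`` followed by the bottom three elements of ``a``.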
        	

_mm256_mask_alignr_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_alignr_epi32(__m256i src, __mmask8 k,
                                     __m256i a, __m256i b,
                                     const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 32 bytes (8 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[511:256] := a[255:0]
        temp[255:0] := b[255:0]
        temp[511:0] := temp[511:0] >> (32*imm8[2:0])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := temp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_alignr_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_alignr_epi32(__mmask8 k, __m256i a,
                                      __m256i b,
                                      const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 32 bytes (8 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[511:256] := a[255:0]
        temp[255:0] := b[255:0]
        temp[511:0] := temp[511:0] >> (32*imm8[2:0])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := temp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_alignr_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_alignr_epi64(__m256i a, __m256i b,
                                const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 32 bytes (4 elements) in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[511:256] := a[255:0]
        temp[255:0] := b[255:0]
        temp[511:0] := temp[511:0] >> (64*imm8[1:0])
        dst[255:0] := temp[255:0]
        dst[MAX:256] := 0
        	

_mm256_mask_alignr_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_alignr_epi64(__m256i src, __mmask8 k,
                                     __m256i a, __m256i b,
                                     const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 32 bytes (4 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[511:256] := a[255:0]
        temp[255:0] := b[255:0]
        temp[511:0] := temp[511:0] >> (64*imm8[1:0])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := temp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_alignr_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_alignr_epi64(__mmask8 k, __m256i a,
                                      __m256i b,
                                      const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 32 bytes (4 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[511:256] := a[255:0]
        temp[255:0] := b[255:0]
        temp[511:0] := temp[511:0] >> (64*imm8[1:0])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := temp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
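
The zeromask form above performs the same shifted read as the unmasked version and then clears every lane whose bit in ``k`` is 0. A minimal scalar sketch, with the hypothetical name ``maskz_alignr_epi64_model`` (not part of ``immintrin.h``), could look like this:

```c
#include <stdint.h>

/* Hypothetical scalar model of the zero-masked 64-bit-element align-right. */
static void maskz_alignr_epi64_model(uint64_t dst[4], unsigned k,
                                     const uint64_t a[4], const uint64_t b[4],
                                     int imm8)
{
    uint64_t temp[8];

    /* temp[255:0] := b, temp[511:256] := a */
    for (int j = 0; j < 4; j++) {
        temp[j] = b[j];
        temp[4 + j] = a[j];
    }

    /* Only the low 2 bits of imm8 are used, so the offset is 0..3. */
    unsigned shift = (unsigned)imm8 & 3u;
    for (int j = 0; j < 4; j++)
        dst[j] = ((k >> j) & 1u) ? temp[j + shift] : 0;
}
```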
        	

_mm256_mask_blend_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_blend_pd(__mmask8 k, __m256d a,
                                 __m256d b);

.. admonition:: Intel Description

    Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := b[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_blend_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_blend_ps(__mmask8 k, __m256 a, __m256 b);

.. admonition:: Intel Description

    Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
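
The blend above is a per-lane select: a set bit in ``k`` takes the lane from ``b``, a clear bit takes it from ``a``. A minimal scalar sketch follows; ``mask_blend_ps_model`` is a hypothetical name and not part of ``immintrin.h``.

```c
/* Hypothetical scalar model of the 8-lane mask blend. */
static void mask_blend_ps_model(float dst[8], unsigned k,
                                const float a[8], const float b[8])
{
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1u) ? b[j] : a[j];  /* bit j set: take b, else a */
}
```

With ``k = 0xF0``, the low four lanes come from ``a`` and the high four lanes come from ``b``.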
        	

_mm256_broadcast_f32x4
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_broadcast_f32x4(__m128 a);

.. admonition:: Intel Description

    Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := (j % 4)*32
        	dst[i+31:i] := a[n+31:n]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_broadcast_f32x4
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_broadcast_f32x4(__m256 src, __mmask8 k,
                                       __m128 a);

.. admonition:: Intel Description

    Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := (j % 4)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
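
The ``n := (j % 4)*32`` step above repeats the four source elements across both 128-bit halves of the destination before the writemask is applied. A minimal scalar sketch follows; ``mask_broadcast_f32x4_model`` is a hypothetical name and not part of ``immintrin.h``. Passing ``k = 0xFF`` reproduces the unmasked broadcast.

```c
/* Hypothetical scalar model of the write-masked 4-element broadcast. */
static void mask_broadcast_f32x4_model(float dst[8], const float src[8],
                                       unsigned k, const float a[4])
{
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1u) ? a[j % 4] : src[j];  /* lane j reads a[j mod 4] */
}
```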
        	

_mm256_maskz_broadcast_f32x4
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_broadcast_f32x4(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := (j % 4)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_broadcast_i32x4
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256i _mm256_broadcast_i32x4(__m128i a);

.. admonition:: Intel Description

    Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := (j % 4)*32
        	dst[i+31:i] := a[n+31:n]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_broadcast_i32x4
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_mask_broadcast_i32x4(__m256i src, __mmask8 k,
                                        __m128i a);

.. admonition:: Intel Description

    Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := (j % 4)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_broadcast_i32x4
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_maskz_broadcast_i32x4(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	n := (j % 4)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_broadcastsd_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_mask_broadcastsd_pd(__m256d src, __mmask8 k,
                                       __m128d a);

.. admonition:: Intel Description

    Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_broadcastsd_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_maskz_broadcastsd_pd(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
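
In the zeromask broadcast above, only element 0 of ``a`` is ever read; each destination lane receives either that value or zero. A minimal scalar sketch follows; ``maskz_broadcastsd_pd_model`` is a hypothetical name and not part of ``immintrin.h``.

```c
/* Hypothetical scalar model of the zero-masked scalar-double broadcast.
   a[] stands for the two doubles of the 128-bit source; only a[0] is used. */
static void maskz_broadcastsd_pd_model(double dst[4], unsigned k, const double a[2])
{
    for (int j = 0; j < 4; j++)
        dst[j] = ((k >> j) & 1u) ? a[0] : 0.0;
}
```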
        	

_mm256_mask_broadcastss_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_broadcastss_ps(__m256 src, __mmask8 k,
                                      __m128 a);

.. admonition:: Intel Description

    Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_broadcastss_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_broadcastss_ps(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_compress_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_mask_compress_pd(__m256d src, __mmask8 k,
                                    __m256d a);

.. admonition:: Intel Description

    Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := 0
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        dst[255:m] := src[255:m]
        dst[MAX:256] := 0
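
The compress pseudo-code above packs the selected elements of ``a`` toward lane 0 and fills the remaining upper lanes from the same positions in ``src``. A minimal scalar sketch follows; ``mask_compress_pd_model`` is a hypothetical name and not part of ``immintrin.h``.

```c
/* Hypothetical scalar model of the write-masked compress over 4 doubles. */
static void mask_compress_pd_model(double dst[4], const double src[4],
                                   unsigned k, const double a[4])
{
    int m = 0;  /* next free destination lane */

    /* Pack the active elements of a contiguously from lane 0 upward. */
    for (int j = 0; j < 4; j++)
        if ((k >> j) & 1u)
            dst[m++] = a[j];

    /* Lanes above the packed run pass through from src. */
    for (; m < 4; m++)
        dst[m] = src[m];
}
```

With ``k = 0xA``, elements 1 and 3 of ``a`` land in lanes 0 and 1, and lanes 2 and 3 are copied from ``src``.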
        	

_mm256_maskz_compress_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_maskz_compress_pd(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := 0
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        dst[255:m] := 0
        dst[MAX:256] := 0
        	

_mm256_mask_compress_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_compress_ps(__m256 src, __mmask8 k,
                                   __m256 a);

.. admonition:: Intel Description

    Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := 0
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        dst[255:m] := src[255:m]
        dst[MAX:256] := 0
        	

_mm256_maskz_compress_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_compress_ps(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := 0
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        dst[255:m] := 0
        dst[MAX:256] := 0
        	

_mm256_mask_expand_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_mask_expand_pd(__m256d src, __mmask8 k,
                                  __m256d a);

.. admonition:: Intel Description

    Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[m+63:m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
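
Expand is the inverse placement of compress: the pseudo-code above reads consecutive elements from the low end of ``a`` and scatters them to the lanes whose bit in ``k`` is set, while the other lanes keep ``src``. A minimal scalar sketch follows; ``mask_expand_pd_model`` is a hypothetical name and not part of ``immintrin.h``.

```c
/* Hypothetical scalar model of the write-masked expand over 4 doubles.
   dst must not alias a, because a is read after earlier lanes are written. */
static void mask_expand_pd_model(double dst[4], const double src[4],
                                 unsigned k, const double a[4])
{
    int m = 0;  /* next unread source element */

    for (int j = 0; j < 4; j++) {
        if ((k >> j) & 1u)
            dst[j] = a[m++];   /* active lane: consume the next element of a */
        else
            dst[j] = src[j];   /* inactive lane: pass through from src */
    }
}
```

With ``k = 0xA``, elements 0 and 1 of ``a`` land in lanes 1 and 3, and lanes 0 and 2 are copied from ``src``.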
        	

_mm256_maskz_expand_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_maskz_expand_pd(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[m+63:m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_expand_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_expand_ps(__m256 src, __mmask8 k,
                                 __m256 a);

.. admonition:: Intel Description

    Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[m+31:m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_expand_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_expand_ps(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[m+31:m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_extractf32x4_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __m256 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm256_extractf32x4_ps(__m256 a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: dst[127:0] := a[127:0]
        1: dst[127:0] := a[255:128]
        ESAC
        dst[MAX:128] := 0
        	
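The selection rule above depends only on bit 0 of "imm8". A minimal scalar sketch in plain C, assuming the 256-bit vector is stored as an array of 8 floats (``extractf32x4_model`` is a hypothetical helper, not part of any header):

.. code-block:: C

    #include <string.h>

    /* Hypothetical scalar model: copy the lower four floats (imm8 bit 0
     * clear) or the upper four floats (imm8 bit 0 set) of a[] into dst[]. */
    static void extractf32x4_model(float dst[4], const float a[8], int imm8)
    {
        const float *lane = a + 4 * (imm8 & 1);   /* only bit 0 is consulted */
        memcpy(dst, lane, 4 * sizeof(float));
    }

Note that the real intrinsic expects "imm8" to be a compile-time constant; the model takes it as an ordinary argument purely for ease of testing.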

_mm256_mask_extractf32x4_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m256 a, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm256_mask_extractf32x4_ps(__m128 src, __mmask8 k,
                                       __m256 a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
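The two stages of the pseudo-code (select a 128-bit half, then merge under the mask) can be sketched as a scalar model. This is an illustration in plain C under the assumption that vectors are plain float arrays; ``mask_extractf32x4_model`` is a hypothetical name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the write-masked 128-bit extract.
     * Stage 1 selects the half of a[] named by imm8 bit 0; stage 2 keeps
     * src[j] wherever mask bit j is clear. Mask bits 4..7 are ignored. */
    static void mask_extractf32x4_model(float dst[4], const float src[4],
                                        uint8_t k, const float a[8],
                                        int imm8)
    {
        const float *tmp = a + 4 * (imm8 & 1);
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
    }

With ``a = {0, 1, ..., 7}``, ``src = {-1, -2, -3, -4}``, ``k = 0x05`` and ``imm8 = 1``, the model yields ``{4, -2, 6, -4}``.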

_mm256_maskz_extractf32x4_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m256 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm256_maskz_extractf32x4_ps(__mmask8 k, __m256 a,
                                        int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_extracti32x4_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_extracti32x4_epi32(__m256i a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: dst[127:0] := a[127:0]
        1: dst[127:0] := a[255:128]
        ESAC
        dst[MAX:128] := 0
        	

_mm256_mask_extracti32x4_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m256i a, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_mask_extracti32x4_epi32(__m128i src,
                                           __mmask8 k,
                                           __m256i a, int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_maskz_extracti32x4_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_maskz_extracti32x4_epi32(__mmask8 k,
                                            __m256i a,
                                            int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: tmp[127:0] := a[127:0]
        1: tmp[127:0] := a[255:128]
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
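For the integer, zero-masked form, a scalar sketch in plain C looks as follows. It assumes the 256-bit input is held as eight ``uint32_t`` values; ``maskz_extracti32x4_model`` is a hypothetical helper used only to illustrate the pseudo-code.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zero-masked 128-bit integer extract.
     * The half of a[] selected by imm8 bit 0 is copied lane by lane; lanes
     * whose mask bit is clear are set to 0. */
    static void maskz_extracti32x4_model(uint32_t dst[4], uint8_t k,
                                         const uint32_t a[8], int imm8)
    {
        const uint32_t *tmp = a + 4 * (imm8 & 1);
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : 0u;
    }

With ``a = {10, 11, ..., 17}``, ``k = 0x0A`` and ``imm8 = 0``, the model yields ``{0, 11, 0, 13}``.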

_mm256_fixupimm_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256i c, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_fixupimm_pd(__m256d a, __m256d b, __m256i c,
                               int imm8);

.. admonition:: Intel Description

    Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction or flagging. For example, make a number NaN if it fulfils certain criteria.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN: j := 0
        	SNAN_TOKEN: j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:256] := 0
        	
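The first ``CASE`` in the pseudo-code classifies each element of "b" into one of eight tokens. The sketch below shows one way to express that classification in plain C. It is an illustration under stated assumptions: ``classify_fp64`` is a hypothetical helper, the DAZ handling is left out, and only +1.0 is treated as the "one" token, which is my reading of the instruction description rather than something stated in the pseudo-code.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    enum token_type {
        QNAN_TOKEN = 0, SNAN_TOKEN = 1, ZERO_VALUE_TOKEN = 2,
        ONE_VALUE_TOKEN = 3, NEG_INF_TOKEN = 4, POS_INF_TOKEN = 5,
        NEG_VALUE_TOKEN = 6, POS_VALUE_TOKEN = 7
    };

    /* Hypothetical helper: classify an IEEE-754 binary64 value into a token.
     * Denormals are not flushed here (the MXCSR.DAZ case is ignored). */
    static enum token_type classify_fp64(double x)
    {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);
        uint64_t sign = bits >> 63;
        uint64_t exp  = (bits >> 52) & 0x7FF;
        uint64_t frac = bits & 0x000FFFFFFFFFFFFFull;

        if (exp == 0x7FF && frac != 0)       /* NaN: bit 51 is the quiet bit */
            return (frac >> 51) ? QNAN_TOKEN : SNAN_TOKEN;
        if (exp == 0x7FF)
            return sign ? NEG_INF_TOKEN : POS_INF_TOKEN;
        if (exp == 0 && frac == 0)
            return ZERO_VALUE_TOKEN;         /* +0.0 and -0.0 */
        if (bits == 0x3FF0000000000000ull)
            return ONE_VALUE_TOKEN;          /* assumption: exactly +1.0 */
        return sign ? NEG_VALUE_TOKEN : POS_VALUE_TOKEN;
    }

The returned value is the index ``j`` that the pseudo-code then uses to pick a 4-bit response out of the corresponding element of "c".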

_mm256_mask_fixupimm_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __mmask8 k, 
    __m256d b, 
    __m256i c, 
    int imm8
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_mask_fixupimm_pd(__m256d a, __mmask8 k,
                                    __m256d b, __m256i c,
                                    int imm8);

.. admonition:: Intel Description

    Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction or flagging. For example, make a number NaN if it fulfils certain criteria.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN: j := 0
        	SNAN_TOKEN: j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
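The line ``token_response[3:0] := src3[3+4*j:4*j]`` means that each element of "c" acts as a lookup table of eight 4-bit responses, one per token. A minimal sketch in plain C of how such a table word could be built and queried (``pack_fixup_table`` and ``token_response`` are hypothetical helpers, not library functions):

.. code-block:: C

    #include <stdint.h>

    /* Pack eight 4-bit responses (indexed by token 0..7) into the low
     * 32 bits of a table word, matching src3[3+4*j : 4*j]. */
    static uint64_t pack_fixup_table(const uint8_t response[8])
    {
        uint64_t table = 0;
        for (int j = 0; j < 8; j++)
            table |= (uint64_t)(response[j] & 0xF) << (4 * j);
        return table;
    }

    /* Extract the 4-bit response selected by token index j. */
    static unsigned token_response(uint64_t table, unsigned j)
    {
        return (unsigned)((table >> (4 * (j & 7))) & 0xF);
    }

For instance, a table that maps QNaN inputs (token 0) to response 8 (``+0``) and zero inputs (token 2) to response 5 (``+INF``), with response 0 (keep the element of "a") everywhere else, packs to ``0x508``.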

_mm256_maskz_fixupimm_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    __m256i c, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_maskz_fixupimm_pd(__mmask8 k, __m256d a,
                                     __m256d b, __m256i c,
                                     int imm8);

.. admonition:: Intel Description

    Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction or flagging. For example, make a number NaN if it fulfils certain criteria.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN: j := 0
        	SNAN_TOKEN: j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fixupimm_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256i c, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_fixupimm_ps(__m256 a, __m256 b, __m256i c,
                              int imm8);

.. admonition:: Intel Description

    Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction or flagging. For example, make a number NaN if it fulfils certain criteria.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN: j := 0
        	SNAN_TOKEN: j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:256] := 0
        	
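The last ``CASE`` in the pseudo-code ties each bit of "imm8" to one token and one exception flag. The following plain C sketch tabulates that mapping. It is an illustration only: ``fixup_faults`` and the ``FIXUP_ZE``/``FIXUP_IE`` constants are hypothetical names, and the repeated ``ZERO_VALUE_TOKEN`` and ``ONE_VALUE_TOKEN`` labels are read here as two independent checks on the same token.

.. code-block:: C

    #include <stdint.h>

    enum { FIXUP_ZE = 1, FIXUP_IE = 2 };  /* hypothetical flag bits */

    /* Given a token index (0..7, numbered as in TOKEN_TYPE) and imm8,
     * return the set of exception flags the pseudo-code would report. */
    static unsigned fixup_faults(unsigned token, uint8_t imm8)
    {
        unsigned flags = 0;
        switch (token) {
        case 2:                                   /* ZERO_VALUE_TOKEN */
            if (imm8 & 0x01) flags |= FIXUP_ZE;
            if (imm8 & 0x02) flags |= FIXUP_IE;
            break;
        case 3:                                   /* ONE_VALUE_TOKEN */
            if (imm8 & 0x04) flags |= FIXUP_ZE;
            if (imm8 & 0x08) flags |= FIXUP_IE;
            break;
        case 1:                                   /* SNAN_TOKEN */
            if (imm8 & 0x10) flags |= FIXUP_IE;
            break;
        case 4:                                   /* NEG_INF_TOKEN */
            if (imm8 & 0x20) flags |= FIXUP_IE;
            break;
        case 6:                                   /* NEG_VALUE_TOKEN */
            if (imm8 & 0x40) flags |= FIXUP_IE;
            break;
        case 5:                                   /* POS_INF_TOKEN */
            if (imm8 & 0x80) flags |= FIXUP_IE;
            break;
        default:                                  /* QNAN, POS_VALUE: none */
            break;
        }
        return flags;
    }

With ``imm8 = 0`` no flag is ever reported, which is the usual choice when only the value substitution is wanted.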

_mm256_mask_fixupimm_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __mmask8 k, 
    __m256 b, 
    __m256i c, 
    int imm8
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_mask_fixupimm_ps(__m256 a, __mmask8 k,
                                   __m256 b, __m256i c,
                                   int imm8);

.. admonition:: Intel Description

    Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction or flagging. For example, make a number NaN if it fulfils certain criteria.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN: j := 0
        	SNAN_TOKEN: j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_fixupimm_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    __m256i c, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_maskz_fixupimm_ps(__mmask8 k, __m256 a,
                                    __m256 b, __m256i c,
                                    int imm8);

.. admonition:: Intel Description

    Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction or flagging. For example, make a number NaN if it fulfils certain criteria.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN: j := 0
        	SNAN_TOKEN: j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_getexp_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_getexp_pd(__m256d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
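The pseudo-code leaves ``ConvertExpFP64`` undefined. As a rough guide to what it computes, here is a scalar sketch in plain C that extracts the unbiased exponent directly from the bit pattern. ``getexp_fp64_model`` is a hypothetical helper; the special-case results (zero gives ``-INF``, infinity gives ``+INF``, NaN gives NaN, denormals are normalized first) follow my reading of the instruction description and should be checked against it. DAZ is not modelled, and a NaN payload is not preserved.

.. code-block:: C

    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of ConvertExpFP64: returns floor(log2(|x|))
     * as a double for finite non-zero x. */
    static double getexp_fp64_model(double x)
    {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);
        uint64_t frac = bits & 0x000FFFFFFFFFFFFFull;
        int biased = (int)((bits >> 52) & 0x7FF);

        if (biased == 0x7FF)                 /* NaN or infinity */
            return frac ? NAN : INFINITY;
        if (biased == 0) {
            if (frac == 0)
                return -INFINITY;            /* +0.0 or -0.0 */
            int e = -1022;                   /* denormal: normalize first */
            while (!(frac & (1ull << 52))) {
                frac <<= 1;
                e--;
            }
            return (double)e;
        }
        return (double)(biased - 1023);      /* normal number */
    }

For example, the model returns ``3.0`` for both ``8.0`` and ``10.0``, and ``-1.0`` for ``-0.75``, since the sign of the input is ignored.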

_mm256_mask_getexp_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_mask_getexp_pd(__m256d src, __mmask8 k,
                                  __m256d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_getexp_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m256d _mm256_maskz_getexp_pd(__mmask8 k, __m256d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_getexp_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_getexp_ps(__m256 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_getexp_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_getexp_ps(__m256 src, __mmask8 k,
                                 __m256 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_getexp_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_getexp_ps(__mmask8 k, __m256 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
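
The writemask and zeromask forms differ only in what an unselected lane receives. The sketch below illustrates that choice for 8 single-precision lanes; ``mask_select_ps`` is a hypothetical helper, and passing ``NULL`` for ``src`` to mean zero masking is a convention of this example only.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Pick res[j] where bit j of k is set. Otherwise fall back to src[j]
       (merge masking) or to 0 when src is NULL (zero masking). */
    static void mask_select_ps(float dst[8], const float res[8],
                               const float *src, uint8_t k)
    {
        assert(dst != NULL && res != NULL);
        for (int j = 0; j < 8; ++j) {
            if ((k >> j) & 1u)
                dst[j] = res[j];
            else
                dst[j] = (src != NULL) ? src[j] : 0.0f;
        }
    }

Here ``res`` stands for the already computed per-lane results, for example the exponents produced by the unmasked operation.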
        	

_mm256_getmant_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP64 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m256d _mm256_getmant_pd(__m256d a,
                              _MM_MANTISSA_NORM_ENUM interv,
                              _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_getmant_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m256d _mm256_mask_getmant_pd(
        __m256d src, __mmask8 k, __m256d a,
        _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_getmant_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m256d _mm256_maskz_getmant_pd(
        __mmask8 k, __m256d a, _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_getmant_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP32 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m256 _mm256_getmant_ps(__m256 a,
                             _MM_MANTISSA_NORM_ENUM interv,
                             _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
        ENDFOR
        dst[MAX:256] := 0
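
To make the normalization concrete, the following scalar sketch covers one configuration only: the interval [1, 2) with the sign taken from the source, which is assumed here to correspond to ``interv = _MM_MANT_NORM_1_2`` and ``sc = _MM_MANT_SIGN_src``. The name ``getmant_1_2_src`` is hypothetical, and only normal inputs are handled; it is an illustration, not the definition of ``GetNormalizedMantissa``.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Return x scaled by a power of two so that |result| lies in [1, 2),
       keeping the sign of x. Normal inputs only. */
    static float getmant_1_2_src(float x)
    {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);
        uint32_t biased = (bits >> 23) & 0xFFu;
        assert(biased != 0 && biased != 0xFFu);      /* normal inputs only */
        /* Keep sign and fraction, force the exponent field to the bias (2^0). */
        bits = (bits & 0x807FFFFFu) | (127u << 23);
        float r;
        memcpy(&r, &bits, sizeof r);
        return r;
    }

With this configuration ``10.0f`` maps to ``1.25f`` and ``-6.0f`` maps to ``-1.5f``; the other interval and sign settings would need additional handling that this sketch omits.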
        	

_mm256_mask_getmant_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m256 _mm256_mask_getmant_ps(__m256 src, __mmask8 k,
                                  __m256 a,
                                  _MM_MANTISSA_NORM_ENUM interv,
                                  _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_getmant_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m256 _mm256_maskz_getmant_ps(
        __mmask8 k, __m256 a, _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_insertf32x4
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_insertf32x4(__m256 a, __m128 b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: dst[127:0] := b[127:0]
        1: dst[255:128] := b[127:0]
        ESAC
        dst[MAX:256] := 0
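
The ``CASE`` above depends only on bit 0 of "imm8". A scalar sketch of that behaviour, using plain arrays in place of vector registers (``insertf32x4_model`` is a hypothetical name, and the buffers are assumed not to overlap):

.. code-block:: C

    #include <assert.h>
    #include <string.h>

    /* Copy a to dst, then overwrite the lower (imm8 bit 0 = 0) or upper
       (imm8 bit 0 = 1) group of four floats with b. */
    static void insertf32x4_model(float dst[8], const float a[8],
                                  const float b[4], int imm8)
    {
        assert(dst != a);                       /* memcpy needs distinct buffers */
        memcpy(dst, a, 8 * sizeof(float));      /* dst[255:0] := a[255:0] */
        memcpy(dst + 4 * (imm8 & 1), b, 4 * sizeof(float));
    }

Because only the lowest bit is examined, ``imm8 = 2`` behaves like ``imm8 = 0`` in this model.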
        	

_mm256_mask_insertf32x4
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_mask_insertf32x4(__m256 src, __mmask8 k,
                                   __m256 a, __m128 b,
                                   int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_insertf32x4
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_maskz_insertf32x4(__mmask8 k, __m256 a,
                                    __m128 b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_inserti32x4
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_inserti32x4(__m256i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: dst[127:0] := b[127:0]
        1: dst[255:128] := b[127:0]
        ESAC
        dst[MAX:256] := 0
        	

_mm256_mask_inserti32x4
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_inserti32x4(__m256i src, __mmask8 k,
                                    __m256i a, __m128i b,
                                    int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_inserti32x4
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_inserti32x4(__mmask8 k, __m256i a,
                                     __m128i b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "tmp", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8".  Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: tmp[127:0] := b[127:0]
        1: tmp[255:128] := b[127:0]
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
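
Note that the insertion moves a 128-bit group while the mask still applies per 32-bit element. The sketch below models both masked forms with plain arrays; ``mask_inserti32x4_model`` is a hypothetical name, and a ``NULL`` ``src`` stands for the zeromask variant in this example only.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Build tmp = a with four elements replaced by b, then select per
       32-bit lane: tmp[j] if bit j of k is set, else src[j] or 0. */
    static void mask_inserti32x4_model(uint32_t dst[8], const uint32_t *src,
                                       uint8_t k, const uint32_t a[8],
                                       const uint32_t b[4], int imm8)
    {
        uint32_t tmp[8];
        int base = 4 * (imm8 & 1);              /* 0 = lower half, 4 = upper half */
        assert(dst != NULL && a != NULL && b != NULL);
        for (int j = 0; j < 8; ++j)
            tmp[j] = a[j];
        for (int j = 0; j < 4; ++j)
            tmp[base + j] = b[j];
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1u) ? tmp[j] : (src != NULL ? src[j] : 0u);
    }

A mask such as ``0x33`` therefore keeps only part of the inserted group, which the unmasked intrinsic cannot express.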
        	

_mm256_mask_blend_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_blend_epi32(__mmask8 k, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Blend packed 32-bit integers from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
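
In the blend, a set mask bit selects the element from "b" and a clear bit selects the element from "a". A minimal scalar sketch, with the hypothetical name ``mask_blend_epi32_model``:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Per-lane select: bit j of k set -> b[j], clear -> a[j]. */
    static void mask_blend_epi32_model(uint32_t dst[8], uint8_t k,
                                       const uint32_t a[8],
                                       const uint32_t b[8])
    {
        assert(dst != NULL && a != NULL && b != NULL);
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1u) ? b[j] : a[j];
    }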
        	

_mm256_mask_blend_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_blend_epi64(__mmask8 k, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Blend packed 64-bit integers from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := b[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_broadcastd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_mask_broadcastd_epi32(__m256i src,
                                         __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_broadcastd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_maskz_broadcastd_epi32(__mmask8 k,
                                          __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_broadcastq_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_mask_broadcastq_epi64(__m256i src,
                                         __mmask8 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_broadcastq_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_maskz_broadcastq_epi64(__mmask8 k,
                                          __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
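
With four 64-bit lanes, the loop above consults only mask bits 0 to 3. The following scalar sketch models both masked broadcast forms; ``mask_broadcastq_epi64_model`` is a hypothetical name, and a ``NULL`` ``src`` represents the zeromask variant in this example only.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Broadcast a[0] into every selected lane; unselected lanes take
       src[j], or 0 when src is NULL. Bits 4..7 of k are never read. */
    static void mask_broadcastq_epi64_model(uint64_t dst[4],
                                            const uint64_t *src,
                                            uint8_t k, const uint64_t a[2])
    {
        assert(dst != NULL && a != NULL);
        for (int j = 0; j < 4; ++j) {
            if ((k >> j) & 1u)
                dst[j] = a[0];                  /* only the low element is used */
            else
                dst[j] = (src != NULL) ? src[j] : 0u;
        }
    }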
        	

_mm256_mask_compress_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_mask_compress_epi32(__m256i src, __mmask8 k,
                                       __m256i a);

.. admonition:: Intel Description

    Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := 0
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        dst[255:m] := src[255:m]
        dst[MAX:256] := 0
        	

_mm256_maskz_compress_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_maskz_compress_epi32(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Contiguously store the active 32-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := 0
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        dst[255:m] := 0
        dst[MAX:256] := 0
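
The compress loop can be sketched in scalar C as follows. ``compress_epi32_model`` is a hypothetical name; a non-``NULL`` ``src`` models the writemask form (the tail is taken from the same positions in ``src``) and a ``NULL`` ``src`` models the zeromask form. The return value, the number of packed elements, is an addition of this sketch and not something the intrinsics provide.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Pack the selected elements of a to the front of dst, then fill the
       remaining positions from src (same index) or with 0. */
    static int compress_epi32_model(uint32_t dst[8], const uint32_t *src,
                                    uint8_t k, const uint32_t a[8])
    {
        int m = 0;
        assert(dst != NULL && a != NULL);
        for (int j = 0; j < 8; ++j) {
            if ((k >> j) & 1u)
                dst[m++] = a[j];                /* next free slot */
        }
        int packed = m;
        for (; m < 8; ++m)
            dst[m] = (src != NULL) ? src[m] : 0u;
        return packed;
    }

With ``k = 0xA5`` the elements at positions 0, 2, 5 and 7 end up in positions 0 to 3, and positions 4 to 7 are filled from ``src`` or zeroed.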
        	

_mm256_mask_compress_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_mask_compress_epi64(__m256i src, __mmask8 k,
                                       __m256i a);

.. admonition:: Intel Description

    Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := 0
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        dst[255:m] := src[255:m]
        dst[MAX:256] := 0
        	

_mm256_maskz_compress_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_maskz_compress_epi64(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Contiguously store the active 64-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := 0
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        dst[255:m] := 0
        dst[MAX:256] := 0
        	

_mm256_mask_permutexvar_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i idx, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 idx, 
    UI32 a

.. code-block:: C

    __m256i _mm256_mask_permutexvar_epi32(__m256i src,
                                          __mmask8 k,
                                          __m256i idx,
                                          __m256i a);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	id := idx[i+2:i]*32
        	IF k[j]
        		dst[i+31:i] := a[id+31:id]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_permutexvar_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i idx, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 idx, 
    UI32 a

.. code-block:: C

    __m256i _mm256_maskz_permutexvar_epi32(__mmask8 k,
                                           __m256i idx,
                                           __m256i a);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	id := idx[i+2:i]*32
        	IF k[j]
        		dst[i+31:i] := a[id+31:id]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_permutexvar_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i idx, 
    __m256i a
:Param ETypes:
    UI32 idx, 
    UI32 a

.. code-block:: C

    __m256i _mm256_permutexvar_epi32(__m256i idx, __m256i a);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	id := idx[i+2:i]*32
        	dst[i+31:i] := a[id+31:id]
        ENDFOR
        dst[MAX:256] := 0
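
Only the low three bits of each index are used (``idx[i+2:i]``), so any index value selects one of the eight elements. A scalar sketch of the unmasked shuffle, with the hypothetical name ``permutexvar_epi32_model``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* dst[j] = a[idx[j] mod 8]; higher index bits are ignored. */
    static void permutexvar_epi32_model(uint32_t dst[8],
                                        const uint32_t idx[8],
                                        const uint32_t a[8])
    {
        assert(dst != a && dst != idx);         /* inputs are read after writes */
        for (int j = 0; j < 8; ++j)
            dst[j] = a[idx[j] & 7u];
    }

An index vector of ``{7, 6, 5, 4, 3, 2, 1, 0}`` reverses the elements, and an index of 8 selects element 0 because the upper bits are masked off.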
        	

_mm256_mask2_permutex2var_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i idx, 
    __mmask8 k, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 idx, 
    MASK k, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask2_permutex2var_epi32(__m256i a,
                                            __m256i idx,
                                            __mmask8 k,
                                            __m256i b);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	off := idx[i+2:i]*32
        	IF k[j]
        		dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := idx[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_permutex2var_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask8 k, 
    __m256i idx, 
    __m256i b
:Param ETypes:
    UI32 a, 
    MASK k, 
    UI32 idx, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_permutex2var_epi32(__m256i a,
                                           __mmask8 k,
                                           __m256i idx,
                                           __m256i b);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	off := idx[i+2:i]*32
        	IF k[j]
        		dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_permutex2var_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i idx, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 idx, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_permutex2var_epi32(__mmask8 k,
                                            __m256i a,
                                            __m256i idx,
                                            __m256i b);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	off := idx[i+2:i]*32
        	IF k[j]
        		dst[i+31:i] := (idx[i+3]) ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_permutex2var_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i idx, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 idx, 
    UI32 b

.. code-block:: C

    __m256i _mm256_permutex2var_epi32(__m256i a, __m256i idx,
                                      __m256i b);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	off := idx[i+2:i]*32
        	dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask2_permutex2var_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256i idx, 
    __mmask8 k, 
    __m256d b
:Param ETypes:
    FP64 a, 
    UI64 idx, 
    MASK k, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask2_permutex2var_pd(__m256d a, __m256i idx,
                                         __mmask8 k, __m256d b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	off := idx[i+1:i]*64
        	IF k[j]
        		dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := idx[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
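The ``mask2`` form differs from the ``mask`` form in its fall-through source: masked-off elements receive the raw bits of "idx". A hedged scalar sketch of that behaviour follows. ``ref_mask2_permutex2var_pd`` is a hypothetical name, vectors are assumed to be arrays of four elements with element 0 in bits 63:0, and ``double`` is assumed to be a 64-bit IEEE-754 type so that index bits can be reinterpreted with ``memcpy``.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of the pseudo-code; not an intrinsic. */
    void ref_mask2_permutex2var_pd(double dst[4], const double a[4],
                                   const uint64_t idx[4], uint8_t k,
                                   const double b[4])
    {
        for (int j = 0; j < 4; j++) {
            unsigned off = (unsigned)(idx[j] & 3u);           /* idx[i+1:i] */
            unsigned from_b = (unsigned)((idx[j] >> 2) & 1u); /* idx[i+2] */
            if ((k >> j) & 1u)
                dst[j] = from_b ? b[off] : a[off];
            else
                memcpy(&dst[j], &idx[j], sizeof dst[j]);      /* raw idx bits, reinterpreted */
        }
    }

A masked-off element whose index happens to hold the bit pattern ``0x4045000000000000`` therefore comes out as the double ``42.0``, which is rarely what a caller wants unless the index register is deliberately reused as the destination.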

_mm256_mask_permutex2var_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __mmask8 k, 
    __m256i idx, 
    __m256d b
:Param ETypes:
    FP64 a, 
    MASK k, 
    UI64 idx, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_permutex2var_pd(__m256d a, __mmask8 k,
                                        __m256i idx, __m256d b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	off := idx[i+1:i]*64
        	IF k[j]
        		dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_permutex2var_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256i idx, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    UI64 idx, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_permutex2var_pd(__mmask8 k, __m256d a,
                                         __m256i idx,
                                         __m256d b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	off := idx[i+1:i]*64
        	IF k[j]
        		dst[i+63:i] := (idx[i+2]) ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_permutex2var_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256i idx, 
    __m256d b
:Param ETypes:
    FP64 a, 
    UI64 idx, 
    FP64 b

.. code-block:: C

    __m256d _mm256_permutex2var_pd(__m256d a, __m256i idx,
                                   __m256d b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	off := idx[i+1:i]*64
        	dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask2_permutex2var_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256i idx, 
    __mmask8 k, 
    __m256 b
:Param ETypes:
    FP32 a, 
    UI32 idx, 
    MASK k, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask2_permutex2var_ps(__m256 a, __m256i idx,
                                        __mmask8 k, __m256 b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	off := idx[i+2:i]*32
        	IF k[j]
        		dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := idx[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_permutex2var_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __mmask8 k, 
    __m256i idx, 
    __m256 b
:Param ETypes:
    FP32 a, 
    MASK k, 
    UI32 idx, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_permutex2var_ps(__m256 a, __mmask8 k,
                                       __m256i idx, __m256 b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	off := idx[i+2:i]*32
        	IF k[j]
        		dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_permutex2var_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256i idx, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    UI32 idx, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_permutex2var_ps(__mmask8 k, __m256 a,
                                        __m256i idx, __m256 b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	off := idx[i+2:i]*32
        	IF k[j]
        		dst[i+31:i] := (idx[i+3]) ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_permutex2var_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256i idx, 
    __m256 b
:Param ETypes:
    FP32 a, 
    UI32 idx, 
    FP32 b

.. code-block:: C

    __m256 _mm256_permutex2var_ps(__m256 a, __m256i idx,
                                  __m256 b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	off := idx[i+2:i]*32
        	dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off]
        ENDFOR
        dst[MAX:256] := 0
        	
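One common use of a two-source shuffle is interleaving. The sketch below models the pseudo-code above in scalar C and builds an index vector that alternates between the low halves of "a" and "b". Both ``ref_permutex2var_ps`` and ``make_interleave_lo_idx`` are hypothetical helpers introduced for illustration, and vectors are assumed to be arrays of eight elements with element 0 in bits 31:0.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the unmasked shuffle; not an intrinsic. */
    void ref_permutex2var_ps(float dst[8], const float a[8],
                             const uint32_t idx[8], const float b[8])
    {
        for (int j = 0; j < 8; j++) {
            const float *src = ((idx[j] >> 3) & 1u) ? b : a; /* idx[i+3] picks the source */
            dst[j] = src[idx[j] & 7u];                       /* idx[i+2:i] picks the element */
        }
    }

    /* Hypothetical helper: fills idx with {0, 8, 1, 9, 2, 10, 3, 11}. */
    void make_interleave_lo_idx(uint32_t idx[8])
    {
        for (int j = 0; j < 8; j++)
            idx[j] = (uint32_t)(j / 2) | ((j & 1) ? 8u : 0u);
    }

In this model, setting bit 3 of an index is all it takes to switch the source from "a" to "b", so a single table can express any mix of the sixteen input elements.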

_mm256_mask2_permutex2var_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i idx, 
    __mmask8 k, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 idx, 
    MASK k, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask2_permutex2var_epi64(__m256i a,
                                            __m256i idx,
                                            __mmask8 k,
                                            __m256i b);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	off := idx[i+1:i]*64
        	IF k[j]
        		dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := idx[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_permutex2var_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __mmask8 k, 
    __m256i idx, 
    __m256i b
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 idx, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_permutex2var_epi64(__m256i a,
                                           __mmask8 k,
                                           __m256i idx,
                                           __m256i b);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	off := idx[i+1:i]*64
        	IF k[j]
        		dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_permutex2var_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i idx, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 idx, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_permutex2var_epi64(__mmask8 k,
                                            __m256i a,
                                            __m256i idx,
                                            __m256i b);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	off := idx[i+1:i]*64
        	IF k[j]
        		dst[i+63:i] := (idx[i+2]) ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
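The pseudo-code above reads only the low three bits of each 64-bit index and zeroes every element whose mask bit is clear. A minimal scalar sketch of both points, assuming vectors are passed as arrays of four 64-bit elements (element 0 in bits 63:0); ``ref_maskz_permutex2var_epi64`` is a hypothetical name, not an intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code; not an intrinsic. */
    void ref_maskz_permutex2var_epi64(uint64_t dst[4], uint8_t k,
                                      const uint64_t a[4], const uint64_t idx[4],
                                      const uint64_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            unsigned off = (unsigned)(idx[j] & 3u);           /* idx[i+1:i] */
            unsigned from_b = (unsigned)((idx[j] >> 2) & 1u); /* idx[i+2] */
            uint64_t picked = from_b ? b[off] : a[off];
            dst[j] = ((k >> j) & 1u) ? picked : 0;            /* zeromask */
        }
    }

An index of all ones behaves like index 7 here (element 3 of "b"), since bits 63:3 are ignored by the model just as they are in the pseudo-code.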

_mm256_permutex2var_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i idx, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 idx, 
    UI64 b

.. code-block:: C

    __m256i _mm256_permutex2var_epi64(__m256i a, __m256i idx,
                                      __m256i b);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	off := idx[i+1:i]*64
        	dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_permute_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    const int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_mask_permute_pd(__m256d src, __mmask8 k,
                                   __m256d a, const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI
        IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI
        IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI
        IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI
        IF (imm8[2] == 0) tmp_dst[191:128] := a[191:128]; FI
        IF (imm8[2] == 1) tmp_dst[191:128] := a[255:192]; FI
        IF (imm8[3] == 0) tmp_dst[255:192] := a[191:128]; FI
        IF (imm8[3] == 1) tmp_dst[255:192] := a[255:192]; FI
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
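The eight ``IF`` lines above collapse to one rule: bit ``j`` of "imm8" chooses the low or high element of the 128-bit lane that contains element ``j``. A scalar sketch of that reading follows, with ``ref_mask_permute_pd`` as a hypothetical helper and vectors assumed to be arrays of four doubles (element 0 in bits 63:0).

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code; not an intrinsic. */
    void ref_mask_permute_pd(double dst[4], const double src[4], uint8_t k,
                             const double a[4], int imm8)
    {
        for (int j = 0; j < 4; j++) {
            int lane_base = j & ~1;        /* 0 for elements 0-1, 2 for elements 2-3 */
            int sel = (imm8 >> j) & 1;     /* imm8[j]: low (0) or high (1) element of the lane */
            double shuffled = a[lane_base + sel];
            dst[j] = ((k >> j) & 1) ? shuffled : src[j];
        }
    }

For instance, ``imm8 = 0x5`` swaps the two elements inside each lane; combined with ``k = 0x5``, only elements 0 and 2 take the swapped values and the others come from "src".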

_mm256_mask_permutevar_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256i b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    UI64 b

.. code-block:: C

    __m256d _mm256_mask_permutevar_pd(__m256d src, __mmask8 k,
                                      __m256d a, __m256i b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI
        IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI
        IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI
        IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI
        IF (b[129] == 0) tmp_dst[191:128] := a[191:128]; FI
        IF (b[129] == 1) tmp_dst[191:128] := a[255:192]; FI
        IF (b[193] == 0) tmp_dst[255:192] := a[191:128]; FI
        IF (b[193] == 1) tmp_dst[255:192] := a[255:192]; FI
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
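Note that the pseudo-code above tests bit 1 of each 64-bit control element (``b[1]``, ``b[65]``, ``b[129]``, ``b[193]``), not bit 0. The scalar sketch below makes that explicit; ``ref_mask_permutevar_pd`` is a hypothetical helper, and vectors are assumed to be arrays of four elements with element 0 in bits 63:0.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code; not an intrinsic. */
    void ref_mask_permutevar_pd(double dst[4], const double src[4], uint8_t k,
                                const double a[4], const uint64_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            int lane_base = j & ~1;              /* start of the 128-bit lane */
            int sel = (int)((b[j] >> 1) & 1u);   /* bit 1 of the control element */
            dst[j] = ((k >> j) & 1) ? a[lane_base + sel] : src[j];
        }
    }

Under this model a control value of 1 still selects the low element of the lane, while 2 or 3 selects the high element.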

_mm256_maskz_permute_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_maskz_permute_pd(__mmask8 k, __m256d a,
                                    const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI
        IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI
        IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI
        IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI
        IF (imm8[2] == 0) tmp_dst[191:128] := a[191:128]; FI
        IF (imm8[2] == 1) tmp_dst[191:128] := a[255:192]; FI
        IF (imm8[3] == 0) tmp_dst[255:192] := a[191:128]; FI
        IF (imm8[3] == 1) tmp_dst[255:192] := a[255:192]; FI
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_permutevar_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256i b
:Param ETypes:
    MASK k, 
    FP64 a, 
    UI64 b

.. code-block:: C

    __m256d _mm256_maskz_permutevar_pd(__mmask8 k, __m256d a,
                                       __m256i b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI
        IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI
        IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI
        IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI
        IF (b[129] == 0) tmp_dst[191:128] := a[191:128]; FI
        IF (b[129] == 1) tmp_dst[191:128] := a[255:192]; FI
        IF (b[193] == 0) tmp_dst[255:192] := a[191:128]; FI
        IF (b[193] == 1) tmp_dst[255:192] := a[255:192]; FI
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_permute_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    const int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_mask_permute_ps(__m256 src, __mmask8 k,
                                  __m256 a, const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
        tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
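In the pseudo-code above, the same four 2-bit fields of "imm8" are applied to both 128-bit lanes. A compact scalar sketch of ``SELECT4`` plus the writemask step follows; ``ref_mask_permute_ps`` is a hypothetical helper, and vectors are assumed to be arrays of eight floats with element 0 in bits 31:0.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code; not an intrinsic. */
    void ref_mask_permute_ps(float dst[8], const float src[8], uint8_t k,
                             const float a[8], int imm8)
    {
        for (int j = 0; j < 8; j++) {
            int lane_base = j & ~3;                  /* 0 for the low lane, 4 for the high lane */
            int sel = (imm8 >> (2 * (j & 3))) & 3;   /* same 2-bit field reused in both lanes */
            dst[j] = ((k >> j) & 1) ? a[lane_base + sel] : src[j];
        }
    }

With ``imm8 = 0x1B`` the four elements inside each lane are reversed, so ``a = {0, 1, ..., 7}`` becomes ``{3, 2, 1, 0, 7, 6, 5, 4}`` when every mask bit is set.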

_mm256_mask_permutevar_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256i b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    UI32 b

.. code-block:: C

    __m256 _mm256_mask_permutevar_ps(__m256 src, __mmask8 k,
                                     __m256 a, __m256i b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], b[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], b[33:32])
        tmp_dst[95:64] := SELECT4(a[127:0], b[65:64])
        tmp_dst[127:96] := SELECT4(a[127:0], b[97:96])
        tmp_dst[159:128] := SELECT4(a[255:128], b[129:128])
        tmp_dst[191:160] := SELECT4(a[255:128], b[161:160])
        tmp_dst[223:192] := SELECT4(a[255:128], b[193:192])
        tmp_dst[255:224] := SELECT4(a[255:128], b[225:224])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
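Unlike the immediate form, the variable form takes a separate 2-bit selector from each 32-bit element of "b", so the two lanes may be shuffled differently. A sketch in scalar C, assuming arrays of eight elements (element 0 in bits 31:0); ``ref_mask_permutevar_ps`` is a hypothetical helper rather than an intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code; not an intrinsic. */
    void ref_mask_permutevar_ps(float dst[8], const float src[8], uint8_t k,
                                const float a[8], const uint32_t b[8])
    {
        for (int j = 0; j < 8; j++) {
            int lane_base = j & ~3;         /* selection never leaves the 128-bit lane */
            int sel = (int)(b[j] & 3u);     /* low two bits of the control element */
            dst[j] = ((k >> j) & 1) ? a[lane_base + sel] : src[j];
        }
    }

Because only the low two bits are read, a control value such as 5 in the upper lane selects element 1 of that lane (element 5 of the full vector), not element 5 of some wider table.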

_mm256_maskz_permute_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_maskz_permute_ps(__mmask8 k, __m256 a,
                                   const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
        tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_permutevar_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256i b
:Param ETypes:
    MASK k, 
    FP32 a, 
    UI32 b

.. code-block:: C

    __m256 _mm256_maskz_permutevar_ps(__mmask8 k, __m256 a,
                                      __m256i b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], b[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], b[33:32])
        tmp_dst[95:64] := SELECT4(a[127:0], b[65:64])
        tmp_dst[127:96] := SELECT4(a[127:0], b[97:96])
        tmp_dst[159:128] := SELECT4(a[255:128], b[129:128])
        tmp_dst[191:160] := SELECT4(a[255:128], b[161:160])
        tmp_dst[223:192] := SELECT4(a[255:128], b[193:192])
        tmp_dst[255:224] := SELECT4(a[255:128], b[225:224])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_permutex_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_mask_permutex_pd(__m256d src, __mmask8 k,
                                    __m256d a, int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
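Here ``SELECT4`` ranges over all 256 bits of "a", so each 2-bit field of "imm8" may pick any of the four elements regardless of lane. A hedged scalar sketch of the pseudo-code, with ``ref_mask_permutex_pd`` as a hypothetical helper and vectors assumed to be arrays of four doubles (element 0 in bits 63:0):

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code; not an intrinsic. */
    void ref_mask_permutex_pd(double dst[4], const double src[4], uint8_t k,
                              const double a[4], int imm8)
    {
        for (int j = 0; j < 4; j++) {
            int sel = (imm8 >> (2 * j)) & 3;   /* any of the four elements of a */
            dst[j] = ((k >> j) & 1) ? a[sel] : src[j];
        }
    }

With ``imm8 = 0x1B`` the whole four-element vector is reversed, which moves data across the 128-bit lane boundary; with ``k = 0x6`` only the two middle elements take the shuffled values.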

_mm256_mask_permutexvar_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256i idx, 
    __m256d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    UI64 idx, 
    FP64 a

.. code-block:: C

    __m256d _mm256_mask_permutexvar_pd(__m256d src, __mmask8 k,
                                       __m256i idx, __m256d a);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	id := idx[i+1:i]*64
        	IF k[j]
        		dst[i+63:i] := a[id+63:id]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
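The loop above is a masked gather within a single register: each result element is ``a[idx[j] mod 4]``, or the matching element of "src" when its mask bit is clear. A scalar sketch under the assumption that vectors are arrays of four elements (element 0 in bits 63:0); ``ref_mask_permutexvar_pd`` is a hypothetical helper, not an intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the pseudo-code; not an intrinsic. */
    void ref_mask_permutexvar_pd(double dst[4], const double src[4], uint8_t k,
                                 const uint64_t idx[4], const double a[4])
    {
        for (int j = 0; j < 4; j++) {
            unsigned id = (unsigned)(idx[j] & 3u);   /* idx[i+1:i]; higher bits ignored */
            dst[j] = ((k >> j) & 1u) ? a[id] : src[j];
        }
    }

Note the argument order: in this family the index vector precedes the data vector, the opposite of the ``permutevar`` intrinsics.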

_mm256_maskz_permutex_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_maskz_permutex_pd(__mmask8 k, __m256d a,
                                     int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_permutexvar_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256i idx, 
    __m256d a
:Param ETypes:
    MASK k, 
    UI64 idx, 
    FP64 a

.. code-block:: C

    __m256d _mm256_maskz_permutexvar_pd(__mmask8 k, __m256i idx,
                                        __m256d a);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	id := idx[i+1:i]*64
        	IF k[j]
        		dst[i+63:i] := a[id+63:id]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_permutex_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_permutex_pd(__m256d a, int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        dst[MAX:256] := 0
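
The ``imm8`` control packs four 2-bit selectors, one per destination element. As an illustration, the sketch below builds such a control value and applies it in scalar C. ``PERMUTEX_IMM`` and ``model_permutex_pd`` are hypothetical names introduced here; the macro uses the same bit layout as the ``_MM_SHUFFLE`` macro.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical macro: e0 selects the source for dst[63:0],
       e3 selects the source for dst[255:192]. */
    #define PERMUTEX_IMM(e3, e2, e1, e0) \
        ((((e3) & 3) << 6) | (((e2) & 3) << 4) | (((e1) & 3) << 2) | ((e0) & 3))

    /* Hypothetical scalar model of the SELECT4-based pseudo-code. */
    static void model_permutex_pd(double dst[4], const double a[4], int imm8)
    {
        double tmp[4];
        for (int j = 0; j < 4; j++) {
            tmp[j] = a[(imm8 >> (2 * j)) & 3];
        }
        for (int j = 0; j < 4; j++) {
            dst[j] = tmp[j];   /* copied last so dst may alias a */
        }
    }

``PERMUTEX_IMM(0, 1, 2, 3)`` evaluates to ``0x1B`` and reverses the four elements, while ``PERMUTEX_IMM(3, 2, 1, 0)`` (``0xE4``) is the identity.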
        	

_mm256_permutexvar_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256i idx, 
    __m256d a
:Param ETypes:
    UI64 idx, 
    FP64 a

.. code-block:: C

    __m256d _mm256_permutexvar_pd(__m256i idx, __m256d a);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	id := idx[i+1:i]*64
        	dst[i+63:i] := a[id+63:id]
        ENDFOR
        dst[MAX:256] := 0
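
One detail of the pseudo-code above is easy to miss: ``idx[i+1:i]`` means only the low two bits of each 64-bit index are used. A scalar sketch makes this explicit. ``model_permutexvar_pd`` is a hypothetical name, and arrays stand in for the vector registers.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model; element 0 corresponds to bits 63:0. */
    static void model_permutexvar_pd(double dst[4], const uint64_t idx[4],
                                     const double a[4])
    {
        double tmp[4];
        for (int j = 0; j < 4; j++) {
            /* Upper index bits are ignored, so 5 selects element 1. */
            tmp[j] = a[idx[j] & 3];
        }
        for (int j = 0; j < 4; j++) {
            dst[j] = tmp[j];
        }
    }

With ``a = {1, 2, 3, 4}`` and ``idx = {3, 3, 0, 5}``, this model produces ``{4, 4, 1, 2}``.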
        	

_mm256_mask_permutexvar_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256i idx, 
    __m256 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    UI32 idx, 
    FP32 a

.. code-block:: C

    __m256 _mm256_mask_permutexvar_ps(__m256 src, __mmask8 k,
                                      __m256i idx, __m256 a);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	id := idx[i+2:i]*32
        	IF k[j]
        		dst[i+31:i] := a[id+31:id]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
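
For the 32-bit variant, the index uses three bits (``idx[i+2:i]``) and the writemask falls back to ``src``. The following scalar sketch follows the pseudo-code above under those assumptions. ``model_mask_permutexvar_ps`` is a hypothetical helper, not part of any header.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model; arrays of eight floats stand in for YMM registers. */
    static void model_mask_permutexvar_ps(float dst[8], const float src[8],
                                          uint8_t k, const uint32_t idx[8],
                                          const float a[8])
    {
        float tmp[8];
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                tmp[j] = a[idx[j] & 7];   /* low three index bits only */
            } else {
                tmp[j] = src[j];          /* writemask: keep src element */
            }
        }
        for (int j = 0; j < 8; j++) {
            dst[j] = tmp[j];
        }
    }

With ``k = 0x81`` only elements 0 and 7 are permuted; the six elements in between are taken from ``src`` unchanged.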
        	

_mm256_maskz_permutexvar_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256i idx, 
    __m256 a
:Param ETypes:
    MASK k, 
    UI32 idx, 
    FP32 a

.. code-block:: C

    __m256 _mm256_maskz_permutexvar_ps(__mmask8 k, __m256i idx,
                                       __m256 a);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	id := idx[i+2:i]*32
        	IF k[j]
        		dst[i+31:i] := a[id+31:id]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_permutexvar_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256i idx, 
    __m256 a
:Param ETypes:
    UI32 idx, 
    FP32 a

.. code-block:: C

    __m256 _mm256_permutexvar_ps(__m256i idx, __m256 a);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	id := idx[i+2:i]*32
        	dst[i+31:i] := a[id+31:id]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_permutex_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    const int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_permutex_epi64(__m256i src, __mmask8 k,
                                       __m256i a,
                                       const int imm8);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" across lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_permutexvar_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i idx, 
    __m256i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 idx, 
    UI64 a

.. code-block:: C

    __m256i _mm256_mask_permutexvar_epi64(__m256i src,
                                          __mmask8 k,
                                          __m256i idx,
                                          __m256i a);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	id := idx[i+1:i]*64
        	IF k[j]
        		dst[i+63:i] := a[id+63:id]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_permutex_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_permutex_epi64(__mmask8 k, __m256i a,
                                        const int imm8);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" across lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_permutexvar_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i idx, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 idx, 
    UI64 a

.. code-block:: C

    __m256i _mm256_maskz_permutexvar_epi64(__mmask8 k,
                                           __m256i idx,
                                           __m256i a);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	id := idx[i+1:i]*64
        	IF k[j]
        		dst[i+63:i] := a[id+63:id]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_permutex_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_permutex_epi64(__m256i a, const int imm8);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" across lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        dst[MAX:256] := 0
        	

_mm256_permutexvar_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i idx, 
    __m256i a
:Param ETypes:
    UI64 idx, 
    UI64 a

.. code-block:: C

    __m256i _mm256_permutexvar_epi64(__m256i idx, __m256i a);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	id := idx[i+1:i]*64
        	dst[i+63:i] := a[id+63:id]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_expand_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_mask_expand_epi32(__m256i src, __mmask8 k,
                                     __m256i a);

.. admonition:: Intel Description

    Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[m+31:m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
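
The expand operation reads consecutive low elements of ``a`` and scatters them to the positions whose mask bit is set. A scalar sketch of the pseudo-code above may help: ``model_mask_expand_epi32`` is a hypothetical name, and plain arrays stand in for the registers.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the expand pseudo-code. */
    static void model_mask_expand_epi32(uint32_t dst[8], const uint32_t src[8],
                                        uint8_t k, const uint32_t a[8])
    {
        uint32_t tmp[8];
        int m = 0;                       /* next unread element of a */
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                tmp[j] = a[m];           /* consume elements of a in order */
                m++;
            } else {
                tmp[j] = src[j];         /* writemask: keep src element */
            }
        }
        for (int j = 0; j < 8; j++) {
            dst[j] = tmp[j];
        }
    }

With ``k = 0xA4`` (bits 2, 5 and 7 set), the first three elements of ``a`` land in positions 2, 5 and 7, and every other position keeps its ``src`` value.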
        	

_mm256_maskz_expand_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m256i _mm256_maskz_expand_epi32(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[m+31:m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_expand_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_mask_expand_epi64(__m256i src, __mmask8 k,
                                     __m256i a);

.. admonition:: Intel Description

    Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[m+63:m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_expand_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m256i _mm256_maskz_expand_epi64(__mmask8 k, __m256i a);

.. admonition:: Intel Description

    Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[m+63:m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
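
In the pseudo-code above, the counter ``m`` advances once per set mask bit, so the number of elements read from ``a`` equals the number of set bits among the low four bits of ``k``. The scalar sketch below returns that count for illustration. The return value and the name ``model_maskz_expand_epi64`` are assumptions of this sketch; the intrinsic returns only the vector.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model; returns how many elements of a were consumed. */
    static int model_maskz_expand_epi64(uint64_t dst[4], uint8_t k,
                                        const uint64_t a[4])
    {
        uint64_t tmp[4];
        int m = 0;                       /* next unread element of a */
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1) {
                tmp[j] = a[m];
                m++;
            } else {
                tmp[j] = 0;              /* zeromask: clear the element */
            }
        }
        for (int j = 0; j < 4; j++) {
            dst[j] = tmp[j];
        }
        return m;
    }

With ``a = {7, 8, 9, 10}`` and ``k = 0xA``, the model writes ``{0, 7, 0, 8}`` and reports two consumed elements.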
        	

_mm256_mask_shuffle_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    _MM_PERM_ENUM imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_shuffle_epi32(__m256i src, __mmask8 k,
                                      __m256i a,
                                      _MM_PERM_ENUM imm8);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
        tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
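
Unlike the cross-lane permutes, this shuffle applies the same four selectors independently inside each 128-bit lane. A scalar sketch of the pseudo-code above shows the lane handling. ``model_mask_shuffle_epi32`` is a hypothetical name, and ``imm8`` is passed as a plain ``int`` rather than ``_MM_PERM_ENUM``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model; elements 0..3 form the low lane, 4..7 the high lane. */
    static void model_mask_shuffle_epi32(uint32_t dst[8], const uint32_t src[8],
                                         uint8_t k, const uint32_t a[8],
                                         int imm8)
    {
        uint32_t tmp[8];
        for (int j = 0; j < 8; j++) {
            int lane_base = j & ~3;                  /* 0 or 4 */
            int sel = (imm8 >> (2 * (j & 3))) & 3;   /* selector for this slot */
            tmp[j] = a[lane_base + sel];             /* never crosses the lane */
        }
        for (int j = 0; j < 8; j++) {
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }
    }

With ``a = {0, 1, 2, 3, 4, 5, 6, 7}``, ``imm8 = 0x1B`` and ``k = 0xFF``, each lane is reversed separately, giving ``{3, 2, 1, 0, 7, 6, 5, 4}``.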
        	

_mm256_maskz_shuffle_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    _MM_PERM_ENUM imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_shuffle_epi32(__mmask8 k, __m256i a,
                                       _MM_PERM_ENUM imm8);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
        tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_unpackhi_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_unpackhi_epi32(__m256i src, __mmask8 k,
                                       __m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
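
The ``INTERLEAVE_HIGH_DWORDS`` step above can be written as a short loop over the two 128-bit lanes. The following scalar sketch is illustrative only. ``model_mask_unpackhi_epi32`` is a hypothetical helper operating on arrays of eight 32-bit values.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the interleave-high pseudo-code. */
    static void model_mask_unpackhi_epi32(uint32_t dst[8], const uint32_t src[8],
                                          uint8_t k, const uint32_t a[8],
                                          const uint32_t b[8])
    {
        uint32_t tmp[8];
        for (int lane = 0; lane < 2; lane++) {
            int base = 4 * lane;
            /* Take the upper two elements of each lane, alternating a and b. */
            tmp[base + 0] = a[base + 2];
            tmp[base + 1] = b[base + 2];
            tmp[base + 2] = a[base + 3];
            tmp[base + 3] = b[base + 3];
        }
        for (int j = 0; j < 8; j++) {
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }
    }

With ``a = {0, ..., 7}``, ``b = {10, ..., 17}`` and all mask bits set, the model yields ``{2, 12, 3, 13, 6, 16, 7, 17}``.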
        	

_mm256_maskz_unpackhi_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_unpackhi_epi32(__mmask8 k, __m256i a,
                                        __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_unpackhi_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_unpackhi_epi64(__m256i src, __mmask8 k,
                                       __m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_unpackhi_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_unpackhi_epi64(__mmask8 k, __m256i a,
                                        __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_unpacklo_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mask_unpacklo_epi32(__m256i src, __mmask8 k,
                                       __m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_unpacklo_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_maskz_unpacklo_epi32(__mmask8 k, __m256i a,
                                        __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_unpacklo_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_mask_unpacklo_epi64(__m256i src, __mmask8 k,
                                       __m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
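
For 64-bit elements each 128-bit lane holds only two values, so the low-half interleave reduces to taking the even-indexed elements of ``a`` and ``b``. The sketch below follows the pseudo-code above in scalar C, with ``model_mask_unpacklo_epi64`` as a hypothetical helper name.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of INTERLEAVE_QWORDS plus the writemask step. */
    static void model_mask_unpacklo_epi64(uint64_t dst[4], const uint64_t src[4],
                                          uint8_t k, const uint64_t a[4],
                                          const uint64_t b[4])
    {
        uint64_t tmp[4];
        for (int lane = 0; lane < 2; lane++) {
            int base = 2 * lane;
            tmp[base + 0] = a[base];     /* low qword of the lane from a */
            tmp[base + 1] = b[base];     /* low qword of the lane from b */
        }
        for (int j = 0; j < 4; j++) {
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }
    }

With ``a = {1, 2, 3, 4}``, ``b = {5, 6, 7, 8}``, a zero ``src`` and ``k = 0x6``, the model gives ``{0, 5, 3, 0}``.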
        	

_mm256_maskz_unpacklo_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_maskz_unpacklo_epi64(__mmask8 k, __m256i a,
                                        __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_roundscale_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_mask_roundscale_pd(__m256d src, __mmask8 k,
                                      __m256d a, int imm8);

.. admonition:: Intel Description

    Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_roundscale_pd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_maskz_roundscale_pd(__mmask8 k, __m256d a,
                                       int imm8);

.. admonition:: Intel Description

    Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The fraction-bit count is taken from "imm8[7:4]" and the rounding mode from "imm8[3:0]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
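
The masked variants above differ only in what an unselected element receives. A minimal scalar sketch of the two merge rules follows, assuming a 256-bit vector is modelled as four ``double`` values and that ``result`` already holds the per-element outputs of the unmasked operation. Both function names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Writemask ("mask" form): unselected elements are copied from src. */
    void merge_writemask_pd(double dst[4], const double src[4], uint8_t k,
                            const double result[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? result[j] : src[j];
    }

    /* Zeromask ("maskz" form): unselected elements become 0. */
    void merge_zeromask_pd(double dst[4], uint8_t k, const double result[4])
    {
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? result[j] : 0.0;
    }

Only the low four bits of ``k`` are consulted, one per 64-bit element.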

_mm256_roundscale_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_roundscale_pd(__m256d a, int imm8);

.. admonition:: Intel Description

    Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". The fraction-bit count is taken from "imm8[7:4]" and the rounding mode from "imm8[3:0]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:256] := 0
        	
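
``RoundScaleFP64`` above packs two controls into ``imm8``: the fraction-bit count M in bits [7:4] and the rounding control in bits [3:0]. The sketch below builds such an immediate and models the round-toward-zero case only. The helper names are hypothetical, the rounding-control value 3 for truncation follows the usual ``_MM_FROUND_TO_ZERO`` encoding, and the model assumes a finite input whose scaled value fits in a ``long long``.

.. code-block:: C

    /* Build imm8 from a fraction-bit count (0..15) and a rounding control. */
    int roundscale_imm8(unsigned fraction_bits, unsigned rounding_control)
    {
        return (int)(((fraction_bits & 0xFu) << 4) | (rounding_control & 0xFu));
    }

    /* Scalar model of RoundScaleFP64 for truncation only (imm8[1:0] == 3).
       Assumes x is finite and |x * 2^M| < 2^63. */
    double roundscale_trunc_model(double x, int imm8)
    {
        unsigned m = ((unsigned)imm8 >> 4) & 0xFu;    /* imm8[7:4] */
        double scale = (double)(1u << m);             /* 2^M */
        double scaled = x * scale;
        double truncated = (double)(long long)scaled; /* casts round toward zero */
        return truncated / scale;                     /* 2^-M * ROUND(2^M * x) */
    }

For example, ``roundscale_trunc_model(1.7, roundscale_imm8(1, 3))`` keeps one fraction bit and yields 1.5.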

_mm256_mask_roundscale_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_mask_roundscale_ps(__m256 src, __mmask8 k,
                                     __m256 a, int imm8);

.. admonition:: Intel Description

    Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The fraction-bit count is taken from "imm8[7:4]" and the rounding mode from "imm8[3:0]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_roundscale_ps
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_maskz_roundscale_ps(__mmask8 k, __m256 a,
                                      int imm8);

.. admonition:: Intel Description

    Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The fraction-bit count is taken from "imm8[7:4]" and the rounding mode from "imm8[3:0]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_roundscale_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_roundscale_ps(__m256 a, int imm8);

.. admonition:: Intel Description

    Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". The fraction-bit count is taken from "imm8[7:4]" and the rounding mode from "imm8[3:0]".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_scalef_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_scalef_pd(__m256d src, __mmask8 k,
                                  __m256d a, __m256d b);

.. admonition:: Intel Description

    Scale the packed double-precision (64-bit) floating-point elements in "a" by 2 raised to the floor of the corresponding element in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_scalef_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_scalef_pd(__mmask8 k, __m256d a,
                                   __m256d b);

.. admonition:: Intel Description

    Scale the packed double-precision (64-bit) floating-point elements in "a" by 2 raised to the floor of the corresponding element in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_scalef_pd
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_scalef_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Scale the packed double-precision (64-bit) floating-point elements in "a" by 2 raised to the floor of the corresponding element in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	
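
For ordinary finite inputs, ``SCALE`` above reduces to ``a * 2^floor(b)``. The following is a minimal scalar sketch of that core step only. It assumes both inputs are finite, that ``b`` is small in magnitude, and that the result stays in the normal range. It does not model the NaN, infinity, or denormal handling in the pseudocode, and ``scalef_model`` is a hypothetical name.

.. code-block:: C

    /* Hypothetical scalar model of a * 2^floor(b) for small, finite b. */
    double scalef_model(double a, double b)
    {
        long long e = (long long)b;   /* cast truncates toward zero */
        if ((double)e > b)
            e -= 1;                   /* adjust truncation down to floor */

        double r = a;
        for (; e > 0; e--)
            r *= 2.0;                 /* exact while r stays in range */
        for (; e < 0; e++)
            r *= 0.5;
        return r;
    }

For example, ``scalef_model(3.0, 2.5)`` scales by 2^2 and returns 12.0, while ``scalef_model(3.0, -1.5)`` scales by 2^-2 and returns 0.75.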

_mm256_mask_scalef_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_scalef_ps(__m256 src, __mmask8 k,
                                 __m256 a, __m256 b);

.. admonition:: Intel Description

    Scale the packed single-precision (32-bit) floating-point elements in "a" by 2 raised to the floor of the corresponding element in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_scalef_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_scalef_ps(__mmask8 k, __m256 a,
                                  __m256 b);

.. admonition:: Intel Description

    Scale the packed single-precision (32-bit) floating-point elements in "a" by 2 raised to the floor of the corresponding element in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_scalef_ps
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_scalef_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Scale the packed single-precision (32-bit) floating-point elements in "a" by 2 raised to the floor of the corresponding element in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shuffle_f32x4
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    const int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_mask_shuffle_f32x4(__m256 src, __mmask8 k,
                                     __m256 a, __m256 b,
                                     const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst.m128[0] := a.m128[imm8[0]]
        tmp_dst.m128[1] := b.m128[imm8[1]]
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_shuffle_f32x4
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_maskz_shuffle_f32x4(__mmask8 k, __m256 a,
                                      __m256 b, const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst.m128[0] := a.m128[imm8[0]]
        tmp_dst.m128[1] := b.m128[imm8[1]]
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shuffle_f32x4
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_shuffle_f32x4(__m256 a, __m256 b,
                                const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.m128[0] := a.m128[imm8[0]]
        dst.m128[1] := b.m128[imm8[1]]
        dst[MAX:256] := 0
        	
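
Only bits 0 and 1 of ``imm8`` matter in the 256-bit form: bit 0 picks which 128-bit half of ``a`` goes to the low half of ``dst``, and bit 1 picks which half of ``b`` goes to the high half. A scalar sketch of the pseudocode follows, assuming each vector is modelled as eight ``float`` values. The function name is hypothetical.

.. code-block:: C

    #include <string.h>

    /* Hypothetical scalar model: elements 0..3 form m128[0], 4..7 form m128[1]. */
    void shuffle_f32x4_model(float dst[8], const float a[8], const float b[8],
                             int imm8)
    {
        float tmp[8];

        /* dst.m128[0] := a.m128[imm8[0]] */
        memcpy(&tmp[0], &a[4 * (imm8 & 1)], 4 * sizeof(float));
        /* dst.m128[1] := b.m128[imm8[1]] */
        memcpy(&tmp[4], &b[4 * ((imm8 >> 1) & 1)], 4 * sizeof(float));

        memcpy(dst, tmp, sizeof tmp);
    }

With ``imm8 = 1``, ``dst`` holds the upper half of ``a`` followed by the lower half of ``b``.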

_mm256_mask_shuffle_f64x2
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    const int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_mask_shuffle_f64x2(__m256d src, __mmask8 k,
                                      __m256d a, __m256d b,
                                      const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst.m128[0] := a.m128[imm8[0]]
        tmp_dst.m128[1] := b.m128[imm8[1]]
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_shuffle_f64x2
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_maskz_shuffle_f64x2(__mmask8 k, __m256d a,
                                       __m256d b,
                                       const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst.m128[0] := a.m128[imm8[0]]
        tmp_dst.m128[1] := b.m128[imm8[1]]
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shuffle_f64x2
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_shuffle_f64x2(__m256d a, __m256d b,
                                 const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.m128[0] := a.m128[imm8[0]]
        dst.m128[1] := b.m128[imm8[1]]
        dst[MAX:256] := 0
        	

_mm256_mask_shuffle_i32x4
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_shuffle_i32x4(__m256i src, __mmask8 k,
                                      __m256i a, __m256i b,
                                      const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst.m128[0] := a.m128[imm8[0]]
        tmp_dst.m128[1] := b.m128[imm8[1]]
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_shuffle_i32x4
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_shuffle_i32x4(__mmask8 k, __m256i a,
                                       __m256i b,
                                       const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst.m128[0] := a.m128[imm8[0]]
        tmp_dst.m128[1] := b.m128[imm8[1]]
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shuffle_i32x4
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_shuffle_i32x4(__m256i a, __m256i b,
                                 const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.m128[0] := a.m128[imm8[0]]
        dst.m128[1] := b.m128[imm8[1]]
        dst[MAX:256] := 0
        	

_mm256_mask_shuffle_i64x2
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mask_shuffle_i64x2(__m256i src, __mmask8 k,
                                      __m256i a, __m256i b,
                                      const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst.m128[0] := a.m128[imm8[0]]
        tmp_dst.m128[1] := b.m128[imm8[1]]
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
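
In the masked 64-bit form above, block selection works on 128-bit halves but the writemask applies per 64-bit element, so a selected block can be partly written and partly kept from ``src``. A scalar sketch follows, assuming each vector is modelled as four ``uint64_t`` values. The function name is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: elements 0..1 form m128[0], 2..3 form m128[1]. */
    void mask_shuffle_i64x2_model(uint64_t dst[4], const uint64_t src[4],
                                  uint8_t k, const uint64_t a[4],
                                  const uint64_t b[4], int imm8)
    {
        uint64_t tmp[4];
        int a_half = imm8 & 1;         /* imm8[0] selects a 128-bit half of a */
        int b_half = (imm8 >> 1) & 1;  /* imm8[1] selects a 128-bit half of b */

        tmp[0] = a[2 * a_half];
        tmp[1] = a[2 * a_half + 1];
        tmp[2] = b[2 * b_half];
        tmp[3] = b[2 * b_half + 1];

        /* Writemask is applied per 64-bit element, not per 128-bit block. */
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
    }

With ``imm8 = 3`` and ``k = 0x9``, elements 0 and 3 come from the upper halves of ``a`` and ``b``, while elements 1 and 2 are copied from ``src``.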

_mm256_maskz_shuffle_i64x2
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __mmask8 k, 
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_maskz_shuffle_i64x2(__mmask8 k, __m256i a,
                                       __m256i b,
                                       const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst.m128[0] := a.m128[imm8[0]]
        tmp_dst.m128[1] := b.m128[imm8[1]]
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shuffle_i64x2
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_shuffle_i64x2(__m256i a, __m256i b,
                                 const int imm8);

.. admonition:: Intel Description

    Shuffle 128-bits (composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst.m128[0] := a.m128[imm8[0]]
        dst.m128[1] := b.m128[imm8[1]]
        dst[MAX:256] := 0
        	

_mm256_mask_shuffle_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    const int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_mask_shuffle_pd(__m256d src, __mmask8 k,
                                   __m256d a, __m256d b,
                                   const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
        tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
        tmp_dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192]
        tmp_dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192]
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
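
Each of the low four bits of ``imm8`` chooses between the two 64-bit elements of one 128-bit lane: even result elements draw from ``a`` and odd result elements from ``b``. A scalar sketch of the pseudocode above follows, assuming each vector is modelled as four ``double`` values. The function name is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the masked in-lane double shuffle. */
    void mask_shuffle_pd_model(double dst[4], const double src[4], uint8_t k,
                               const double a[4], const double b[4], int imm8)
    {
        double tmp[4];

        tmp[0] = a[0 + ((imm8 >> 0) & 1)];  /* lane 0, from a */
        tmp[1] = b[0 + ((imm8 >> 1) & 1)];  /* lane 0, from b */
        tmp[2] = a[2 + ((imm8 >> 2) & 1)];  /* lane 1, from a */
        tmp[3] = b[2 + ((imm8 >> 3) & 1)];  /* lane 1, from b */

        /* Writemask: unselected elements are copied from src. */
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
    }

With ``imm8 = 0x5`` and ``k = 0xB``, the result is ``{a[1], b[0], src[2], b[2]}``.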

_mm256_maskz_shuffle_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_maskz_shuffle_pd(__mmask8 k, __m256d a,
                                    __m256d b, const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
        tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
        tmp_dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192]
        tmp_dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192]
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_shuffle_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    const int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_mask_shuffle_ps(__m256 src, __mmask8 k,
                                  __m256 a, __m256 b,
                                  const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6])
        tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        tmp_dst[223:192] := SELECT4(b[255:128], imm8[5:4])
        tmp_dst[255:224] := SELECT4(b[255:128], imm8[7:6])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
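As a rough illustration of how the four 2-bit fields of "imm8" select elements, the following scalar sketch mirrors the pseudo-code above. It is not the intrinsic: ``model_mask_shuffle_ps`` is a hypothetical name, and index 0 is assumed to map to bits ``[31:0]``.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of the pseudo-code; hypothetical helper, not a real API. */
    void model_mask_shuffle_ps(float dst[8], const float src[8], uint8_t k,
                               const float a[8], const float b[8], int imm8)
    {
        float tmp[8];
        /* The same control is applied to each 128-bit lane (4 floats). */
        for (int lane = 0; lane < 8; lane += 4) {
            tmp[lane + 0] = a[lane + ((imm8 >> 0) & 3)];
            tmp[lane + 1] = a[lane + ((imm8 >> 2) & 3)];
            tmp[lane + 2] = b[lane + ((imm8 >> 4) & 3)];
            tmp[lane + 3] = b[lane + ((imm8 >> 6) & 3)];
        }
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];   /* writemask */
    }

The two low result elements of each lane always come from "a" and the two high ones from "b"; only the position inside the lane is selectable.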

_mm256_maskz_shuffle_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_maskz_shuffle_ps(__mmask8 k, __m256 a,
                                   __m256 b, const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6])
        tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        tmp_dst[223:192] := SELECT4(b[255:128], imm8[5:4])
        tmp_dst[255:224] := SELECT4(b[255:128], imm8[7:6])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_unpackhi_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_unpackhi_pd(__m256d src, __mmask8 k,
                                    __m256d a, __m256d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
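A minimal scalar sketch of the interleave-then-blend steps above, assuming index 0 holds bits ``[63:0]``. ``model_mask_unpackhi_pd`` is a hypothetical name used only for illustration; it does not call the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of the pseudo-code; hypothetical helper, not a real API. */
    void model_mask_unpackhi_pd(double dst[4], const double src[4],
                                uint8_t k, const double a[4],
                                const double b[4])
    {
        /* High qword of each 128-bit lane: a first, then b. */
        double tmp[4] = { a[1], b[1], a[3], b[3] };
        for (int j = 0; j < 4; ++j)
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];   /* writemask */
    }

For ``a = {1, 2, 3, 4}``, ``b = {10, 20, 30, 40}``, ``src = {-1, -2, -3, -4}`` and ``k = 0x5``, the model gives ``{2, -2, 4, -4}``.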

_mm256_maskz_unpackhi_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_unpackhi_pd(__mmask8 k, __m256d a,
                                     __m256d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_unpackhi_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_unpackhi_ps(__m256 src, __mmask8 k,
                                   __m256 a, __m256 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_unpackhi_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_unpackhi_ps(__mmask8 k, __m256 a,
                                    __m256 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
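The ``INTERLEAVE_HIGH_DWORDS`` step above can be sketched in scalar C as follows. This is a model under the assumption that index 0 maps to bits ``[31:0]``; ``model_maskz_unpackhi_ps`` is a hypothetical name and the code does not use the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of the pseudo-code; hypothetical helper, not a real API. */
    void model_maskz_unpackhi_ps(float dst[8], uint8_t k,
                                 const float a[8], const float b[8])
    {
        float tmp[8];
        /* Per 128-bit lane: interleave elements 2 and 3 of a and b. */
        for (int lane = 0; lane < 8; lane += 4) {
            tmp[lane + 0] = a[lane + 2];
            tmp[lane + 1] = b[lane + 2];
            tmp[lane + 2] = a[lane + 3];
            tmp[lane + 3] = b[lane + 3];
        }
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1) ? tmp[j] : 0.0f;   /* zeromask */
    }
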

_mm256_mask_unpacklo_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mask_unpacklo_pd(__m256d src, __mmask8 k,
                                    __m256d a, __m256d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_unpacklo_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __mmask8 k, 
    __m256d a, 
    __m256d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_maskz_unpacklo_pd(__mmask8 k, __m256d a,
                                     __m256d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
        FOR j := 0 to 3
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_unpacklo_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mask_unpacklo_ps(__m256 src, __mmask8 k,
                                   __m256 a, __m256 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_unpacklo_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __mmask8 k, 
    __m256 a, 
    __m256 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_maskz_unpacklo_ps(__mmask8 k, __m256 a,
                                    __m256 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
        FOR j := 0 to 7
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
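For comparison with the high-half variants, here is a scalar sketch of the low-half interleave described above. It assumes index 0 maps to bits ``[31:0]``; ``model_maskz_unpacklo_ps`` is a hypothetical name, and the intrinsic itself is not called.

.. code-block:: C

    #include <stdint.h>

    /* Scalar model of the pseudo-code; hypothetical helper, not a real API. */
    void model_maskz_unpacklo_ps(float dst[8], uint8_t k,
                                 const float a[8], const float b[8])
    {
        float tmp[8];
        /* Per 128-bit lane: interleave elements 0 and 1 of a and b. */
        for (int lane = 0; lane < 8; lane += 4) {
            tmp[lane + 0] = a[lane + 0];
            tmp[lane + 1] = b[lane + 0];
            tmp[lane + 2] = a[lane + 1];
            tmp[lane + 3] = b[lane + 1];
        }
        for (int j = 0; j < 8; ++j)
            dst[j] = ((k >> j) & 1) ? tmp[j] : 0.0f;   /* zeromask */
    }

With all mask bits set, ``a = {0, ..., 7}`` and ``b = {10, ..., 17}`` produce ``{0, 10, 1, 11, 4, 14, 5, 15}``.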

_mm256_roundscale_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    int imm8
:Param ETypes:
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m256h _mm256_roundscale_ph(__m256h a, int imm8);

.. admonition:: Intel Description

    Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        FOR i := 0 to 15
        	dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
        ENDFOR
        dst[MAX:256] := 0
        	
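The ``RoundScaleFP16`` formula can be tried out with a scalar model. The sketch below works on ``double`` rather than FP16, so it does not reproduce half-precision rounding of the result. It assumes bits ``imm8[1:0]`` encode nearest-even (0), down (1), up (2) and truncate (3), that ``imm8[2]`` is 0, and it ignores ``imm8[3]``. Both function names are hypothetical.

.. code-block:: C

    /* Hypothetical helper: round v to an integer using imm8[1:0].
       Assumes |v| is small enough to fit in a long long. */
    static double round_by_mode(double v, int mode)
    {
        double t = (double)(long long)v;            /* toward zero */
        double fl = (t > v) ? t - 1.0 : t;          /* floor(v)    */
        switch (mode & 3) {
        case 1:  return fl;                         /* toward -inf */
        case 2:  return (fl < v) ? fl + 1.0 : fl;   /* toward +inf */
        case 3:  return t;                          /* toward zero */
        default:                                    /* nearest, ties to even */
            if (v - fl > 0.5) return fl + 1.0;
            if (v - fl < 0.5) return fl;
            return ((long long)fl % 2 == 0) ? fl : fl + 1.0;
        }
    }

    /* Scalar model of 2^-m * ROUND(2^m * x), with m = imm8[7:4]. */
    double model_roundscale(double x, int imm8)
    {
        int m = (imm8 >> 4) & 15;             /* fraction bits to keep */
        double scale = (double)(1 << m);      /* 2^m */
        return round_by_mode(x * scale, imm8 & 3) / scale;
    }

For example, ``model_roundscale(1.37, 0x10)`` keeps one fraction bit with round-to-nearest and returns ``1.5``, while ``model_roundscale(1.37, 0x21)`` keeps two bits, rounds down, and returns ``1.25``.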

_mm256_mask_roundscale_ph
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a, 
    int imm8
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m256h _mm256_mask_roundscale_ph(__m256h src, __mmask16 k,
                                      __m256h a, int imm8);

.. admonition:: Intel Description

    Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_roundscale_ph
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m256h _mm256_maskz_roundscale_ph(__mmask16 k, __m256h a,
                                       int imm8);

.. admonition:: Intel Description

    Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_getexp_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256h _mm256_getexp_ph(__m256h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR i := 0 to 15
        	dst.fp16[i] := ConvertExpFP16(a.fp16[i])
        ENDFOR
        dst[MAX:256] := 0
        	
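To make the "floor(log2(x))" remark concrete, the sketch below computes the same quantity for a scalar ``double`` by repeated halving and doubling. It is only a model: ``model_getexp`` is a hypothetical name, the magnitude of the input is used (the sign is ignored), and zero, infinity and NaN inputs are not modeled.

.. code-block:: C

    /* Scalar model: floor(log2(|x|)) for finite, nonzero x.
       Hypothetical helper, not a real API. */
    int model_getexp(double x)
    {
        double m = (x < 0.0) ? -x : x;
        int e = 0;
        if (m / 2.0 == m)       /* zero or infinity: not modeled here, */
            return 0;           /* 0 is only a placeholder result      */
        while (m >= 2.0) { m /= 2.0; ++e; }
        while (m < 1.0)  { m *= 2.0; --e; }
        return e;
    }

For instance, ``model_getexp(10.0)`` is ``3`` and ``model_getexp(0.75)`` is ``-1``.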

_mm256_mask_getexp_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m256h _mm256_mask_getexp_ph(__m256h src, __mmask16 k,
                                  __m256h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := ConvertExpFP16(a.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_getexp_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m256h _mm256_maskz_getexp_ph(__mmask16 k, __m256h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := ConvertExpFP16(a.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_getmant_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign
:Param ETypes:
    FP16 a, 
    IMM norm, 
    IMM sign

.. code-block:: C

    __m256h _mm256_getmant_ph(__m256h a,
                              _MM_MANTISSA_NORM_ENUM norm,
                              _MM_MANTISSA_SIGN_ENUM sign);

.. admonition:: Intel Description

    Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR i := 0 TO 15
        	dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
        ENDFOR
        dst[MAX:256] := 0
        	
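As an illustration of the normalization described above, the following scalar sketch handles one case only: the interval [1, 2) with the sign taken from the source. Other "norm" and "sign" settings, and the hardware results for zero, infinity and NaN, are not modeled. ``model_getmant_1_2`` is a hypothetical name and operates on ``double`` rather than FP16.

.. code-block:: C

    /* Scalar model: scale |x| by a power of two into [1, 2) and keep
       the sign of x. Hypothetical helper, not a real API. */
    double model_getmant_1_2(double x)
    {
        double m = (x < 0.0) ? -x : x;
        if (m / 2.0 == m)       /* zero or infinity: returned unchanged, */
            return x;           /* hardware special cases not modeled    */
        while (m >= 2.0) m /= 2.0;
        while (m < 1.0)  m *= 2.0;
        return (x < 0.0) ? -m : m;
    }

For example, ``10.0`` normalizes to ``1.25`` (since 10 = 1.25 * 2^3) and ``-0.75`` to ``-1.5``.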

_mm256_mask_getmant_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM norm, 
    IMM sign

.. code-block:: C

    __m256h _mm256_mask_getmant_ph(__m256h src, __mmask16 k,
                                   __m256h a,
                                   _MM_MANTISSA_NORM_ENUM norm,
                                   _MM_MANTISSA_SIGN_ENUM sign);

.. admonition:: Intel Description

    Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR i := 0 TO 15
        	IF k[i]
        		dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_getmant_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM norm, 
    IMM sign

.. code-block:: C

    __m256h _mm256_maskz_getmant_ph(
        __mmask16 k, __m256h a, _MM_MANTISSA_NORM_ENUM norm,
        _MM_MANTISSA_SIGN_ENUM sign);

.. admonition:: Intel Description

    Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR i := 0 TO 15
        	IF k[i]
        		dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_reduce_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    int imm8
:Param ETypes:
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m256h _mm256_reduce_ph(__m256h a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        FOR i := 0 to 15
        	dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
        ENDFOR
        dst[MAX:256] := 0
        	
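The reduced argument in ``ReduceArgumentFP16`` is the input minus its rounded-and-scaled value. A scalar sketch on ``double`` follows. It assumes ``imm8[1:0]`` encode nearest-even (0), down (1), up (2) and truncate (3) with ``imm8[2]`` equal to 0, and it omits both FP16 precision and the ``IsInf`` special case. Both function names are hypothetical.

.. code-block:: C

    /* Hypothetical helper: round v to an integer using imm8[1:0].
       Assumes |v| is small enough to fit in a long long. */
    static double round_by_mode(double v, int mode)
    {
        double t = (double)(long long)v;            /* toward zero */
        double fl = (t > v) ? t - 1.0 : t;          /* floor(v)    */
        switch (mode & 3) {
        case 1:  return fl;                         /* toward -inf */
        case 2:  return (fl < v) ? fl + 1.0 : fl;   /* toward +inf */
        case 3:  return t;                          /* toward zero */
        default:                                    /* nearest, ties to even */
            if (v - fl > 0.5) return fl + 1.0;
            if (v - fl < 0.5) return fl;
            return ((long long)fl % 2 == 0) ? fl : fl + 1.0;
        }
    }

    /* Scalar model of x - 2^-m * ROUND(2^m * x), with m = imm8[7:4]. */
    double model_reduce(double x, int imm8)
    {
        int m = (imm8 >> 4) & 15;             /* fraction bits to keep */
        double scale = (double)(1 << m);      /* 2^m */
        return x - round_by_mode(x * scale, imm8 & 3) / scale;
    }

For example, ``model_reduce(1.375, 0x13)`` truncates to one fraction bit (``1.0``) and returns the remainder ``0.375``.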

_mm256_mask_reduce_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a, 
    int imm8
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m256h _mm256_mask_reduce_ph(__m256h src, __mmask16 k,
                                  __m256h a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_reduce_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m256h _mm256_maskz_reduce_ph(__mmask16 k, __m256h a,
                                   int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_scalef_ph
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_scalef_ph(__m256h a, __m256h b);

.. admonition:: Intel Description

    Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        FOR i := 0 to 15
        	dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
        ENDFOR
        dst[MAX:256] := 0
        	
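The core of ``ScaleFP16`` is ``a * 2^floor(b)``. The scalar sketch below shows that computation on ``double`` values. It leaves out the ``MXCSR.DAZ`` denormal handling and FP16 range effects, and assumes "b" is finite and of modest magnitude. ``model_scalef`` is a hypothetical name.

.. code-block:: C

    /* Scalar model of a * 2^floor(b); hypothetical helper, not a real API.
       Assumes b is finite and small enough to fit in a long long. */
    double model_scalef(double a, double b)
    {
        long long t = (long long)b;                    /* toward zero */
        long long e = ((double)t > b) ? t - 1 : t;     /* floor(b)    */
        double r = a;
        for (; e > 0; --e) r *= 2.0;   /* multiply by 2^e for e > 0 */
        for (; e < 0; ++e) r /= 2.0;   /* divide by 2^-e for e < 0  */
        return r;
    }

Note that the exponent is floored, not rounded: ``model_scalef(3.0, 2.7)`` is ``12.0`` and ``model_scalef(3.0, -1.5)`` is ``0.75``.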

_mm256_mask_scalef_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h src, 
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mask_scalef_ph(__m256h src, __mmask16 k,
                                  __m256h a, __m256h b);

.. admonition:: Intel Description

    Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_maskz_scalef_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_maskz_scalef_ph(__mmask16 k, __m256h a,
                                   __m256h b);

.. admonition:: Intel Description

    Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        FOR i := 0 to 15
        	IF k[i]
        		dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fpclass_ph_mask
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __m256h a, 
    int imm8
:Param ETypes:
    FP16 a, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm256_fpclass_ph_mask(__m256h a, int imm8);

.. admonition:: Intel Description

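    Test packed half-precision (16-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
    Each bit of "imm8" selects one category: 0x01 QNaN, 0x02 positive zero, 0x04 negative zero, 0x08 positive infinity, 0x10 negative infinity, 0x20 denormal, 0x40 negative (finite), 0x80 SNaN.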

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 15
        	k[i] := CheckFPClass_FP16(a.fp16[i], imm8[7:0])
        ENDFOR
        k[MAX:16] := 0
        	
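The pseudo-code above calls "CheckFPClass_FP16" without defining it. The following is a minimal scalar sketch of one plausible reading, assuming the category bits of "imm8" follow the usual VFPCLASS convention (0x01 QNaN through 0x80 SNaN), that MXCSR.DAZ has no effect, and that each FP16 element is passed as its raw IEEE 754 binary16 bit pattern. The names ``check_fpclass_fp16`` and ``fpclass_ph_mask_model`` are hypothetical; this models the semantics and is not the intrinsic itself.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Category bits of imm8 (assumed VFPCLASS convention). */
    enum {
        FPCLASS_QNAN     = 0x01,
        FPCLASS_POS_ZERO = 0x02,
        FPCLASS_NEG_ZERO = 0x04,
        FPCLASS_POS_INF  = 0x08,
        FPCLASS_NEG_INF  = 0x10,
        FPCLASS_DENORMAL = 0x20,
        FPCLASS_NEGATIVE = 0x40, /* negative, finite, non-zero */
        FPCLASS_SNAN     = 0x80
    };

    /* Hypothetical stand-in for CheckFPClass_FP16: returns 1 if the
       binary16 bit pattern falls into any category selected by imm8. */
    int check_fpclass_fp16(uint16_t bits, int imm8)
    {
        assert(imm8 >= 0 && imm8 <= 0xFF);
        int neg      = (bits >> 15) & 1;
        int exp      = (bits >> 10) & 0x1F;
        int frac     = bits & 0x3FF;
        int exp_ones = (exp == 0x1F);
        int exp_zero = (exp == 0);
        int quiet    = (frac >> 9) & 1; /* top fraction bit marks a quiet NaN */

        int found = 0;
        if (exp_ones && frac != 0) found |= quiet ? FPCLASS_QNAN : FPCLASS_SNAN;
        if (exp_ones && frac == 0) found |= neg ? FPCLASS_NEG_INF : FPCLASS_POS_INF;
        if (exp_zero && frac == 0) found |= neg ? FPCLASS_NEG_ZERO : FPCLASS_POS_ZERO;
        if (exp_zero && frac != 0) found |= FPCLASS_DENORMAL;
        if (neg && !exp_ones && !(exp_zero && frac == 0)) found |= FPCLASS_NEGATIVE;
        return (found & imm8) != 0;
    }

    /* Scalar model of the 16-lane result: bit i of the return value is k[i]. */
    uint16_t fpclass_ph_mask_model(const uint16_t a[16], int imm8)
    {
        uint16_t k = 0;
        for (int i = 0; i < 16; i++) {
            if (check_fpclass_fp16(a[i], imm8)) {
                k |= (uint16_t)(1u << i);
            }
        }
        return k;
    }

Under these assumptions, passing 0x81 as "imm8" sets a mask bit for every lane that holds either kind of NaN.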

_mm256_mask_fpclass_ph_mask
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __mmask16
:Param Types:
    __mmask16 k1, 
    __m256h a, 
    int imm8
:Param ETypes:
    MASK k1, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __mmask16 _mm256_mask_fpclass_ph_mask(__mmask16 k1,
                                          __m256h a, int imm8);

.. admonition:: Intel Description

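    Test packed half-precision (16-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
    Each bit of "imm8" selects one category: 0x01 QNaN, 0x02 positive zero, 0x04 negative zero, 0x08 positive infinity, 0x10 negative infinity, 0x20 denormal, 0x40 negative (finite), 0x80 SNaN.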

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 15
        	IF k1[i]
        		k[i] := CheckFPClass_FP16(a.fp16[i], imm8[7:0])
        	ELSE
        		k[i] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	

_mm256_permutex2var_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256h a, 
    __m256i idx, 
    __m256h b
:Param ETypes:
    FP16 a, 
    UI16 idx, 
    FP16 b

.. code-block:: C

    __m256h _mm256_permutex2var_ph(__m256h a, __m256i idx,
                                   __m256h b);

.. admonition:: Intel Description

    Shuffle half-precision (16-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	off := idx[i+3:i]
        	dst.fp16[j] := idx[i+4] ? b.fp16[off] : a.fp16[off]
        ENDFOR
        dst[MAX:256] := 0
        	
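The two-source selection above can be sketched in scalar C. This is an illustrative model only, assuming FP16 elements are carried as raw 16-bit patterns in ``uint16_t`` arrays; the name ``permutex2var_ph_model`` is hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* For each lane j: bits 3:0 of idx[j] give the source element,
       bit 4 selects b (1) or a (0). Higher index bits are ignored. */
    void permutex2var_ph_model(uint16_t dst[16], const uint16_t a[16],
                               const uint16_t idx[16], const uint16_t b[16])
    {
        /* This sketch reads inputs while writing dst, so no aliasing. */
        assert(dst != a && dst != b && dst != idx);
        for (int j = 0; j < 16; j++) {
            unsigned off = idx[j] & 0xF;
            dst[j] = (idx[j] & 0x10) ? b[off] : a[off];
        }
    }

Because only the low five bits of each index are consulted, an index of 0xFFE5 behaves like 0x05 in this model and selects element 5 of "a".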

_mm256_mask_blend_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __mmask16 k, 
    __m256h a, 
    __m256h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m256h _mm256_mask_blend_ph(__mmask16 k, __m256h a,
                                 __m256h b);

.. admonition:: Intel Description

    Blend packed half-precision (16-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	IF k[j]
        		dst.fp16[j] := b.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_permutexvar_ph
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256h
:Param Types:
    __m256i idx, 
    __m256h a
:Param ETypes:
    UI16 idx, 
    FP16 a

.. code-block:: C

    __m256h _mm256_permutexvar_ph(__m256i idx, __m256h a);

.. admonition:: Intel Description

    Shuffle half-precision (16-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	id := idx[i+3:i]
        	dst.fp16[j] := a.fp16[id]
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_dbsad_epu8
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_dbsad_epu8(__m128i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst".
    Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp.dword[0] := b.dword[ imm8[1:0] ]
        tmp.dword[1] := b.dword[ imm8[3:2] ]
        tmp.dword[2] := b.dword[ imm8[5:4] ]
        tmp.dword[3] := b.dword[ imm8[7:6] ]
        FOR j := 0 to 1
        	i := j*64
        	dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
        	               ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
        	
        	dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
        	                  ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
        	
        	dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
        	                  ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
        	
        	dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
        	                  ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
        ENDFOR
        dst[MAX:128] := 0
        	
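The bit ranges above are easier to follow at byte granularity. The following scalar sketch mirrors the pseudo-code for the 128-bit form; the name ``dbsad_epu8_model`` is hypothetical, and the vectors are represented as plain byte and word arrays for illustration.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    void dbsad_epu8_model(uint16_t dst[8], const uint8_t a[16],
                          const uint8_t b[16], int imm8)
    {
        assert(imm8 >= 0 && imm8 <= 0xFF);

        /* Step 1: build tmp by picking one dword of b per 2-bit field. */
        uint8_t tmp[16];
        for (int d = 0; d < 4; d++) {
            int sel = (imm8 >> (2 * d)) & 3;
            memcpy(&tmp[4 * d], &b[4 * sel], 4);
        }

        /* Step 2: four SADs per 64-bit lane. Results 0 and 1 use the low
           quadruplet of a, results 2 and 3 the high one. The tmp window
           slides by one byte for each successive result. */
        for (int lane = 0; lane < 2; lane++) {
            int base = 8 * lane;
            for (int r = 0; r < 4; r++) {
                int a_off = base + (r < 2 ? 0 : 4);
                int t_off = base + r;
                unsigned sum = 0;
                for (int n = 0; n < 4; n++) {
                    int diff = (int)a[a_off + n] - (int)tmp[t_off + n];
                    sum += (unsigned)(diff < 0 ? -diff : diff);
                }
                dst[4 * lane + r] = (uint16_t)sum;
            }
        }
    }

An "imm8" of 0xE4 selects the dwords of "b" in their original order, which makes it a convenient value for checking the sliding windows by hand.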

_mm_mask_dbsad_epu8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_dbsad_epu8(__m128i src, __mmask8 k,
                                __m128i a, __m128i b, int imm8);

.. admonition:: Intel Description

    Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp.dword[0] := b.dword[ imm8[1:0] ]
        tmp.dword[1] := b.dword[ imm8[3:2] ]
        tmp.dword[2] := b.dword[ imm8[5:4] ]
        tmp.dword[3] := b.dword[ imm8[7:6] ]
        FOR j := 0 to 1
        	i := j*64
        	tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
        	                   ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
        	
        	tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
        	                      ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
        	
        	tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
        	                      ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
        	
        	tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
        	                      ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
        ENDFOR
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_dbsad_epu8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_dbsad_epu8(__mmask8 k, __m128i a,
                                 __m128i b, int imm8);

.. admonition:: Intel Description

    Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp.dword[0] := b.dword[ imm8[1:0] ]
        tmp.dword[1] := b.dword[ imm8[3:2] ]
        tmp.dword[2] := b.dword[ imm8[5:4] ]
        tmp.dword[3] := b.dword[ imm8[7:6] ]
        FOR j := 0 to 1
        	i := j*64
        	tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
        	                   ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
        	
        	tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
        	                      ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
        	
        	tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
        	                      ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
        	
        	tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
        	                      ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
        ENDFOR
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_alignr_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_alignr_epi8(__m128i src, __mmask16 k,
                                 __m128i a, __m128i b,
                                 const int imm8);

.. admonition:: Intel Description

    Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[255:0] := ((a[127:0] << 128)[255:0] OR b[127:0]) >> (imm8*8)
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
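The concatenate-and-shift step can be modeled byte by byte. The sketch below is a scalar illustration of the pseudo-code, assuming "imm8" is treated as an unsigned 8-bit byte count; the name ``mask_alignr_epi8_model`` is hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    void mask_alignr_epi8_model(uint8_t dst[16], const uint8_t src[16],
                                uint16_t k, const uint8_t a[16],
                                const uint8_t b[16], int imm8)
    {
        assert(imm8 >= 0 && imm8 <= 0xFF);
        assert(dst != a && dst != b); /* inputs are read while dst is written */
        for (int j = 0; j < 16; j++) {
            /* Byte j of the shifted result is byte (j + imm8) of the 32-byte
               concatenation, where b is the low half and a the high half.
               Positions past the end shift in zeros. */
            int pos = j + imm8;
            uint8_t shifted = pos < 16 ? b[pos]
                            : pos < 32 ? a[pos - 16]
                            : 0;
            dst[j] = ((k >> j) & 1) ? shifted : src[j];
        }
    }

In this model a shift count of 32 or more yields zero in every lane whose mask bit is set, since all bytes of the concatenation have been shifted out.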

_mm_maskz_alignr_epi8
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_alignr_epi8(__mmask16 k, __m128i a,
                                  __m128i b, const int imm8);

.. admonition:: Intel Description

    Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[255:0] := ((a[127:0] << 128)[255:0] OR b[127:0]) >> (imm8*8)
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_blend_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_blend_epi8(__mmask16 k, __m128i a,
                                __m128i b);

.. admonition:: Intel Description

    Blend packed 8-bit integers from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := b[i+7:i]
        	ELSE
        		dst[i+7:i] := a[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_blend_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_blend_epi16(__mmask8 k, __m128i a,
                                 __m128i b);

.. admonition:: Intel Description

    Blend packed 16-bit integers from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := b[i+15:i]
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
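A scalar sketch of the blend, for illustration only: a set mask bit picks the lane from "b", a clear bit picks it from "a". The name ``mask_blend_epi16_model`` is hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    void mask_blend_epi16_model(uint16_t dst[8], uint8_t k,
                                const uint16_t a[8], const uint16_t b[8])
    {
        assert(dst && a && b);
        for (int j = 0; j < 8; j++) {
            /* Bit j of k chooses between b[j] (set) and a[j] (clear). */
            dst[j] = ((k >> j) & 1) ? b[j] : a[j];
        }
    }

Note the argument order: unlike the writemask forms, the value used for a clear mask bit is the first vector operand, not a separate "src".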

_mm_mask_broadcastb_epi8
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_mask_broadcastb_epi8(__m128i src, __mmask16 k,
                                     __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[7:0]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_broadcastb_epi8
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI8 a

.. code-block:: C

    __m128i _mm_maskz_broadcastb_epi8(__mmask16 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := a[7:0]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
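The zeromask broadcast can be sketched in scalar form as follows. This is an illustrative model, not the intrinsic; the name ``maskz_broadcastb_epi8_model`` is hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    void maskz_broadcastb_epi8_model(uint8_t dst[16], uint16_t k,
                                     const uint8_t a[16])
    {
        assert(dst && a);
        uint8_t low = a[0]; /* read first, so dst may alias a */
        for (int j = 0; j < 16; j++) {
            /* Selected lanes get the low byte of a, the rest are zeroed. */
            dst[j] = ((k >> j) & 1) ? low : 0;
        }
    }

Only element 0 of "a" influences the result; the remaining fifteen bytes are ignored.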

_mm_mask_broadcastw_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_mask_broadcastw_epi16(__m128i src, __mmask8 k,
                                      __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[15:0]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_broadcastw_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 a

.. code-block:: C

    __m128i _mm_maskz_broadcastw_epi16(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := a[15:0]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask2_permutex2var_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i idx, 
    __mmask8 k, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 idx, 
    MASK k, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask2_permutex2var_epi16(__m128i a, __m128i idx,
                                         __mmask8 k, __m128i b);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		off := 16*idx[i+2:i]
        		dst[i+15:i] := idx[i+3] ? b[off+15:off] : a[off+15:off]
        	ELSE
        		dst[i+15:i] := idx[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
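What distinguishes the "mask2" form is its fallback: lanes with a clear mask bit receive the corresponding element of "idx" rather than of "a". A minimal scalar sketch, with the hypothetical name ``mask2_permutex2var_epi16_model``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    void mask2_permutex2var_epi16_model(uint16_t dst[8], const uint16_t a[8],
                                        const uint16_t idx[8], uint8_t k,
                                        const uint16_t b[8])
    {
        /* dst may alias idx (each idx[j] is read before dst[j] is written),
           but not the data sources. */
        assert(dst != a && dst != b);
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1) {
                unsigned off = idx[j] & 0x7;  /* bits 2:0: element number */
                dst[j] = (idx[j] & 0x8) ? b[off] : a[off]; /* bit 3: source */
            } else {
                dst[j] = idx[j];              /* mask2: keep the index word */
            }
        }
    }

With 8 lanes per source, three bits address the element and the fourth picks the source, so only bits 3:0 of each index word matter for selected lanes.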

_mm_mask_permutex2var_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i idx, 
    __m128i b
:Param ETypes:
    UI16 a, 
    MASK k, 
    UI16 idx, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_permutex2var_epi16(__m128i a, __mmask8 k,
                                        __m128i idx, __m128i b);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		off := 16*idx[i+2:i]
        		dst[i+15:i] := idx[i+3] ? b[off+15:off] : a[off+15:off]
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_permutex2var_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i idx, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 idx, 
    UI16 b

.. code-block:: C

    __m128i _mm_maskz_permutex2var_epi16(__mmask8 k, __m128i a,
                                         __m128i idx,
                                         __m128i b);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		off := 16*idx[i+2:i]
        		dst[i+15:i] := idx[i+3] ? b[off+15:off] : a[off+15:off]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_permutex2var_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i idx, 
    __m128i b
:Param ETypes:
    UI16 a, 
    UI16 idx, 
    UI16 b

.. code-block:: C

    __m128i _mm_permutex2var_epi16(__m128i a, __m128i idx,
                                   __m128i b);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	off := 16*idx[i+2:i]
        	dst[i+15:i] := idx[i+3] ? b[off+15:off] : a[off+15:off]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_permutexvar_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i idx, 
    __m128i a
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 idx, 
    UI16 a

.. code-block:: C

    __m128i _mm_mask_permutexvar_epi16(__m128i src, __mmask8 k,
                                       __m128i idx, __m128i a);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	id := idx[i+2:i]*16
        	IF k[j]
        		dst[i+15:i] := a[id+15:id]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_permutexvar_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i idx, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI16 idx, 
    UI16 a

.. code-block:: C

    __m128i _mm_maskz_permutexvar_epi16(__mmask8 k, __m128i idx,
                                        __m128i a);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	id := idx[i+2:i]*16
        	IF k[j]
        		dst[i+15:i] := a[id+15:id]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
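A scalar sketch of the single-source permute with zeroing, for illustration; the name ``maskz_permutexvar_epi16_model`` is hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    void maskz_permutexvar_epi16_model(uint16_t dst[8], uint8_t k,
                                       const uint16_t idx[8],
                                       const uint16_t a[8])
    {
        assert(dst != a); /* a is read at arbitrary positions while dst is written */
        for (int j = 0; j < 8; j++) {
            /* Only bits 2:0 of each index word select an element of a. */
            dst[j] = ((k >> j) & 1) ? a[idx[j] & 0x7] : 0;
        }
    }

Setting each index to 7 minus its lane number reverses the eight words, which is a quick way to sanity-check an index vector.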

_mm_permutexvar_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i idx, 
    __m128i a
:Param ETypes:
    UI16 idx, 
    UI16 a

.. code-block:: C

    __m128i _mm_permutexvar_epi16(__m128i idx, __m128i a);

.. admonition:: Intel Description

    Shuffle 16-bit integers in "a" using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	id := idx[i+2:i]*16
        	dst[i+15:i] := a[id+15:id]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_movepi8_mask
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask16
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __mmask16 _mm_movepi8_mask(__m128i a);

.. admonition:: Intel Description

    Set each bit of mask register "k" based on the most significant bit of the corresponding packed 8-bit integer in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF a[i+7]
        		k[j] := 1
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:16] := 0
        	
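The sign-bit extraction can be sketched in scalar C. This is an illustrative model with the hypothetical name ``movepi8_mask_model``; bit j of the return value plays the role of k[j].

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    uint16_t movepi8_mask_model(const uint8_t a[16])
    {
        assert(a);
        uint16_t k = 0;
        for (int j = 0; j < 16; j++) {
            /* Move the most significant bit of byte j into mask bit j. */
            k |= (uint16_t)((a[j] >> 7) << j);
        }
        return k;
    }

Only bit 7 of each byte is examined, so 0x80 and 0xFF both set the corresponding mask bit while 0x7F does not.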

_mm_movm_epi8
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m128i _mm_movm_epi8(__mmask16 k);

.. admonition:: Intel Description

    Set each packed 8-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := 0xFF
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_movm_epi16
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m128i _mm_movm_epi16(__mmask8 k);

.. admonition:: Intel Description

    Set each packed 16-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := 0xFFFF
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
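The mask-to-vector expansion is the inverse direction of the sign-bit extraction. A minimal scalar sketch, with the hypothetical name ``movm_epi16_model``:

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    void movm_epi16_model(uint16_t dst[8], uint8_t k)
    {
        assert(dst);
        for (int j = 0; j < 8; j++) {
            /* Each mask bit expands to an all-ones or all-zeros word. */
            dst[j] = ((k >> j) & 1) ? 0xFFFF : 0x0000;
        }
    }

The expanded vector can then serve as a bitwise AND mask in code paths that expect a full-width lane mask rather than a mask register.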

_mm_movepi16_mask
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __mmask8 _mm_movepi16_mask(__m128i a);

.. admonition:: Intel Description

    Set each bit of mask register "k" based on the most significant bit of the corresponding packed 16-bit integer in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	IF a[i+15]
        		k[j] := 1
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_mask_shufflehi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_shufflehi_epi16(__m128i src, __mmask8 k,
                                     __m128i a, int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the high 64 bits of "a" using the control in "imm8". Store the results in the high 64 bits of "dst", with the low 64 bits being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[63:0] := a[63:0]
        tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
        tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
        tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
        tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
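Expressed per word rather than per bit range, the shuffle reads as follows. This is a scalar sketch of the pseudo-code; the name ``mask_shufflehi_epi16_model`` is hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    void mask_shufflehi_epi16_model(uint16_t dst[8], const uint16_t src[8],
                                    uint8_t k, const uint16_t a[8], int imm8)
    {
        assert(imm8 >= 0 && imm8 <= 0xFF);
        uint16_t tmp[8];
        for (int j = 0; j < 4; j++) {
            tmp[j] = a[j];                            /* low half passes through */
            tmp[4 + j] = a[4 + ((imm8 >> (2 * j)) & 3)]; /* pick from high half */
        }
        for (int j = 0; j < 8; j++) {
            /* The writemask applies to all eight words, low half included. */
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }
    }

Each 2-bit field of "imm8" indexes within the high four words only, so 0x1B reverses the high half and 0xE4 leaves it unchanged.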

_mm_maskz_shufflehi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_shufflehi_epi16(__mmask8 k, __m128i a,
                                      int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the high 64 bits of "a" using the control in "imm8". Store the results in the high 64 bits of "dst", with the low 64 bits being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[63:0] := a[63:0]
        tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
        tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
        tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
        tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
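The masked high-half shuffle can be explored without AVX-512 hardware by modelling the pseudo-code in plain C. The sketch below is an illustrative scalar model of ``_mm_mask_shufflehi_epi16`` and ``_mm_maskz_shufflehi_epi16``, not Intel's implementation. The function name ``model_mask_shufflehi_epi16`` and the convention that a ``NULL`` ``src`` selects zero-masking are assumptions made for this example.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model. Lanes are 16-bit elements; lane 0 is bits 15:0.
     * src == NULL selects zero-masking (the maskz form). */
    static void model_mask_shufflehi_epi16(uint16_t dst[8], const uint16_t *src,
                                           uint8_t k, const uint16_t a[8],
                                           uint8_t imm8)
    {
        uint16_t tmp[8];
        assert(dst != NULL && a != NULL);
        for (int j = 0; j < 4; j++) {
            tmp[j] = a[j];                               /* low 64 bits pass through */
            tmp[4 + j] = a[4 + ((imm8 >> (2 * j)) & 3)]; /* 2-bit selector per high lane */
        }
        for (int j = 0; j < 8; j++) {
            if ((k >> j) & 1)
                dst[j] = tmp[j];            /* mask bit set: take the shuffled lane */
            else
                dst[j] = src ? src[j] : 0;  /* mask bit clear: merge from src or zero */
        }
    }

For example, with ``a = {0, 1, 2, 3, 4, 5, 6, 7}`` and ``imm8 = 0x1B`` the four selectors are 3, 2, 1, 0, so the shuffled vector is ``{0, 1, 2, 3, 7, 6, 5, 4}`` before masking is applied.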

_mm_mask_shufflelo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    int imm8
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_shufflelo_epi16(__m128i src, __mmask8 k,
                                     __m128i a, int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the low 64 bits of "a" using the control in "imm8". Store the results in the low 64 bits of "dst", with the high 64 bits being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
        tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
        tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
        tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
        tmp_dst[127:64] := a[127:64]
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_shufflelo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    int imm8
:Param ETypes:
    MASK k, 
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_shufflelo_epi16(__mmask8 k, __m128i a,
                                      int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the low 64 bits of "a" using the control in "imm8". Store the results in the low 64 bits of "dst", with the high 64 bits being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
        tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
        tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
        tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
        tmp_dst[127:64] := a[127:64]
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
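The "imm8" control packs four 2-bit selectors, one per result lane in the low half. The following sketch shows one way to compose such a control and apply it in a scalar model of the zero-masked form. Both ``make_shuffle_imm8`` and ``model_maskz_shufflelo_epi16`` are hypothetical names introduced for illustration; the helper takes its selectors lowest lane first.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper: sel0 chooses the source lane for result lane 0,
     * sel3 for result lane 3. Each selector must be in the range 0..3. */
    static uint8_t make_shuffle_imm8(unsigned sel0, unsigned sel1,
                                     unsigned sel2, unsigned sel3)
    {
        assert(sel0 < 4 && sel1 < 4 && sel2 < 4 && sel3 < 4);
        return (uint8_t)(sel0 | (sel1 << 2) | (sel2 << 4) | (sel3 << 6));
    }

    /* Hypothetical scalar model of the zero-masked low-half shuffle. */
    static void model_maskz_shufflelo_epi16(uint16_t dst[8], uint8_t k,
                                            const uint16_t a[8], uint8_t imm8)
    {
        uint16_t tmp[8];
        for (int j = 0; j < 4; j++) {
            tmp[j] = a[(imm8 >> (2 * j)) & 3];  /* shuffle within the low 64 bits */
            tmp[4 + j] = a[4 + j];              /* high 64 bits pass through */
        }
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : 0;  /* zero lanes whose mask bit is clear */
    }

Here ``make_shuffle_imm8(3, 2, 1, 0)`` evaluates to ``0x1B``, which reverses the four low lanes and leaves the high lanes in place.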

_mm_mask_unpackhi_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_unpackhi_epi8(__m128i src, __mmask16 k,
                                   __m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[71:64] 
        	dst[15:8] := src2[71:64] 
        	dst[23:16] := src1[79:72] 
        	dst[31:24] := src2[79:72] 
        	dst[39:32] := src1[87:80] 
        	dst[47:40] := src2[87:80] 
        	dst[55:48] := src1[95:88] 
        	dst[63:56] := src2[95:88] 
        	dst[71:64] := src1[103:96] 
        	dst[79:72] := src2[103:96] 
        	dst[87:80] := src1[111:104] 
        	dst[95:88] := src2[111:104] 
        	dst[103:96] := src1[119:112] 
        	dst[111:104] := src2[119:112] 
        	dst[119:112] := src1[127:120] 
        	dst[127:120] := src2[127:120] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_unpackhi_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_unpackhi_epi8(__mmask16 k, __m128i a,
                                    __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[71:64] 
        	dst[15:8] := src2[71:64] 
        	dst[23:16] := src1[79:72] 
        	dst[31:24] := src2[79:72] 
        	dst[39:32] := src1[87:80] 
        	dst[47:40] := src2[87:80] 
        	dst[55:48] := src1[95:88] 
        	dst[63:56] := src2[95:88] 
        	dst[71:64] := src1[103:96] 
        	dst[79:72] := src2[103:96] 
        	dst[87:80] := src1[111:104] 
        	dst[95:88] := src2[111:104] 
        	dst[103:96] := src1[119:112] 
        	dst[111:104] := src2[119:112] 
        	dst[119:112] := src1[127:120] 
        	dst[127:120] := src2[127:120] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
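The sixteen assignments in ``INTERLEAVE_HIGH_BYTES`` follow a regular pattern that collapses into a short loop. The sketch below is a scalar model of the masked byte interleave, assuming a hypothetical function name ``model_mask_unpackhi_epi8`` and the convention that a ``NULL`` ``src`` selects zero-masking.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model. Byte 0 is bits 7:0 of the 128-bit vector.
     * src == NULL selects zero-masking (the maskz form). */
    static void model_mask_unpackhi_epi8(uint8_t dst[16], const uint8_t *src,
                                         uint16_t k, const uint8_t a[16],
                                         const uint8_t b[16])
    {
        uint8_t tmp[16];
        assert(dst != NULL && a != NULL && b != NULL);
        for (int j = 0; j < 8; j++) {
            tmp[2 * j] = a[8 + j];      /* even result bytes come from the high half of a */
            tmp[2 * j + 1] = b[8 + j];  /* odd result bytes come from the high half of b */
        }
        for (int j = 0; j < 16; j++) {
            if ((k >> j) & 1)
                dst[j] = tmp[j];
            else
                dst[j] = src ? src[j] : 0;
        }
    }

The temporary buffer keeps the model correct even when ``dst`` aliases ``a`` or ``b``.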

_mm_mask_unpackhi_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_unpackhi_epi16(__m128i src, __mmask8 k,
                                    __m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[79:64]
        	dst[31:16] := src2[79:64] 
        	dst[47:32] := src1[95:80] 
        	dst[63:48] := src2[95:80] 
        	dst[79:64] := src1[111:96] 
        	dst[95:80] := src2[111:96] 
        	dst[111:96] := src1[127:112] 
        	dst[127:112] := src2[127:112] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_unpackhi_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_maskz_unpackhi_epi16(__mmask8 k, __m128i a,
                                     __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[79:64]
        	dst[31:16] := src2[79:64] 
        	dst[47:32] := src1[95:80] 
        	dst[63:48] := src2[95:80] 
        	dst[79:64] := src1[111:96] 
        	dst[95:80] := src2[111:96] 
        	dst[111:96] := src1[127:112] 
        	dst[127:112] := src2[127:112] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_unpacklo_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_mask_unpacklo_epi8(__m128i src, __mmask16 k,
                                   __m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[7:0] 
        	dst[15:8] := src2[7:0] 
        	dst[23:16] := src1[15:8] 
        	dst[31:24] := src2[15:8] 
        	dst[39:32] := src1[23:16] 
        	dst[47:40] := src2[23:16] 
        	dst[55:48] := src1[31:24] 
        	dst[63:56] := src2[31:24] 
        	dst[71:64] := src1[39:32]
        	dst[79:72] := src2[39:32] 
        	dst[87:80] := src1[47:40] 
        	dst[95:88] := src2[47:40] 
        	dst[103:96] := src1[55:48] 
        	dst[111:104] := src2[55:48] 
        	dst[119:112] := src1[63:56] 
        	dst[127:120] := src2[63:56] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_unpacklo_epi8
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI8 a, 
    UI8 b

.. code-block:: C

    __m128i _mm_maskz_unpacklo_epi8(__mmask16 k, __m128i a,
                                    __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[7:0] 
        	dst[15:8] := src2[7:0] 
        	dst[23:16] := src1[15:8] 
        	dst[31:24] := src2[15:8] 
        	dst[39:32] := src1[23:16] 
        	dst[47:40] := src2[23:16] 
        	dst[55:48] := src1[31:24] 
        	dst[63:56] := src2[31:24] 
        	dst[71:64] := src1[39:32]
        	dst[79:72] := src2[39:32] 
        	dst[87:80] := src1[47:40] 
        	dst[95:88] := src2[47:40] 
        	dst[103:96] := src1[55:48] 
        	dst[111:104] := src2[55:48] 
        	dst[119:112] := src1[63:56] 
        	dst[127:120] := src2[63:56] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_unpacklo_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_mask_unpacklo_epi16(__m128i src, __mmask8 k,
                                    __m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[15:0] 
        	dst[31:16] := src2[15:0] 
        	dst[47:32] := src1[31:16] 
        	dst[63:48] := src2[31:16] 
        	dst[79:64] := src1[47:32] 
        	dst[95:80] := src2[47:32] 
        	dst[111:96] := src1[63:48] 
        	dst[127:112] := src2[63:48] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_unpacklo_epi16
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI16 a, 
    UI16 b

.. code-block:: C

    __m128i _mm_maskz_unpacklo_epi16(__mmask8 k, __m128i a,
                                     __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[15:0] 
        	dst[31:16] := src2[15:0] 
        	dst[47:32] := src1[31:16] 
        	dst[63:48] := src2[31:16] 
        	dst[79:64] := src1[47:32] 
        	dst[95:80] := src2[47:32] 
        	dst[111:96] := src1[63:48] 
        	dst[127:112] := src2[63:48] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
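A scalar model makes one consequence of zero-masking easy to see. In the sketch below, which uses the hypothetical name ``model_maskz_unpacklo_epi16``, a mask of ``0x55`` keeps only the even result lanes (those taken from "a") and zeroes the odd ones. The low four 16-bit values of "a" then appear zero-extended to 32 bits, whatever "b" contains.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of the zero-masked low-half word interleave. */
    static void model_maskz_unpacklo_epi16(uint16_t dst[8], uint8_t k,
                                           const uint16_t a[8],
                                           const uint16_t b[8])
    {
        uint16_t tmp[8];
        assert(dst != NULL && a != NULL && b != NULL);
        for (int j = 0; j < 4; j++) {
            tmp[2 * j] = a[j];      /* even result lanes come from the low half of a */
            tmp[2 * j + 1] = b[j];  /* odd result lanes come from the low half of b */
        }
        for (int j = 0; j < 8; j++)
            dst[j] = ((k >> j) & 1) ? tmp[j] : 0;  /* zero lanes whose mask bit is clear */
    }

With ``k = 0xFF`` the result is the plain interleave ``{a0, b0, a1, b1, a2, b2, a3, b3}``; with ``k = 0x55`` it is ``{a0, 0, a1, 0, a2, 0, a3, 0}``.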

_mm_mask_packs_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI16 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_mask_packs_epi32(__m128i src, __mmask8 k,
                                 __m128i a, __m128i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := Saturate16(a[31:0])
        tmp_dst[31:16] := Saturate16(a[63:32])
        tmp_dst[47:32] := Saturate16(a[95:64])
        tmp_dst[63:48] := Saturate16(a[127:96])
        tmp_dst[79:64] := Saturate16(b[31:0])
        tmp_dst[95:80] := Saturate16(b[63:32])
        tmp_dst[111:96] := Saturate16(b[95:64])
        tmp_dst[127:112] := Saturate16(b[127:96])
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_packs_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_maskz_packs_epi32(__mmask8 k, __m128i a,
                                  __m128i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := Saturate16(a[31:0])
        tmp_dst[31:16] := Saturate16(a[63:32])
        tmp_dst[47:32] := Saturate16(a[95:64])
        tmp_dst[63:48] := Saturate16(a[127:96])
        tmp_dst[79:64] := Saturate16(b[31:0])
        tmp_dst[95:80] := Saturate16(b[63:32])
        tmp_dst[111:96] := Saturate16(b[95:64])
        tmp_dst[127:112] := Saturate16(b[127:96])
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
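``Saturate16`` in the pseudo-code clamps each signed 32-bit input to the signed 16-bit range instead of truncating it. The sketch below spells that out in a scalar model. The names ``saturate16`` and ``model_mask_packs_epi32`` are hypothetical, and passing ``NULL`` for ``src`` is this example's way of selecting zero-masking.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Clamp a signed 32-bit value to the signed 16-bit range. */
    static int16_t saturate16(int32_t x)
    {
        if (x > INT16_MAX) return INT16_MAX;
        if (x < INT16_MIN) return INT16_MIN;
        return (int16_t)x;
    }

    /* Hypothetical scalar model: result lanes 0..3 come from a, lanes 4..7 from b.
     * src == NULL selects zero-masking (the maskz form). */
    static void model_mask_packs_epi32(int16_t dst[8], const int16_t *src,
                                       uint8_t k, const int32_t a[4],
                                       const int32_t b[4])
    {
        assert(dst != NULL && a != NULL && b != NULL);
        for (int j = 0; j < 8; j++) {
            int32_t x = (j < 4) ? a[j] : b[j - 4];
            if ((k >> j) & 1)
                dst[j] = saturate16(x);
            else
                dst[j] = src ? src[j] : 0;
        }
    }

For instance, the inputs 70000 and -70000 pack to 32767 and -32768, while values already within range are unchanged.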

_mm_mask_packs_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI8 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mask_packs_epi16(__m128i src, __mmask16 k,
                                 __m128i a, __m128i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[7:0] := Saturate8(a[15:0])
        tmp_dst[15:8] := Saturate8(a[31:16])
        tmp_dst[23:16] := Saturate8(a[47:32])
        tmp_dst[31:24] := Saturate8(a[63:48])
        tmp_dst[39:32] := Saturate8(a[79:64])
        tmp_dst[47:40] := Saturate8(a[95:80])
        tmp_dst[55:48] := Saturate8(a[111:96])
        tmp_dst[63:56] := Saturate8(a[127:112])
        tmp_dst[71:64] := Saturate8(b[15:0])
        tmp_dst[79:72] := Saturate8(b[31:16])
        tmp_dst[87:80] := Saturate8(b[47:32])
        tmp_dst[95:88] := Saturate8(b[63:48])
        tmp_dst[103:96] := Saturate8(b[79:64])
        tmp_dst[111:104] := Saturate8(b[95:80])
        tmp_dst[119:112] := Saturate8(b[111:96])
        tmp_dst[127:120] := Saturate8(b[127:112])
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_packs_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_maskz_packs_epi16(__mmask16 k, __m128i a,
                                  __m128i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[7:0] := Saturate8(a[15:0])
        tmp_dst[15:8] := Saturate8(a[31:16])
        tmp_dst[23:16] := Saturate8(a[47:32])
        tmp_dst[31:24] := Saturate8(a[63:48])
        tmp_dst[39:32] := Saturate8(a[79:64])
        tmp_dst[47:40] := Saturate8(a[95:80])
        tmp_dst[55:48] := Saturate8(a[111:96])
        tmp_dst[63:56] := Saturate8(a[127:112])
        tmp_dst[71:64] := Saturate8(b[15:0])
        tmp_dst[79:72] := Saturate8(b[31:16])
        tmp_dst[87:80] := Saturate8(b[47:32])
        tmp_dst[95:88] := Saturate8(b[63:48])
        tmp_dst[103:96] := Saturate8(b[79:64])
        tmp_dst[111:104] := Saturate8(b[95:80])
        tmp_dst[119:112] := Saturate8(b[111:96])
        tmp_dst[127:120] := Saturate8(b[127:112])
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_packus_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI16 src, 
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_mask_packus_epi32(__m128i src, __mmask8 k,
                                  __m128i a, __m128i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := SaturateU16(a[31:0])
        tmp_dst[31:16] := SaturateU16(a[63:32])
        tmp_dst[47:32] := SaturateU16(a[95:64])
        tmp_dst[63:48] := SaturateU16(a[127:96])
        tmp_dst[79:64] := SaturateU16(b[31:0])
        tmp_dst[95:80] := SaturateU16(b[63:32])
        tmp_dst[111:96] := SaturateU16(b[95:64])
        tmp_dst[127:112] := SaturateU16(b[127:96])
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := src[i+15:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_packus_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI32 a, 
    SI32 b

.. code-block:: C

    __m128i _mm_maskz_packus_epi32(__mmask8 k, __m128i a,
                                   __m128i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[15:0] := SaturateU16(a[31:0])
        tmp_dst[31:16] := SaturateU16(a[63:32])
        tmp_dst[47:32] := SaturateU16(a[95:64])
        tmp_dst[63:48] := SaturateU16(a[127:96])
        tmp_dst[79:64] := SaturateU16(b[31:0])
        tmp_dst[95:80] := SaturateU16(b[63:32])
        tmp_dst[111:96] := SaturateU16(b[95:64])
        tmp_dst[127:112] := SaturateU16(b[127:96])
        FOR j := 0 to 7
        	i := j*16
        	IF k[j]
        		dst[i+15:i] := tmp_dst[i+15:i]
        	ELSE
        		dst[i+15:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_packus_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI8 src, 
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_mask_packus_epi16(__m128i src, __mmask16 k,
                                  __m128i a, __m128i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[7:0] := SaturateU8(a[15:0])
        tmp_dst[15:8] := SaturateU8(a[31:16])
        tmp_dst[23:16] := SaturateU8(a[47:32])
        tmp_dst[31:24] := SaturateU8(a[63:48])
        tmp_dst[39:32] := SaturateU8(a[79:64])
        tmp_dst[47:40] := SaturateU8(a[95:80])
        tmp_dst[55:48] := SaturateU8(a[111:96])
        tmp_dst[63:56] := SaturateU8(a[127:112])
        tmp_dst[71:64] := SaturateU8(b[15:0])
        tmp_dst[79:72] := SaturateU8(b[31:16])
        tmp_dst[87:80] := SaturateU8(b[47:32])
        tmp_dst[95:88] := SaturateU8(b[63:48])
        tmp_dst[103:96] := SaturateU8(b[79:64])
        tmp_dst[111:104] := SaturateU8(b[95:80])
        tmp_dst[119:112] := SaturateU8(b[111:96])
        tmp_dst[127:120] := SaturateU8(b[127:112])
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := src[i+7:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_packus_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_maskz_packus_epi16(__mmask16 k, __m128i a,
                                   __m128i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[7:0] := SaturateU8(a[15:0])
        tmp_dst[15:8] := SaturateU8(a[31:16])
        tmp_dst[23:16] := SaturateU8(a[47:32])
        tmp_dst[31:24] := SaturateU8(a[63:48])
        tmp_dst[39:32] := SaturateU8(a[79:64])
        tmp_dst[47:40] := SaturateU8(a[95:80])
        tmp_dst[55:48] := SaturateU8(a[111:96])
        tmp_dst[63:56] := SaturateU8(a[127:112])
        tmp_dst[71:64] := SaturateU8(b[15:0])
        tmp_dst[79:72] := SaturateU8(b[31:16])
        tmp_dst[87:80] := SaturateU8(b[47:32])
        tmp_dst[95:88] := SaturateU8(b[63:48])
        tmp_dst[103:96] := SaturateU8(b[79:64])
        tmp_dst[111:104] := SaturateU8(b[95:80])
        tmp_dst[119:112] := SaturateU8(b[111:96])
        tmp_dst[127:120] := SaturateU8(b[127:112])
        FOR j := 0 to 15
        	i := j*8
        	IF k[j]
        		dst[i+7:i] := tmp_dst[i+7:i]
        	ELSE
        		dst[i+7:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
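Unsigned saturation still treats the inputs as signed, so a negative 16-bit value packs to 0 and does not wrap to a large unsigned byte. The sketch below illustrates this with a scalar model. The names ``saturate_u8`` and ``model_mask_packus_epi16`` are hypothetical, and a ``NULL`` ``src`` is this example's convention for zero-masking.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Clamp a signed 16-bit value to the unsigned 8-bit range 0..255. */
    static uint8_t saturate_u8(int16_t x)
    {
        if (x < 0) return 0;
        if (x > UINT8_MAX) return UINT8_MAX;
        return (uint8_t)x;
    }

    /* Hypothetical scalar model: result bytes 0..7 come from a, bytes 8..15 from b.
     * src == NULL selects zero-masking (the maskz form). */
    static void model_mask_packus_epi16(uint8_t dst[16], const uint8_t *src,
                                        uint16_t k, const int16_t a[8],
                                        const int16_t b[8])
    {
        assert(dst != NULL && a != NULL && b != NULL);
        for (int j = 0; j < 16; j++) {
            int16_t x = (j < 8) ? a[j] : b[j - 8];
            if ((k >> j) & 1)
                dst[j] = saturate_u8(x);
            else
                dst[j] = src ? src[j] : 0;
        }
    }

With these definitions, the inputs -5, 256 and 128 pack to 0, 255 and 128 respectively.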

_mm_broadcastmb_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m128i _mm_broadcastmb_epi64(__mmask8 k);

.. admonition:: Intel Description

    Broadcast the low 8 bits from input mask "k" to all 64-bit elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ZeroExtend64(k[7:0])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_broadcastmw_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask16 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m128i _mm_broadcastmw_epi32(__mmask16 k);

.. admonition:: Intel Description

    Broadcast the low 16 bits from input mask "k" to all 32-bit elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ZeroExtend32(k[15:0])
        ENDFOR
        dst[MAX:128] := 0
        	
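Both mask broadcasts zero-extend the mask value into every element. A scalar sketch of the two loops above, using the hypothetical names ``model_broadcastmb_epi64`` and ``model_broadcastmw_epi32``, might look like this:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model: each 64-bit element receives ZeroExtend64(k[7:0]). */
    static void model_broadcastmb_epi64(uint64_t dst[2], uint8_t k)
    {
        assert(dst != NULL);
        for (int j = 0; j < 2; j++)
            dst[j] = (uint64_t)k;
    }

    /* Hypothetical scalar model: each 32-bit element receives ZeroExtend32(k[15:0]). */
    static void model_broadcastmw_epi32(uint32_t dst[4], uint16_t k)
    {
        assert(dst != NULL);
        for (int j = 0; j < 4; j++)
            dst[j] = (uint32_t)k;
    }

The mask is copied as an integer value; it is not expanded bit by bit into per-element flags.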

_mm_broadcast_i32x2
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm_broadcast_i32x2(__m128i a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	n := (j % 2)*32
        	dst[i+31:i] := a[n+31:n]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_broadcast_i32x2
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_broadcast_i32x2(__m128i src, __mmask8 k,
                                     __m128i a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	n := (j % 2)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_broadcast_i32x2
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_broadcast_i32x2(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	n := (j % 2)*32
        	IF k[j]
        		dst[i+31:i] := a[n+31:n]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
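
The writemask and zeromask forms of this broadcast differ only in what a lane receives when its mask bit is clear. The sketch below models both side by side in portable scalar C; the function names and the array representation of the lanes are assumptions for illustration, not part of Intel's API.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of the writemask form: unselected lanes keep src. */
    void mask_broadcast_i32x2_model(const uint32_t src[4], uint8_t k,
                                    const uint32_t a[4], uint32_t dst[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = ((k >> j) & 1) ? a[j % 2] : src[j];
        }
    }

    /* Hypothetical model of the zeromask form: unselected lanes become 0. */
    void maskz_broadcast_i32x2_model(uint8_t k, const uint32_t a[4],
                                     uint32_t dst[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = ((k >> j) & 1) ? a[j % 2] : 0u;
        }
    }

Only mask bits 0 to 3 are consulted for a 128-bit vector of 32-bit lanes; the upper bits of ``k`` are ignored by the model, as in the pseudo-code.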
        	

_mm_fpclass_pd_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_fpclass_pd_mask(__m128d a, int imm8);

.. admonition:: Intel Description

    Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
    	[fpclass_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
        ENDFOR
        k[MAX:2] := 0
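
``CheckFPClass_FP64`` is not spelled out in the pseudo-code above. The following is a rough scalar sketch of one plausible reading, assuming the category bits are 0x01 QNaN, 0x02 positive zero, 0x04 negative zero, 0x08 positive infinity, 0x10 negative infinity, 0x20 denormal, 0x40 negative finite and 0x80 SNaN. Those values reflect my understanding of Intel's documentation for the underlying instruction and should be verified there; the enum and function names are hypothetical, and the effect of the denormals-are-zero control is not modeled.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Assumed category bits for imm8; names are hypothetical. */
    enum {
        FPCLASS_QNAN = 0x01, FPCLASS_POS_ZERO = 0x02, FPCLASS_NEG_ZERO = 0x04,
        FPCLASS_POS_INF = 0x08, FPCLASS_NEG_INF = 0x10, FPCLASS_DENORMAL = 0x20,
        FPCLASS_NEG_FINITE = 0x40, FPCLASS_SNAN = 0x80
    };

    /* Returns 1 if x falls in any category selected by imm8, else 0. */
    int check_fpclass_fp64_model(double x, unsigned imm8)
    {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);            /* well-defined type pun */
        int neg = (int)(bits >> 63);
        unsigned exp = (unsigned)((bits >> 52) & 0x7FF);
        uint64_t frac = bits & 0x000FFFFFFFFFFFFFULL;
        int quiet = (int)((frac >> 51) & 1);       /* top fraction bit */
        unsigned cls = 0;

        if (exp == 0x7FF && frac != 0) cls |= quiet ? FPCLASS_QNAN : FPCLASS_SNAN;
        if (exp == 0x7FF && frac == 0) cls |= neg ? FPCLASS_NEG_INF : FPCLASS_POS_INF;
        if (exp == 0 && frac == 0)     cls |= neg ? FPCLASS_NEG_ZERO : FPCLASS_POS_ZERO;
        if (exp == 0 && frac != 0)     cls |= FPCLASS_DENORMAL;
        if (neg && exp != 0x7FF && !(exp == 0 && frac == 0)) cls |= FPCLASS_NEG_FINITE;

        return (cls & imm8 & 0xFF) != 0;
    }

    /* Two-lane model of the loop above: one result bit per element. */
    uint8_t fpclass_pd_mask_model(const double a[2], unsigned imm8)
    {
        uint8_t k = 0;
        for (int j = 0; j < 2; j++) {
            k |= (uint8_t)(check_fpclass_fp64_model(a[j], imm8) << j);
        }
        return k;
    }

Categories combine with bitwise OR, so under these assumptions ``FPCLASS_POS_ZERO | FPCLASS_NEG_ZERO`` tests for a zero of either sign.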
        	

_mm_mask_fpclass_pd_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128d a, 
    int imm8
:Param ETypes:
    MASK k1, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_fpclass_pd_mask(__mmask8 k1, __m128d a,
                                      int imm8);

.. admonition:: Intel Description

    Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
    	[fpclass_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	IF k1[j]
        		k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
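
Because each result is a single bit, the ``IF k1[j] ... ELSE k[j] := 0`` branch above is equivalent to ANDing the unmasked classification with ``k1``. A small sketch of that equivalence, with a hypothetical helper name, assuming the unmasked two-lane result has already been computed:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: apply zeromask k1 to a 2-lane classification mask.
       The final & 0x3 mirrors k[MAX:2] := 0 in the pseudo-code. */
    uint8_t apply_zeromask2_model(uint8_t k1, uint8_t unmasked)
    {
        return (uint8_t)(k1 & unmasked & 0x3);
    }

A lane whose ``k1`` bit is clear therefore always reports 0, regardless of the category test.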
        	

_mm_fpclass_ps_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_fpclass_ps_mask(__m128 a, int imm8);

.. admonition:: Intel Description

    Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
    	[fpclass_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0])
        ENDFOR
        k[MAX:4] := 0
        	

_mm_mask_fpclass_ps_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128 a, 
    int imm8
:Param ETypes:
    MASK k1, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_fpclass_ps_mask(__mmask8 k1, __m128 a,
                                      int imm8);

.. admonition:: Intel Description

    Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
    	[fpclass_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	IF k1[j]
        		k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0])
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
        	

_mm_movepi32_mask
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __mmask8 _mm_movepi32_mask(__m128i a);

.. admonition:: Intel Description

    Set each bit of mask register "k" based on the most significant bit of the corresponding packed 32-bit integer in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF a[i+31]
        		k[j] := 1
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:4] := 0
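
The loop above collects the sign bit (bit 31) of each 32-bit lane into consecutive mask bits. A minimal scalar sketch, with a hypothetical function name and an array standing in for the vector:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: bit j of the result is bit 31 of a[j]. */
    uint8_t movepi32_mask_model(const uint32_t a[4])
    {
        uint8_t k = 0;
        for (int j = 0; j < 4; j++) {
            k |= (uint8_t)((a[j] >> 31) << j);   /* most significant bit of lane j */
        }
        return k;                                /* bits 4..7 stay 0 */
    }

Interpreted as signed integers, the selected lanes are exactly the negative ones.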
        	

_mm_movm_epi32
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m128i _mm_movm_epi32(__mmask8 k);

.. admonition:: Intel Description

    Set each packed 32-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := 0xFFFFFFFF
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
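
Expanding a mask bit into an all-ones or all-zeros lane can be written without a branch, because ``0 - 1`` wraps to ``0xFFFFFFFF`` in unsigned 32-bit arithmetic. A scalar sketch of the pseudo-code above using that idiom (the function name is an assumption for illustration):

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: lane j is all ones if bit j of k is set. */
    void movm_epi32_model(uint8_t k, uint32_t dst[4])
    {
        for (int j = 0; j < 4; j++) {
            uint32_t bit = (uint32_t)((k >> j) & 1u);
            dst[j] = 0u - bit;   /* 0 -> 0x00000000, 1 -> 0xFFFFFFFF */
        }
    }

Only the low four bits of ``k`` take part, one per 32-bit lane.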
        	

_mm_movm_epi64
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k
:Param ETypes:
    MASK k

.. code-block:: C

    __m128i _mm_movm_epi64(__mmask8 k);

.. admonition:: Intel Description

    Set each packed 64-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := 0xFFFFFFFFFFFFFFFF
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_movepi64_mask
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __mmask8 _mm_movepi64_mask(__m128i a);

.. admonition:: Intel Description

    Set each bit of mask register "k" based on the most significant bit of the corresponding packed 64-bit integer in "a".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF a[i+63]
        		k[j] := 1
        	ELSE
        		k[j] := 0
        	FI
        ENDFOR
        k[MAX:2] := 0
        	

_mm_mask_range_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_mask_range_pd(__m128d src, __mmask8 k,
                              __m128d a, __m128d b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    	imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
    	imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_range_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_maskz_range_pd(__mmask8 k, __m128d a, __m128d b,
                               int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    	imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
    	imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_range_pd
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_range_pd(__m128d a, __m128d b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
    	imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
    	imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
        ENDFOR
        dst[MAX:128] := 0
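
The ``RANGE`` helper above combines two independent controls: ``imm8[1:0]`` picks which operand survives, and ``imm8[3:2]`` decides the sign of the result. A portable scalar sketch of that helper for one 64-bit element follows. It transcribes the pseudo-code as written and therefore does not model NaN or signed-zero special cases, which the pseudo-code does not describe; all function names are hypothetical.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    static uint64_t fp64_to_bits(double x)
    {
        uint64_t b;
        memcpy(&b, &x, sizeof b);
        return b;
    }

    static double fp64_from_bits(uint64_t b)
    {
        double x;
        memcpy(&x, &b, sizeof x);
        return x;
    }

    /* Hypothetical scalar model of RANGE for one double-precision element. */
    double range_fp64_model(double s1, double s2, unsigned imm8)
    {
        const uint64_t SIGN = 1ULL << 63;
        unsigned op = imm8 & 3u;                 /* imm8[1:0]: operation */
        unsigned sign_ctl = (imm8 >> 2) & 3u;    /* imm8[3:2]: sign control */
        double abs1 = fp64_from_bits(fp64_to_bits(s1) & ~SIGN);
        double abs2 = fp64_from_bits(fp64_to_bits(s2) & ~SIGN);
        double tmp;

        switch (op) {
        case 0u: tmp = (s1 <= s2) ? s1 : s2; break;       /* min */
        case 1u: tmp = (s1 <= s2) ? s2 : s1; break;       /* max */
        case 2u: tmp = (abs1 <= abs2) ? s1 : s2; break;   /* absolute min */
        default: tmp = (abs1 <= abs2) ? s2 : s1; break;   /* absolute max */
        }

        uint64_t mag = fp64_to_bits(tmp) & ~SIGN;         /* tmp[62:0] */
        switch (sign_ctl) {
        case 0u: return fp64_from_bits((fp64_to_bits(s1) & SIGN) | mag); /* sign of a */
        case 1u: return tmp;                              /* sign of compare result */
        case 2u: return fp64_from_bits(mag);              /* clear sign bit */
        default: return fp64_from_bits(SIGN | mag);       /* set sign bit */
        }
    }

Note that sign control ``00`` takes the sign from the first operand even when the second operand is selected, so a plain signed min or max needs sign control ``01`` (for example ``imm8 = 0x4`` for min, ``0x5`` for max).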
        	

_mm_mask_range_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_mask_range_ps(__m128 src, __mmask8 k, __m128 a,
                             __m128 b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
    	imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
    	imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_range_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_maskz_range_ps(__mmask8 k, __m128 a, __m128 b,
                              int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
    	imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
    	imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_range_ps
^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_range_ps(__m128 a, __m128 b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
    	imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
    	imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_reduce_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm_mask_reduce_pd(__m128d src, __mmask8 k,
                               __m128d a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_reduce_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm_maskz_reduce_pd(__mmask8 k, __m128d a,
                                int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_reduce_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm_reduce_pd(__m128d a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:128] := 0
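
``ReduceArgumentPD`` above subtracts from the input a copy of itself rounded to ``imm8[7:4]`` fraction bits, leaving the part that the rounding discarded. The sketch below is a scalar model of that arithmetic under several assumptions: ``imm8[1:0]`` is taken to select the rounding mode (0 = nearest even, 1 = down, 2 = up, 3 = truncate), ``imm8[3:2]`` is ignored, and the scaled input must have magnitude below 2^52 so that the integer cast is defined. The function names are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <float.h>
    #include <stdint.h>

    /* Hypothetical helper: round x to an integer per imm8[1:0]. */
    static double round_to_int_model(double x, unsigned imm8)
    {
        double t = (double)(int64_t)x;            /* truncate toward zero */
        double f = (t > x) ? t - 1.0 : t;         /* floor(x) */
        switch (imm8 & 3u) {
        case 1u: return f;                        /* round down */
        case 2u: return (t < x) ? t + 1.0 : t;    /* round up */
        case 3u: return t;                        /* truncate */
        default:
            if (x - f > 0.5) return f + 1.0;
            if (x - f < 0.5) return f;
            return ((int64_t)f % 2 == 0) ? f : f + 1.0;   /* tie goes to even */
        }
    }

    /* Hypothetical scalar model of ReduceArgumentPD for one element. */
    double reduce_fp64_model(double x, unsigned imm8)
    {
        unsigned m = (imm8 >> 4) & 0xFu;          /* imm8[7:4]: fraction bits kept */
        double scale = (double)(1u << m);         /* 2^m, exactly representable */
        double scaled = x * scale;
        /* Precondition of this sketch only, so the int64_t cast is defined. */
        assert(scaled > -4503599627370496.0 && scaled < 4503599627370496.0);
        double tmp = x - round_to_int_model(scaled, imm8) / scale;
        if (tmp > DBL_MAX || tmp < -DBL_MAX) {    /* IsInf(tmp) in the pseudo-code */
            tmp = 0.0;
        }
        return tmp;
    }

For example, reducing ``1.75`` with ``imm8 = 0x11`` (one fraction bit, round down) rounds the input to ``1.5`` and leaves a remainder of ``0.25``.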
        	

_mm_mask_reduce_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm_mask_reduce_ps(__m128 src, __mmask8 k, __m128 a,
                              int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_reduce_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm_maskz_reduce_ps(__mmask8 k, __m128 a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_reduce_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm_reduce_ps(__m128 a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fpclass_sd_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_fpclass_sd_mask(__m128d a, int imm8);

.. admonition:: Intel Description

    Test the lower double-precision (64-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k".
    	[fpclass_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        k[0] := CheckFPClass_FP64(a[63:0], imm8[7:0])
        k[MAX:1] := 0
        	

_mm_mask_fpclass_sd_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128d a, 
    int imm8
:Param ETypes:
    MASK k1, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_fpclass_sd_mask(__mmask8 k1, __m128d a,
                                      int imm8);

.. admonition:: Intel Description

    Test the lower double-precision (64-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set).
    	[fpclass_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF k1[0]
        	k[0] := CheckFPClass_FP64(a[63:0], imm8[7:0])
        ELSE
        	k[0] := 0
        FI
        k[MAX:1] := 0
        	

_mm_fpclass_ss_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_fpclass_ss_mask(__m128 a, int imm8);

.. admonition:: Intel Description

    Test the lower single-precision (32-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k".
    	[fpclass_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        k[0] := CheckFPClass_FP32(a[31:0], imm8[7:0])
        k[MAX:1] := 0
        	

_mm_mask_fpclass_ss_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128 a, 
    int imm8
:Param ETypes:
    MASK k1, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_fpclass_ss_mask(__mmask8 k1, __m128 a,
                                      int imm8);

.. admonition:: Intel Description

    Test the lower single-precision (32-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set).
    	[fpclass_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        IF k1[0]
        	k[0] := CheckFPClass_FP32(a[31:0], imm8[7:0])
        ELSE
        	k[0] := 0
        FI
        k[MAX:1] := 0
        	

_mm_mask_range_round_sd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128d _mm_mask_range_round_sd(__m128d src, __mmask8 k,
                                    __m128d a, __m128d b,
                                    int imm8, int sae);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
    	imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
    	imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        IF k[0]
        	dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

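The ``RANGE`` helper in the pseudo-code above translates almost line for line into scalar C. The sketch below is illustrative only: ``range_f64_model`` is a hypothetical name, and the NaN, signed-zero, and exception behaviour of the actual instruction is not modelled.

.. code-block:: C

    #include <math.h>

    /* Scalar model of RANGE for one FP64 element.
       imm8[1:0]: 0 = min, 1 = max, 2 = absolute min, 3 = absolute max
       imm8[3:2]: 0 = sign of a, 1 = sign of the selected value,
                  2 = clear sign bit, 3 = set sign bit */
    static double range_f64_model(double a, double b, unsigned imm8)
    {
        unsigned op   = imm8 & 3u;
        unsigned sign = (imm8 >> 2) & 3u;
        double tmp;

        switch (op) {
        case 0:  tmp = (a <= b) ? a : b; break;
        case 1:  tmp = (a <= b) ? b : a; break;
        case 2:  tmp = (fabs(a) <= fabs(b)) ? a : b; break;
        default: tmp = (fabs(a) <= fabs(b)) ? b : a; break;
        }

        switch (sign) {
        case 0:  return copysign(tmp, a);   /* magnitude of tmp, sign of a */
        case 1:  return tmp;                /* keep the selected value as is */
        case 2:  return fabs(tmp);          /* sign bit cleared */
        default: return -fabs(tmp);         /* sign bit set */
        }
    }

With ``a = 1.0`` and ``b = -3.0``, an ``imm8`` of ``0x4`` (min, sign from compare result) selects ``-3.0``, whereas ``0x0`` (min, sign from "a") selects the same magnitude but returns it with the positive sign of "a".
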
_mm_mask_range_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_mask_range_sd(__m128d src, __mmask8 k,
                              __m128d a, __m128d b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        IF k[0]
        	dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

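Because "imm8" packs two independent two-bit fields, it can be convenient to build it from named parts. The following is only a sketch: the enum constants and the ``RANGE_IMM8`` macro are hypothetical names, not part of ``immintrin.h``.

.. code-block:: C

    /* Hypothetical names for the operation control, imm8[1:0]. */
    enum range_op {
        RANGE_MIN = 0, RANGE_MAX = 1, RANGE_ABS_MIN = 2, RANGE_ABS_MAX = 3
    };

    /* Hypothetical names for the sign control, imm8[3:2]. */
    enum range_sign {
        SIGN_FROM_A = 0, SIGN_FROM_RESULT = 1, SIGN_CLEAR = 2, SIGN_SET = 3
    };

    /* Pack the operation into bits 1:0 and the sign control into bits 3:2.
       A macro keeps the value a compile-time constant. */
    #define RANGE_IMM8(op, sign) ((((sign) & 3) << 2) | ((op) & 3))

A macro is used rather than a function because the intrinsic expects "imm8" to be an immediate; ``RANGE_IMM8(RANGE_ABS_MAX, SIGN_CLEAR)`` evaluates to ``0xB``.
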
_mm_maskz_range_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128d _mm_maskz_range_round_sd(__mmask8 k, __m128d a,
                                     __m128d b, int imm8,
                                     int sae);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        IF k[0]
        	dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_range_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_maskz_range_sd(__mmask8 k, __m128d a, __m128d b,
                               int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        IF k[0]
        	dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

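The only difference between the writemask (``_mm_mask_*``) and zeromask (``_mm_maskz_*``) forms is what lands in the lower element when mask bit 0 is clear. A small model makes this explicit; it is a sketch under stated assumptions, where ``vec2d`` is a hypothetical two-lane stand-in for ``__m128d`` and ``result`` stands for the value the operation would have produced.

.. code-block:: C

    /* Hypothetical two-lane stand-in for __m128d; lane[0] is the lower element. */
    typedef struct { double lane[2]; } vec2d;

    /* Writemask form: the lower lane falls back to src when k bit 0 is clear. */
    static vec2d merge_mask_sd(vec2d src, unsigned k, vec2d a, double result)
    {
        vec2d dst;
        dst.lane[0] = (k & 1u) ? result : src.lane[0];
        dst.lane[1] = a.lane[1];    /* upper element always comes from a */
        return dst;
    }

    /* Zeromask form: the lower lane falls back to zero when k bit 0 is clear. */
    static vec2d merge_maskz_sd(unsigned k, vec2d a, double result)
    {
        vec2d dst;
        dst.lane[0] = (k & 1u) ? result : 0.0;
        dst.lane[1] = a.lane[1];    /* upper element always comes from a */
        return dst;
    }

In both forms only bit 0 of the mask is consulted, and the upper element is taken from "a" whether or not the mask bit is set.
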
_mm_range_round_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128d _mm_range_round_sd(__m128d a, __m128d b, int imm8,
                               int sae);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
        	1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
        	2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
        	3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
        	1: dst[63:0] := tmp[63:0]
        	2: dst[63:0] := (0 << 63) OR (tmp[62:0])
        	3: dst[63:0] := (1 << 63) OR (tmp[62:0])
        	ESAC
        	
        	RETURN dst
        }
        dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_range_round_ss
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128 _mm_mask_range_round_ss(__m128 src, __mmask8 k,
                                   __m128 a, __m128 b, int imm8,
                                   int sae);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        IF k[0]
        	dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_range_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_mask_range_ss(__m128 src, __mmask8 k, __m128 a,
                             __m128 b, int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        IF k[0]
        	dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_range_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128 _mm_maskz_range_round_ss(__mmask8 k, __m128 a,
                                    __m128 b, int imm8,
                                    int sae);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        IF k[0]
        	dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_range_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_maskz_range_ss(__mmask8 k, __m128 a, __m128 b,
                              int imm8);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        IF k[0]
        	dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_range_round_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128 _mm_range_round_ss(__m128 a, __m128 b, int imm8,
                              int sae);

.. admonition:: Intel Description

    Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

    imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.

    imm8[3:2] specifies the sign control: 00 = sign from "a", 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
        	CASE opCtl[1:0] OF
        	0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
        	1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
        	2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
        	3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
        	ESAC
        	
        	CASE signSelCtl[1:0] OF
        	0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
        	1: dst[31:0] := tmp[31:0]
        	2: dst[31:0] := (0 << 31) OR (tmp[30:0])
        	3: dst[31:0] := (1 << 31) OR (tmp[30:0])
        	ESAC
        	
        	RETURN dst
        }
        dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

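One combination of the control fields is worth spelling out: operation ``10`` (absolute min) with sign control ``00`` (sign from "a"), that is ``imm8 = 0x2``. Following the pseudo-code above, the lower element then carries the smaller of the two magnitudes with the sign of "a", which amounts to clamping "a" to the magnitude of "b". The scalar sketch below illustrates that reading for ordinary finite inputs; ``clamp_magnitude_model`` is a hypothetical name and special values are not modelled.

.. code-block:: C

    #include <math.h>

    /* Scalar model of RANGE with imm8 = 0x2 (absolute min, sign from a):
       the result has magnitude min(|a|, |limit|) and the sign of a. */
    static float clamp_magnitude_model(float a, float limit)
    {
        float mag = (fabsf(a) <= fabsf(limit)) ? fabsf(a) : fabsf(limit);
        return copysignf(mag, a);
    }

The sign of the second operand plays no part in the result under this setting, so a limit of ``2.0f`` and a limit of ``-2.0f`` behave identically.
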
_mm_mask_reduce_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_mask_reduce_sd(__m128d src, __mmask8 k,
                               __m128d a, __m128d b, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

    imm8[7:4] specifies the number of fraction bits to preserve, and imm8[3:0] specifies the rounding control.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        IF k[0]
        	dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

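The ``ReduceArgumentPD`` helper above computes the difference between a value and that value rounded to ``m`` fraction bits. A scalar C sketch follows, under these assumptions: the helper and enum names are hypothetical, the four rounding modes are assumed to follow the numbering of the ``_MM_FROUND_TO_*`` constants (0 = nearest, 1 = down, 2 = up, 3 = truncate), the nearest case relies on the default floating-point environment, and NaN or infinite inputs are not treated the way the hardware treats them.

.. code-block:: C

    #include <math.h>

    /* Hypothetical rounding selector mirroring imm8[1:0]. */
    enum reduce_round {
        ROUND_NEAREST = 0, ROUND_DOWN = 1, ROUND_UP = 2, ROUND_TRUNC = 3
    };

    /* Round x to an integer using the selected mode. */
    static double round_with_mode(double x, enum reduce_round mode)
    {
        switch (mode) {
        case ROUND_DOWN:  return floor(x);
        case ROUND_UP:    return ceil(x);
        case ROUND_TRUNC: return trunc(x);
        default:          return nearbyint(x);  /* assumes FE_TONEAREST */
        }
    }

    /* x minus x rounded to m fraction bits, where m plays the role of
       imm8[7:4] and is expected to lie in 0..15. */
    static double reduce_f64_model(double x, unsigned m, enum reduce_round mode)
    {
        /* 2^-m * ROUND(2^m * x); ldexp scales by an exact power of two */
        double rounded = ldexp(round_with_mode(ldexp(x, (int)m), mode), -(int)m);
        double r = x - rounded;
        return isinf(r) ? 0.0 : r;
    }

With ``m = 0`` and ``ROUND_DOWN`` the model returns the fractional part of a non-negative input, for example ``0.75`` for ``3.75``; with ``m = 1`` and ``ROUND_NEAREST`` the same input is first rounded to the nearest multiple of one half, ``4.0``, leaving ``-0.25``.
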
_mm_mask_reduce_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128d _mm_mask_reduce_round_sd(__m128d src, __mmask8 k,
                                     __m128d a, __m128d b,
                                     int imm8, int sae);

.. admonition:: Intel Description

    Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

    imm8[7:4] specifies the number of fraction bits to preserve, and imm8[3:0] specifies the rounding control.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        IF k[0]
        	dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_reduce_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_maskz_reduce_sd(__mmask8 k, __m128d a,
                                __m128d b, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

    imm8[7:4] specifies the number of fraction bits to preserve, and imm8[3:0] specifies the rounding control.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        IF k[0]
        	dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_reduce_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128d _mm_maskz_reduce_round_sd(__mmask8 k, __m128d a,
                                      __m128d b, int imm8,
                                      int sae);

.. admonition:: Intel Description

    Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

    imm8[7:4] specifies the number of fraction bits to preserve, and imm8[3:0] specifies the rounding control.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        IF k[0]
        	dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_reduce_sd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_reduce_sd(__m128d a, __m128d b, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

    imm8[7:4] specifies the number of fraction bits to preserve, and imm8[3:0] specifies the rounding control.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_reduce_round_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128d _mm_reduce_round_sd(__m128d a, __m128d b, int imm8,
                                int sae);

.. admonition:: Intel Description

    Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

    imm8[7:4] specifies the number of fraction bits to preserve, and imm8[3:0] specifies the rounding control.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	tmp[63:0] := src1[63:0] - tmp[63:0]
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := FP64(0.0)
        	FI
        	RETURN tmp[63:0]
        }
        dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_reduce_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_mask_reduce_ss(__m128 src, __mmask8 k, __m128 a,
                              __m128 b, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

    imm8[7:4] specifies the number of fraction bits to preserve, and imm8[3:0] specifies the rounding control.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        IF k[0]
        	dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

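As the pseudo-code shows, the reduce operations split "imm8" into two four-bit fields: bits 7:4 hold the number of fraction bits to preserve and bits 3:0 are passed on as the rounding control. The helpers below sketch that packing and unpacking; all three names are hypothetical and are not part of ``immintrin.h``.

.. code-block:: C

    /* Pack the fraction-bit count m (0..15) into imm8[7:4] and the
       rounding control rc into imm8[3:0]. A macro keeps the value a
       compile-time constant. */
    #define REDUCE_IMM8(m, rc) ((((m) & 0xF) << 4) | ((rc) & 0xF))

    /* Recover the number of preserved fraction bits, imm8[7:4]. */
    static unsigned reduce_fraction_bits(unsigned imm8)
    {
        return (imm8 >> 4) & 0xFu;
    }

    /* Recover the rounding control, imm8[3:0]. */
    static unsigned reduce_round_control(unsigned imm8)
    {
        return imm8 & 0xFu;
    }

For instance, ``REDUCE_IMM8(4, 1)`` yields ``0x41``: four fraction bits are preserved and the rounding control field holds ``1``.
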
_mm_mask_reduce_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128 _mm_mask_reduce_round_ss(__m128 src, __mmask8 k,
                                    __m128 a, __m128 b,
                                    int imm8, int sae);

.. admonition:: Intel Description

    Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

    imm8[7:4] specifies the number of fraction bits to preserve, and imm8[3:0] specifies the rounding control.

    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        IF k[0]
        	dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_reduce_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_maskz_reduce_ss(__mmask8 k, __m128 a, __m128 b,
                               int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        IF k[0]
        	dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_reduce_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128 _mm_maskz_reduce_round_ss(__mmask8 k, __m128 a,
                                     __m128 b, int imm8,
                                     int sae);

.. admonition:: Intel Description

    Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        IF k[0]
        	dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_reduce_ss
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_reduce_ss(__m128 a, __m128 b, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
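
As a rough illustration of ``ReduceArgumentPS``, the scalar C sketch below models only the low element. The names ``round_rc`` and ``reduce_ss_model`` are hypothetical and are not part of ``immintrin.h``. The sketch assumes finite inputs whose scaled value fits in a 32-bit integer, reads only ``imm8[1:0]`` as the rounding control, and does not model the ``IsInf`` branch, ``imm8[3:2]``, or exception flags.

```c
#include <assert.h>
#include <stdint.h>

/* Round v to an integer according to the 2-bit rounding control rc
 * (0 = nearest even, 1 = down, 2 = up, 3 = truncate).
 * Hypothetical helper; only valid when |v| < 2^31. */
static float round_rc(float v, unsigned rc)
{
    int32_t n = (int32_t)v;   /* C conversion truncates toward zero */
    float t = (float)n;
    float d = v - t;          /* fractional part, in (-1, 1) */

    switch (rc & 3u) {
    case 1u: return (d < 0.0f) ? t - 1.0f : t;
    case 2u: return (d > 0.0f) ? t + 1.0f : t;
    case 3u: return t;
    default:
        if (d > 0.5f || (d == 0.5f && (n & 1))) return t + 1.0f;
        if (d < -0.5f || (d == -0.5f && (n & 1))) return t - 1.0f;
        return t;
    }
}

/* Scalar model of ReduceArgumentPS for one element. */
static float reduce_ss_model(float x, unsigned imm8)
{
    unsigned m = (imm8 >> 4) & 0xFu;   /* fraction bits to preserve */
    float scale = (float)(1u << m);    /* 2^m, exactly representable */
    float scaled = x * scale;

    /* Precondition of this sketch, not of the instruction. */
    assert(scaled > -2147483648.0f && scaled < 2147483648.0f);

    /* x minus x rounded to m fraction bits */
    return x - round_rc(scaled, imm8) / scale;
}
```

With ``imm8 = 0x11`` (one fraction bit, round down), ``reduce_ss_model(1.75f, 0x11)`` rounds 1.75 down to 1.5 and returns the remainder 0.25.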
        	

_mm_reduce_round_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128 _mm_reduce_round_ss(__m128 a, __m128 b, int imm8,
                               int sae);

.. admonition:: Intel Description

    Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	tmp[31:0] := src1[31:0] - tmp[31:0]
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := FP32(0.0)
        	FI
        	RETURN tmp[31:0]
        }
        dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_alignr_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_alignr_epi32(__m128i a, __m128i b,
                             const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 16 bytes (4 elements) in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[255:128] := a[127:0]
        temp[127:0] := b[127:0]
        temp[255:0] := temp[255:0] >> (32*imm8[1:0])
        dst[127:0] := temp[127:0]
        dst[MAX:128] := 0
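
The concatenate-and-shift step can be sketched in scalar C as follows. This is a model of the pseudo-code on plain arrays, with lane 0 as the lowest 32 bits; ``alignr_epi32_model`` is a hypothetical name, not an intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: dst = low four lanes of ((a:b) >> imm8 lanes). */
static void alignr_epi32_model(uint32_t dst[4], const uint32_t a[4],
                               const uint32_t b[4], int imm8)
{
    uint32_t temp[8];
    unsigned shift = (unsigned)imm8 & 3u;   /* only imm8[1:0] is used */

    assert(dst && a && b);

    for (int j = 0; j < 4; ++j) {
        temp[j] = b[j];        /* temp[127:0]   := b */
        temp[j + 4] = a[j];    /* temp[255:128] := a */
    }
    for (int j = 0; j < 4; ++j)
        dst[j] = temp[j + shift];
}
```

With ``a = {10, 11, 12, 13}``, ``b = {0, 1, 2, 3}`` and ``imm8 = 1``, the model yields ``{1, 2, 3, 10}``. Because only the two low bits of "imm8" are read, ``imm8 = 5`` gives the same result.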
        	

_mm_mask_alignr_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_alignr_epi32(__m128i src, __mmask8 k,
                                  __m128i a, __m128i b,
                                  const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 16 bytes (4 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[255:128] := a[127:0]
        temp[127:0] := b[127:0]
        temp[255:0] := temp[255:0] >> (32*imm8[1:0])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := temp[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_alignr_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_alignr_epi32(__mmask8 k, __m128i a,
                                   __m128i b, const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 16 bytes (4 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[255:128] := a[127:0]
        temp[127:0] := b[127:0]
        temp[255:0] := temp[255:0] >> (32*imm8[1:0])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := temp[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
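
The writemask and zeromask forms differ only in what an inactive lane receives. A scalar sketch that covers both is given below. ``mask_alignr_epi32_model`` is a hypothetical helper, and passing ``NULL`` for ``src`` to select zeromask behaviour is a convention of this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the masked alignr forms.
 * src != NULL models the writemask form, src == NULL the zeromask form. */
static void mask_alignr_epi32_model(uint32_t dst[4], const uint32_t *src,
                                    uint8_t k, const uint32_t a[4],
                                    const uint32_t b[4], int imm8)
{
    uint32_t temp[8];
    uint32_t keep[4];
    unsigned shift = (unsigned)imm8 & 3u;   /* only imm8[1:0] is used */

    assert(dst && a && b);

    for (int j = 0; j < 4; ++j) {
        temp[j] = b[j];
        temp[j + 4] = a[j];
        keep[j] = (src != NULL) ? src[j] : 0u;   /* value for inactive lanes */
    }
    for (int j = 0; j < 4; ++j)
        dst[j] = ((k >> j) & 1u) ? temp[j + shift] : keep[j];
}
```

With ``a = {10, 11, 12, 13}``, ``b = {0, 1, 2, 3}``, ``imm8 = 2`` and ``k = 0x5``, the shifted lanes are ``{2, 3, 10, 11}``. Only lanes 0 and 2 are taken from them, and lanes 1 and 3 come from ``src`` or are zeroed.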
        	

_mm_alignr_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_alignr_epi64(__m128i a, __m128i b,
                             const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 16 bytes (2 elements) in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[255:128] := a[127:0]
        temp[127:0] := b[127:0]
        temp[255:0] := temp[255:0] >> (64*imm8[0])
        dst[127:0] := temp[127:0]
        dst[MAX:128] := 0
        	

_mm_mask_alignr_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_alignr_epi64(__m128i src, __mmask8 k,
                                  __m128i a, __m128i b,
                                  const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 16 bytes (2 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[255:128] := a[127:0]
        temp[127:0] := b[127:0]
        temp[255:0] := temp[255:0] >> (64*imm8[0])
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := temp[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_alignr_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_alignr_epi64(__mmask8 k, __m128i a,
                                   __m128i b, const int imm8);

.. admonition:: Intel Description

    Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 16 bytes (2 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        temp[255:128] := a[127:0]
        temp[127:0] := b[127:0]
        temp[255:0] := temp[255:0] >> (64*imm8[0])
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := temp[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_blend_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_blend_pd(__mmask8 k, __m128d a, __m128d b);

.. admonition:: Intel Description

    Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := b[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_blend_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_blend_ps(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
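
A scalar sketch of the per-lane selection, assuming lane 0 is the lowest element. ``mask_blend_ps_model`` is a hypothetical name used for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: lane j comes from b when bit j of k is set, else from a. */
static void mask_blend_ps_model(float dst[4], uint8_t k,
                                const float a[4], const float b[4])
{
    assert(dst && a && b);

    for (int j = 0; j < 4; ++j)
        dst[j] = ((k >> j) & 1u) ? b[j] : a[j];
}
```

With ``k = 0x6``, lanes 1 and 2 are taken from "b" and lanes 0 and 3 from "a". Mask bits above bit 3 are ignored.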
        	

_mm_mask_broadcastss_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_mask_broadcastss_ps(__m128 src, __mmask8 k,
                                   __m128 a);

.. admonition:: Intel Description

    Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_broadcastss_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_maskz_broadcastss_ps(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
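
Both masked broadcast forms can be sketched with one scalar helper. ``broadcastss_ps_model`` is a hypothetical name, and passing ``NULL`` for ``src`` to select the zeromask form is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model: active lanes receive a[0]; inactive lanes receive
 * src[j] (writemask form) or 0 (zeromask form, src == NULL). */
static void broadcastss_ps_model(float dst[4], const float *src,
                                 uint8_t k, const float a[4])
{
    assert(dst && a);

    float low = a[0];   /* read first so that dst may alias a */
    for (int j = 0; j < 4; ++j) {
        if ((k >> j) & 1u)
            dst[j] = low;
        else
            dst[j] = (src != NULL) ? src[j] : 0.0f;
    }
}
```

Only the low element of "a" is read. With ``a = {5, 6, 7, 8}`` and ``k = 0x3``, the zeromask form yields ``{5, 5, 0, 0}``.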
        	

_mm_mask_compress_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_mask_compress_pd(__m128d src, __mmask8 k,
                                 __m128d a);

.. admonition:: Intel Description

    Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := 0
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        dst[127:m] := src[127:m]
        dst[MAX:128] := 0
        	

_mm_maskz_compress_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_maskz_compress_pd(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := 0
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        dst[127:m] := 0
        dst[MAX:128] := 0
        	

_mm_mask_compress_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_mask_compress_ps(__m128 src, __mmask8 k,
                                __m128 a);

.. admonition:: Intel Description

    Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := 0
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        dst[127:m] := src[127:m]
        dst[MAX:128] := 0
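
The compress loop packs active lanes toward lane 0 and fills the tail from "src". A scalar sketch under that reading is given below. ``mask_compress_ps_model`` is a hypothetical name, and its return value (the number of packed lanes) is an addition of this sketch, not something the intrinsic returns.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: pack the active lanes of a into the low lanes of dst,
 * then pass the remaining lanes through from src.
 * Returns how many lanes were packed. */
static int mask_compress_ps_model(float dst[4], const float src[4],
                                  uint8_t k, const float a[4])
{
    int m = 0;   /* next free lane in dst */

    assert(dst && src && a);

    for (int j = 0; j < 4; ++j)
        if ((k >> j) & 1u)
            dst[m++] = a[j];

    int packed = m;
    for (; m < 4; ++m)
        dst[m] = src[m];   /* dst[127:m] := src[127:m] */
    return packed;
}
```

With ``a = {1, 2, 3, 4}``, ``src = {-1, -2, -3, -4}`` and ``k = 0xA``, lanes 1 and 3 of "a" are packed into lanes 0 and 1, giving ``{2, 4, -3, -4}``.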
        	

_mm_maskz_compress_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_maskz_compress_ps(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := 0
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        dst[127:m] := 0
        dst[MAX:128] := 0
        	

_mm_mask_expand_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_mask_expand_pd(__m128d src, __mmask8 k,
                               __m128d a);

.. admonition:: Intel Description

    Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[m+63:m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_expand_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_maskz_expand_pd(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[m+63:m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_expand_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_mask_expand_ps(__m128 src, __mmask8 k, __m128 a);

.. admonition:: Intel Description

    Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[m+31:m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
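
Expand is the inverse of compress: consecutive low lanes of "a" are scattered to the active positions. A scalar sketch follows. ``mask_expand_ps_model`` is a hypothetical name, and the no-aliasing requirement is a limitation of this sketch rather than of the intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: active lane j receives the next unread low lane of a;
 * inactive lanes are copied from src. */
static void mask_expand_ps_model(float dst[4], const float src[4],
                                 uint8_t k, const float a[4])
{
    int m = 0;   /* next lane of a to consume */

    assert(dst != a);   /* in-place use would overwrite unread lanes */

    for (int j = 0; j < 4; ++j)
        dst[j] = ((k >> j) & 1u) ? a[m++] : src[j];
}
```

With ``a = {1, 2, 3, 4}``, ``src = {-1, -2, -3, -4}`` and ``k = 0xA``, lanes 0 and 1 of "a" land in lanes 1 and 3, giving ``{-1, 1, -3, 2}``.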
        	

_mm_maskz_expand_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_maskz_expand_ps(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[m+31:m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fixupimm_pd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128i c, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m128d _mm_fixupimm_pd(__m128d a, __m128d b, __m128i c,
                            int imm8);

.. admonition:: Intel Description

    Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst". "imm8" selects which exception flags (#ZE, #IE) are reported.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means applying your desired method of error detection and correction or flagging. For example, making a number NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:128] := 0
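
The token lookup in ``FIXUPIMMPD`` can be sketched per lane in scalar C. The sketch classifies "b", reads the matching 4-bit response from "c", and returns the selected value. All names below are hypothetical. DAZ handling and the "imm8" exception flags are not modeled, and ``ONE_VALUE_TOKEN`` is assumed to mean exactly +1.0.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

enum token { QNAN_TOKEN, SNAN_TOKEN, ZERO_VALUE_TOKEN, ONE_VALUE_TOKEN,
             NEG_INF_TOKEN, POS_INF_TOKEN, NEG_VALUE_TOKEN, POS_VALUE_TOKEN };

static uint64_t bits_of(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static double from_bits(uint64_t u)
{
    double x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Classify a double into one of the eight tokens. */
static enum token classify(double x)
{
    uint64_t u = bits_of(x);
    uint64_t mag = u & 0x7FFFFFFFFFFFFFFFull;
    int neg = (int)(u >> 63);

    if (mag > 0x7FF0000000000000ull)   /* NaN: bit 51 is the quiet bit */
        return (u & 0x0008000000000000ull) ? QNAN_TOKEN : SNAN_TOKEN;
    if (mag == 0)
        return ZERO_VALUE_TOKEN;
    if (u == 0x3FF0000000000000ull)    /* assumption: +1.0 only */
        return ONE_VALUE_TOKEN;
    if (mag == 0x7FF0000000000000ull)
        return neg ? NEG_INF_TOKEN : POS_INF_TOKEN;
    return neg ? NEG_VALUE_TOKEN : POS_VALUE_TOKEN;
}

/* Scalar model of one lane: src1 = a, src2 = b, table = c. */
static double fixupimm_pd_model(double src1, double src2, uint64_t table)
{
    unsigned j = (unsigned)classify(src2);
    unsigned response = (unsigned)(table >> (4u * j)) & 0xFu;
    uint64_t sign = bits_of(src2) & 0x8000000000000000ull;

    switch (response) {
    case 0:  return src1;
    case 1:  return src2;
    case 2:  /* this sketch only quiets NaN inputs */
        assert(j <= (unsigned)SNAN_TOKEN);
        return from_bits(bits_of(src2) | 0x0008000000000000ull);
    case 3:  return from_bits(0xFFF8000000000000ull);   /* QNaN indefinite */
    case 4:  return from_bits(0xFFF0000000000000ull);   /* -INF */
    case 5:  return from_bits(0x7FF0000000000000ull);   /* +INF */
    case 6:  return from_bits(sign | 0x7FF0000000000000ull);
    case 7:  return -0.0;
    case 8:  return 0.0;
    case 9:  return -1.0;
    case 10: return 1.0;
    case 11: return 0.5;
    case 12: return 90.0;
    case 13: return 1.5707963267948966;   /* PI/2 */
    case 14: return DBL_MAX;
    default: return -DBL_MAX;             /* response 15 */
    }
}
```

For example, a table of ``(10ull << 8) | (15ull << 16)`` maps zero inputs to +1 and negative infinity to ``-DBL_MAX``. Every other class uses response 0, which keeps the value from "a".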
        	

_mm_mask_fixupimm_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128i c, 
    int imm8
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m128d _mm_mask_fixupimm_pd(__m128d a, __mmask8 k,
                                 __m128d b, __m128i c,
                                 int imm8);

.. admonition:: Intel Description

    Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" selects which exception flags (#ZE, #IE) are reported.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means applying your desired method of error detection and correction or flagging. For example, making a number NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_fixupimm_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128i c, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m128d _mm_maskz_fixupimm_pd(__mmask8 k, __m128d a,
                                  __m128d b, __m128i c,
                                  int imm8);

.. admonition:: Intel Description

    Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" selects which exception flags (#ZE, #IE) are reported.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means applying your desired method of error detection and correction or flagging. For example, making a number NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A stackoverflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0,
        	SNAN_TOKEN := 1,
        	ZERO_VALUE_TOKEN := 2,
        	ONE_VALUE_TOKEN := 3,
        	NEG_INF_TOKEN := 4,
        	POS_INF_TOKEN := 5,
        	NEG_VALUE_TOKEN := 6,
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN: j := 0
        	SNAN_TOKEN: j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
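
The first two steps of the pseudo-code above (classifying the input, then reading its 4-bit response from "c") can be sketched in scalar C on the raw bit pattern of a single double. This is only an illustrative model: the names "fixup_token_f64" and "fixup_response" are hypothetical and not part of any Intel header, DAZ handling is left out, and the sketch assumes that ZERO_VALUE_TOKEN covers both signed zeros and that ONE_VALUE_TOKEN means exactly +1.0.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Token numbers as listed in TOKEN_TYPE in the pseudo-code. */
    enum {
        QNAN_TOKEN, SNAN_TOKEN, ZERO_VALUE_TOKEN, ONE_VALUE_TOKEN,
        NEG_INF_TOKEN, POS_INF_TOKEN, NEG_VALUE_TOKEN, POS_VALUE_TOKEN
    };

    /* Classify the bit pattern of one IEEE 754 double (hypothetical helper). */
    int fixup_token_f64(uint64_t bits) {
        uint64_t exp = (bits >> 52) & 0x7FF;
        uint64_t frac = bits & 0x000FFFFFFFFFFFFFULL;
        int sign = (int)(bits >> 63);

        if (exp == 0x7FF && frac != 0)          /* NaN: quiet bit is fraction bit 51 */
            return (frac >> 51) ? QNAN_TOKEN : SNAN_TOKEN;
        if (exp == 0 && frac == 0)              /* assumed: +0 and -0 */
            return ZERO_VALUE_TOKEN;
        if (bits == 0x3FF0000000000000ULL)      /* assumed: exactly +1.0 */
            return ONE_VALUE_TOKEN;
        if (exp == 0x7FF)
            return sign ? NEG_INF_TOKEN : POS_INF_TOKEN;
        return sign ? NEG_VALUE_TOKEN : POS_VALUE_TOKEN;
    }

    /* token_response[3:0] := src3[3+4*j:4*j] */
    unsigned fixup_response(uint64_t table, int token) {
        assert(token >= 0 && token < 8);
        return (unsigned)((table >> (4 * token)) & 0xF);
    }

With a table element of 0x11111188, both NaN tokens select response 8 (+0) and every other token selects response 1 (pass the "b" element through).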
        	

_mm_fixupimm_ps
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128i c, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m128 _mm_fixupimm_ps(__m128 a, __m128 b, __m128i c,
                           int imm8);

.. admonition:: Intel Description

    Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    In this context, "fix up" means detecting special classes of input (such as NaN, zero, or infinity) and replacing or flagging them according to a rule you choose. For example, an element can be turned into NaN when it meets a given criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0,
        	SNAN_TOKEN := 1,
        	ZERO_VALUE_TOKEN := 2,
        	ONE_VALUE_TOKEN := 3,
        	NEG_INF_TOKEN := 4,
        	POS_INF_TOKEN := 5,
        	NEG_VALUE_TOKEN := 6,
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN: j := 0
        	SNAN_TOKEN: j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:128] := 0
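
Because the pseudo-code reads the response for token "j" from bits [3+4*j:4*j] of each 32-bit element of "c", a table element can be assembled by packing eight 4-bit response codes. The helper below is a minimal sketch of that packing; "fixup_pack_table" is a hypothetical name used only for illustration.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Pack eight 4-bit responses, indexed by token number 0..7,
       into one 32-bit table element (hypothetical helper). */
    uint32_t fixup_pack_table(const uint8_t response[8]) {
        uint32_t table = 0;
        for (int j = 0; j < 8; j++) {
            assert(response[j] < 16);   /* each response is a 4-bit code */
            table |= (uint32_t)response[j] << (4 * j);
        }
        return table;
    }

For example, the responses {8, 8, 1, 1, 1, 1, 1, 1} (both NaN tokens become +0, every other token passes the "b" element through) pack to 0x11111188, which could then be placed in every 32-bit element of "c".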
        	

_mm_mask_fixupimm_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128i c, 
    int imm8
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m128 _mm_mask_fixupimm_ps(__m128 a, __mmask8 k, __m128 b,
                                __m128i c, int imm8);

.. admonition:: Intel Description

    Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    In this context, "fix up" means detecting special classes of input (such as NaN, zero, or infinity) and replacing or flagging them according to a rule you choose. For example, an element can be turned into NaN when it meets a given criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0,
        	SNAN_TOKEN := 1,
        	ZERO_VALUE_TOKEN := 2,
        	ONE_VALUE_TOKEN := 3,
        	NEG_INF_TOKEN := 4,
        	POS_INF_TOKEN := 5,
        	NEG_VALUE_TOKEN := 6,
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN: j := 0
        	SNAN_TOKEN: j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
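
The last CASE in the pseudo-code lists some tokens twice, once per "imm8" bit. One way to read it is as eight independent checks, each pairing a token with one bit of "imm8". The sketch below follows that reading; it is an interpretation rather than a statement of the hardware behaviour, and "fixup_flags", "FLAG_ZE" and "FLAG_IE" are hypothetical names standing in for the #ZE and #IE reports.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    enum { FLAG_ZE = 1, FLAG_IE = 2 };  /* hypothetical stand-ins for #ZE / #IE */

    /* token: 0..7 as numbered in TOKEN_TYPE; imm8: the intrinsic's immediate. */
    unsigned fixup_flags(int token, uint8_t imm8) {
        unsigned flags = 0;
        assert(token >= 0 && token < 8);
        if (token == 2 && (imm8 & 0x01)) flags |= FLAG_ZE;  /* ZERO_VALUE, imm8[0] */
        if (token == 2 && (imm8 & 0x02)) flags |= FLAG_IE;  /* ZERO_VALUE, imm8[1] */
        if (token == 3 && (imm8 & 0x04)) flags |= FLAG_ZE;  /* ONE_VALUE,  imm8[2] */
        if (token == 3 && (imm8 & 0x08)) flags |= FLAG_IE;  /* ONE_VALUE,  imm8[3] */
        if (token == 1 && (imm8 & 0x10)) flags |= FLAG_IE;  /* SNAN,       imm8[4] */
        if (token == 4 && (imm8 & 0x20)) flags |= FLAG_IE;  /* NEG_INF,    imm8[5] */
        if (token == 6 && (imm8 & 0x40)) flags |= FLAG_IE;  /* NEG_VALUE,  imm8[6] */
        if (token == 5 && (imm8 & 0x80)) flags |= FLAG_IE;  /* POS_INF,    imm8[7] */
        return flags;
    }

Under this reading, an "imm8" of 0 requests no flag reporting at all, and QNaN and positive finite inputs never report a flag.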
        	

_mm_maskz_fixupimm_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128i c, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m128 _mm_maskz_fixupimm_ps(__mmask8 k, __m128 a, __m128 b,
                                 __m128i c, int imm8);

.. admonition:: Intel Description

    Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    In this context, "fix up" means detecting special classes of input (such as NaN, zero, or infinity) and replacing or flagging them according to a rule you choose. For example, an element can be turned into NaN when it meets a given criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0,
        	SNAN_TOKEN := 1,
        	ZERO_VALUE_TOKEN := 2,
        	ONE_VALUE_TOKEN := 3,
        	NEG_INF_TOKEN := 4,
        	POS_INF_TOKEN := 5,
        	NEG_VALUE_TOKEN := 6,
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN: j := 0
        	SNAN_TOKEN: j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_getexp_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_getexp_pd(__m128d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
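
To make the "floor(log2(x))" description concrete, the following scalar sketch models what ConvertExpFP64 might compute for one element by reading the exponent field directly. The name "getexp_f64" is hypothetical, and the special cases are assumptions made for this sketch: the magnitude of "x" is used (so negative inputs give the exponent of their absolute value), zero gives -INF, infinity gives +INF, NaN is passed through, and subnormal inputs are normalized first.

.. code-block:: C

    #include <assert.h>
    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    /* Scalar model of one lane: unbiased exponent of |x| as a double. */
    double getexp_f64(double x) {
        uint64_t bits;
        assert(sizeof bits == sizeof x);
        memcpy(&bits, &x, sizeof bits);

        uint64_t frac = bits & 0x000FFFFFFFFFFFFFULL;
        int exp = (int)((bits >> 52) & 0x7FF);

        if (exp == 0x7FF)                 /* NaN passes through, +/-INF -> +INF */
            return frac ? x : INFINITY;
        if (exp == 0) {
            if (frac == 0)                /* +/-0 -> -INF (assumed) */
                return -INFINITY;
            exp = 1;                      /* subnormal: shift until the hidden bit is set */
            while (!(frac & (1ULL << 52))) {
                frac <<= 1;
                exp--;
            }
        }
        return (double)(exp - 1023);      /* remove the exponent bias */
    }

For instance, an input of 10.0 lies between 2^3 and 2^4, so the model returns 3.0.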
        	

_mm_mask_getexp_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_mask_getexp_pd(__m128d src, __mmask8 k,
                               __m128d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
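
The writemask branch in the pseudo-code above is the same merge used by every "_mm_mask_" intrinsic in this section: lane "j" takes the newly computed value when bit "j" of "k" is set and keeps the "src" value otherwise. A minimal scalar sketch of just that merge, with the hypothetical name "writemask_merge_f64" and the computed values passed in as an array, could look like this:

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* dst[j] = computed[j] if bit j of k is set, else src[j]. */
    void writemask_merge_f64(double *dst, const double *src, uint8_t k,
                             const double *computed, size_t lanes) {
        assert(lanes <= 8);   /* an 8-bit mask carries at most eight lane bits */
        for (size_t j = 0; j < lanes; j++)
            dst[j] = ((k >> j) & 1) ? computed[j] : src[j];
    }

Only the low "lanes" bits of "k" are consulted; for the two-lane double-precision case that means bits 0 and 1.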
        	

_mm_maskz_getexp_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a
:Param ETypes:
    MASK k, 
    FP64 a

.. code-block:: C

    __m128d _mm_maskz_getexp_pd(__mmask8 k, __m128d a);

.. admonition:: Intel Description

    Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := ConvertExpFP64(a[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
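
The zeromask form differs from the writemask form only in what an inactive lane receives: zero instead of a value from "src". The scalar sketch below isolates that behaviour; "zeromask_f64" is a hypothetical name and the computed values are again supplied as an array.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* dst[j] = computed[j] if bit j of k is set, else 0. */
    void zeromask_f64(double *dst, uint8_t k, const double *computed,
                      size_t lanes) {
        assert(lanes <= 8);   /* an 8-bit mask carries at most eight lane bits */
        for (size_t j = 0; j < lanes; j++)
            dst[j] = ((k >> j) & 1) ? computed[j] : 0.0;
    }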
        	

_mm_getexp_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_getexp_ps(__m128 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_getexp_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_mask_getexp_ps(__m128 src, __mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_getexp_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a
:Param ETypes:
    MASK k, 
    FP32 a

.. code-block:: C

    __m128 _mm_maskz_getexp_ps(__mmask8 k, __m128 a);

.. admonition:: Intel Description

    Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := ConvertExpFP32(a[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_getmant_pd
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP64 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m128d _mm_getmant_pd(__m128d a,
                           _MM_MANTISSA_NORM_ENUM interv,
                           _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
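
    "interv" selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). "sc" selects the sign: _MM_MANT_SIGN_src keeps the sign of the source, _MM_MANT_SIGN_zero clears the sign, and _MM_MANT_SIGN_nan produces NaN when the source is negative.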

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
        ENDFOR
        dst[MAX:128] := 0
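
As an illustration of what GetNormalizedMantissa does in its simplest configuration, the sketch below models one lane for the interval [1, 2) with the sign taken from the source (the _MM_MANT_NORM_1_2 and _MM_MANT_SIGN_src settings). It is a hedged scalar model, not the hardware algorithm: "getmant_1_2_f64" is a hypothetical name, and zero, subnormal, infinite and NaN inputs are deliberately excluded because their handling is not spelled out in the pseudo-code above.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Keep sign and fraction, force the exponent field to the bias (2^0),
       which scales |x| into [1, 2). Normal finite inputs only. */
    double getmant_1_2_f64(double x) {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);

        unsigned exp = (unsigned)((bits >> 52) & 0x7FF);
        assert(exp != 0 && exp != 0x7FF);   /* sketch covers normal numbers only */

        bits = (bits & 0x800FFFFFFFFFFFFFULL) | 0x3FF0000000000000ULL;
        memcpy(&x, &bits, sizeof x);
        return x;
    }

For example, 10.0 equals 1.25 * 2^3, so the model returns 1.25; the discarded power of two is what the getexp intrinsics report.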
        	

_mm_mask_getmant_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m128d _mm_mask_getmant_pd(__m128d src, __mmask8 k,
                                __m128d a,
                                _MM_MANTISSA_NORM_ENUM interv,
                                _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
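
    "interv" selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). "sc" selects the sign: _MM_MANT_SIGN_src keeps the sign of the source, _MM_MANT_SIGN_zero clears the sign, and _MM_MANT_SIGN_nan produces NaN when the source is negative.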

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_getmant_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m128d _mm_maskz_getmant_pd(__mmask8 k, __m128d a,
                                 _MM_MANTISSA_NORM_ENUM interv,
                                 _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
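
    "interv" selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). "sc" selects the sign: _MM_MANT_SIGN_src keeps the sign of the source, _MM_MANT_SIGN_zero clears the sign, and _MM_MANT_SIGN_nan produces NaN when the source is negative.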

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_getmant_ps
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP32 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m128 _mm_getmant_ps(__m128 a,
                          _MM_MANTISSA_NORM_ENUM interv,
                          _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
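
    "interv" selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). "sc" selects the sign: _MM_MANT_SIGN_src keeps the sign of the source, _MM_MANT_SIGN_zero clears the sign, and _MM_MANT_SIGN_nan produces NaN when the source is negative.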

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_getmant_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m128 _mm_mask_getmant_ps(__m128 src, __mmask8 k, __m128 a,
                               _MM_MANTISSA_NORM_ENUM interv,
                               _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
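
    "interv" selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). "sc" selects the sign: _MM_MANT_SIGN_src keeps the sign of the source, _MM_MANT_SIGN_zero clears the sign, and _MM_MANT_SIGN_nan produces NaN when the source is negative.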

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_getmant_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m128 _mm_maskz_getmant_ps(__mmask8 k, __m128 a,
                                _MM_MANTISSA_NORM_ENUM interv,
                                _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

    Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
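
    "interv" selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). "sc" selects the sign: _MM_MANT_SIGN_src keeps the sign of the source, _MM_MANT_SIGN_zero clears the sign, and _MM_MANT_SIGN_nan produces NaN when the source is negative.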

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_blend_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_blend_epi32(__mmask8 k, __m128i a,
                                 __m128i b);

.. admonition:: Intel Description

    Blend packed 32-bit integers from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
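
The blend above is a per-lane select between two sources. A scalar sketch of the four-lane case, written directly from the pseudo-code, is shown below; "mask_blend_epi32_model" is a hypothetical name and plain arrays stand in for the vector registers.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* dst[j] = b[j] if bit j of k is set, else a[j]. */
    void mask_blend_epi32_model(uint32_t dst[4], uint8_t k,
                                const uint32_t a[4], const uint32_t b[4]) {
        assert(dst && a && b);
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? b[j] : a[j];
    }

Note the direction of the select: a set mask bit picks the element from "b", a clear bit picks it from "a".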
        	

_mm_mask_blend_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_blend_epi64(__mmask8 k, __m128i a,
                                 __m128i b);

.. admonition:: Intel Description

    Blend packed 64-bit integers from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := b[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_broadcastd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_broadcastd_epi32(__m128i src, __mmask8 k,
                                      __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
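
In the masked broadcast, only the lowest element of "a" is ever read; the mask then decides which lanes receive it. The scalar sketch below mirrors the pseudo-code for the four-lane case, using the hypothetical name "mask_broadcastd_model" and arrays in place of registers.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* dst[j] = a[0] if bit j of k is set, else src[j]. */
    void mask_broadcastd_model(uint32_t dst[4], const uint32_t src[4],
                               uint8_t k, const uint32_t a[4]) {
        assert(dst && src && a);
        uint32_t low = a[0];   /* read once so dst may alias a */
        for (int j = 0; j < 4; j++)
            dst[j] = ((k >> j) & 1) ? low : src[j];
    }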
        	

_mm_maskz_broadcastd_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_broadcastd_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[31:0]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_broadcastq_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_broadcastq_epi64(__m128i src, __mmask8 k,
                                      __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_broadcastq_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_broadcastq_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[63:0]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_compress_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_compress_epi32(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := 0
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        dst[127:m] := src[127:m]
        dst[MAX:128] := 0
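
The compress step is easier to follow as a scalar loop. This is a minimal sketch of the pseudo-code, not the intrinsic itself; ``ref_mask_compress_epi32`` is a hypothetical helper, and arrays stand in for the four 32-bit lanes (index 0 is the lowest lane).

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the write-masked compress. */
    void ref_mask_compress_epi32(uint32_t dst[4], const uint32_t src[4],
                                 uint8_t k, const uint32_t a[4])
    {
        int m = 0;                      /* next free lane in dst */
        for (int j = 0; j < 4; j++) {
            if ((k >> j) & 1) {
                dst[m++] = a[j];        /* pack active lanes toward lane 0 */
            }
        }
        for (; m < 4; m++) {
            dst[m] = src[m];            /* remaining lanes pass through from src */
        }
    }

With ``k = 0xA`` (lanes 1 and 3 active), ``a = {1, 2, 3, 4}`` and ``src = {100, 200, 300, 400}``, the model produces ``{2, 4, 300, 400}``.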
        	

_mm_maskz_compress_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_compress_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Contiguously store the active 32-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 32
        m := 0
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[m+size-1:m] := a[i+31:i]
        		m := m + size
        	FI
        ENDFOR
        dst[127:m] := 0
        dst[MAX:128] := 0
        	

_mm_mask_compress_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_compress_epi64(__m128i src, __mmask8 k,
                                    __m128i a);

.. admonition:: Intel Description

    Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := 0
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        dst[127:m] := src[127:m]
        dst[MAX:128] := 0
        	

_mm_maskz_compress_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_compress_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Contiguously store the active 64-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        size := 64
        m := 0
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[m+size-1:m] := a[i+63:i]
        		m := m + size
        	FI
        ENDFOR
        dst[127:m] := 0
        dst[MAX:128] := 0
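
For the zero-masked form, the only change from the write-masked compress is what fills the lanes left over after packing. The scalar sketch below assumes a hypothetical helper ``ref_maskz_compress_epi64`` and models the two 64-bit lanes as an array; it does not call the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zero-masked 64-bit compress. */
    void ref_maskz_compress_epi64(uint64_t dst[2], uint8_t k,
                                  const uint64_t a[2])
    {
        int m = 0;                      /* next free lane in dst */
        for (int j = 0; j < 2; j++) {
            if ((k >> j) & 1) {
                dst[m++] = a[j];        /* pack active lanes toward lane 0 */
            }
        }
        for (; m < 2; m++) {
            dst[m] = 0;                 /* leftover lanes are zeroed */
        }
    }

With ``k = 0x2`` and ``a = {7, 8}``, only the high lane is active, so the model returns ``{8, 0}``.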
        	

_mm_mask2_permutex2var_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i idx, 
    __mmask8 k, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 idx, 
    MASK k, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask2_permutex2var_epi32(__m128i a, __m128i idx,
                                         __mmask8 k, __m128i b);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	off := idx[i+1:i]*32
        	IF k[j]
        		dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := idx[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
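
The index encoding is compact, so a scalar restatement may help: bits 1:0 of each index pick a lane and bit 2 picks the source vector. The sketch below follows the pseudo-code under that reading; ``ref_mask2_permutex2var_epi32`` is a hypothetical helper, not the intrinsic, and arrays model the four 32-bit lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the mask2 two-source permute.
       dst must not alias a or b. */
    void ref_mask2_permutex2var_epi32(uint32_t dst[4], const uint32_t a[4],
                                      const uint32_t idx[4], uint8_t k,
                                      const uint32_t b[4])
    {
        for (int j = 0; j < 4; j++) {
            uint32_t lane = idx[j] & 3;         /* idx bits 1:0: source lane  */
            int from_b = (idx[j] >> 2) & 1;     /* idx bit 2: 0 = a, 1 = b    */
            if ((k >> j) & 1) {
                dst[j] = from_b ? b[lane] : a[lane];
            } else {
                dst[j] = idx[j];                /* mask2 form falls back to idx */
            }
        }
    }

With ``a = {10, 11, 12, 13}``, ``b = {20, 21, 22, 23}``, ``idx = {4, 1, 7, 2}`` and ``k = 0xD``, the model gives ``{20, 1, 23, 12}``: lane 1 is inactive and keeps its index value.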
        	

_mm_mask_permutex2var_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i idx, 
    __m128i b
:Param ETypes:
    UI32 a, 
    MASK k, 
    UI32 idx, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_permutex2var_epi32(__m128i a, __mmask8 k,
                                        __m128i idx, __m128i b);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	off := idx[i+1:i]*32
        	IF k[j]
        		dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_permutex2var_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i idx, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 idx, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_permutex2var_epi32(__mmask8 k, __m128i a,
                                         __m128i idx,
                                         __m128i b);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	off := idx[i+1:i]*32
        	IF k[j]
        		dst[i+31:i] := (idx[i+2]) ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_permutex2var_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i idx, 
    __m128i b
:Param ETypes:
    UI32 a, 
    UI32 idx, 
    UI32 b

.. code-block:: C

    __m128i _mm_permutex2var_epi32(__m128i a, __m128i idx,
                                   __m128i b);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	off := idx[i+1:i]*32
        	dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask2_permutex2var_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128i idx, 
    __mmask8 k, 
    __m128d b
:Param ETypes:
    FP64 a, 
    UI64 idx, 
    MASK k, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask2_permutex2var_pd(__m128d a, __m128i idx,
                                      __mmask8 k, __m128d b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	off := idx[i]*64
        	IF k[j]
        		dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := idx[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_permutex2var_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128i idx, 
    __m128d b
:Param ETypes:
    FP64 a, 
    MASK k, 
    UI64 idx, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_permutex2var_pd(__m128d a, __mmask8 k,
                                     __m128i idx, __m128d b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	off := idx[i]*64
        	IF k[j]
        		dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_permutex2var_pd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128i idx, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    UI64 idx, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_permutex2var_pd(__mmask8 k, __m128d a,
                                      __m128i idx, __m128d b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	off := idx[i]*64
        	IF k[j]
        		dst[i+63:i] := (idx[i+1]) ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_permutex2var_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128i idx, 
    __m128d b
:Param ETypes:
    FP64 a, 
    UI64 idx, 
    FP64 b

.. code-block:: C

    __m128d _mm_permutex2var_pd(__m128d a, __m128i idx,
                                __m128d b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	off := idx[i]*64
        	dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off]
        ENDFOR
        dst[MAX:128] := 0
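
With only two 64-bit lanes, each index uses just two bits: bit 0 selects the lane and bit 1 selects the source. The following scalar sketch restates the pseudo-code under that reading; ``ref_permutex2var_pd`` is a hypothetical helper rather than the intrinsic, and all higher index bits are ignored, as in the pseudo-code.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the unmasked two-source permute.
       dst must not alias a or b. */
    void ref_permutex2var_pd(double dst[2], const double a[2],
                             const uint64_t idx[2], const double b[2])
    {
        for (int j = 0; j < 2; j++) {
            unsigned lane = (unsigned)(idx[j] & 1);    /* idx bit 0: lane   */
            int from_b = (int)((idx[j] >> 1) & 1);     /* idx bit 1: source */
            dst[j] = from_b ? b[lane] : a[lane];
        }
    }

With ``a = {1.0, 2.0}``, ``b = {3.0, 4.0}`` and ``idx = {3, 0}``, the model returns ``{4.0, 1.0}``.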
        	

_mm_mask2_permutex2var_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128i idx, 
    __mmask8 k, 
    __m128 b
:Param ETypes:
    FP32 a, 
    UI32 idx, 
    MASK k, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask2_permutex2var_ps(__m128 a, __m128i idx,
                                     __mmask8 k, __m128 b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	off := idx[i+1:i]*32
        	IF k[j]
        		dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := idx[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_permutex2var_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128i idx, 
    __m128 b
:Param ETypes:
    FP32 a, 
    MASK k, 
    UI32 idx, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_permutex2var_ps(__m128 a, __mmask8 k,
                                    __m128i idx, __m128 b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	off := idx[i+1:i]*32
        	IF k[j]
        		dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_permutex2var_ps
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128i idx, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    UI32 idx, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_permutex2var_ps(__mmask8 k, __m128 a,
                                     __m128i idx, __m128 b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	off := idx[i+1:i]*32
        	IF k[j]
        		dst[i+31:i] := (idx[i+2]) ? b[off+31:off] : a[off+31:off]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_permutex2var_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128i idx, 
    __m128 b
:Param ETypes:
    FP32 a, 
    UI32 idx, 
    FP32 b

.. code-block:: C

    __m128 _mm_permutex2var_ps(__m128 a, __m128i idx, __m128 b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	off := idx[i+1:i]*32
        	dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask2_permutex2var_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i idx, 
    __mmask8 k, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 idx, 
    MASK k, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask2_permutex2var_epi64(__m128i a, __m128i idx,
                                         __mmask8 k, __m128i b);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	off := idx[i]*64
        	IF k[j]
        		dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := idx[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_permutex2var_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __mmask8 k, 
    __m128i idx, 
    __m128i b
:Param ETypes:
    UI64 a, 
    MASK k, 
    UI64 idx, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_permutex2var_epi64(__m128i a, __mmask8 k,
                                        __m128i idx, __m128i b);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	off := idx[i]*64
        	IF k[j]
        		dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
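
In this form an inactive lane keeps the value from "a", whereas the ``mask2`` form keeps the value from "idx". A scalar sketch of the pseudo-code shows the fallback; ``ref_mask_permutex2var_epi64`` is a hypothetical helper, not the intrinsic, and arrays model the two 64-bit lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the write-masked two-source permute.
       dst must not alias a or b. */
    void ref_mask_permutex2var_epi64(uint64_t dst[2], const uint64_t a[2],
                                     uint8_t k, const uint64_t idx[2],
                                     const uint64_t b[2])
    {
        for (int j = 0; j < 2; j++) {
            uint64_t lane = idx[j] & 1;               /* idx bit 0: lane   */
            int from_b = (int)((idx[j] >> 1) & 1);    /* idx bit 1: source */
            if ((k >> j) & 1) {
                dst[j] = from_b ? b[lane] : a[lane];
            } else {
                dst[j] = a[j];                        /* inactive: keep a[j] */
            }
        }
    }

With ``a = {10, 11}``, ``b = {20, 21}`` and ``idx = {2, 3}``, the model gives ``{20, 21}`` for ``k = 0x3`` and ``{20, 11}`` for ``k = 0x1``.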
        	

_mm_maskz_permutex2var_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i idx, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 idx, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_permutex2var_epi64(__mmask8 k, __m128i a,
                                         __m128i idx,
                                         __m128i b);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	off := idx[i]*64
        	IF k[j]
        		dst[i+63:i] := (idx[i+1]) ? b[off+63:off] : a[off+63:off]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_permutex2var_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i idx, 
    __m128i b
:Param ETypes:
    UI64 a, 
    UI64 idx, 
    UI64 b

.. code-block:: C

    __m128i _mm_permutex2var_epi64(__m128i a, __m128i idx,
                                   __m128i b);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	off := idx[i]*64
        	dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_permute_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    const int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm_mask_permute_pd(__m128d src, __mmask8 k,
                                __m128d a, const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI
        IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI
        IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI
        IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
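
The four ``IF`` lines reduce to one lookup per lane: bit ``j`` of "imm8" chooses which element of "a" feeds lane ``j``. The scalar sketch below restates that under the assumption that arrays model the two 64-bit lanes; ``ref_mask_permute_pd`` is a hypothetical helper, not the intrinsic.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the write-masked immediate permute. */
    void ref_mask_permute_pd(double dst[2], const double src[2], uint8_t k,
                             const double a[2], int imm8)
    {
        double tmp[2];
        tmp[0] = a[imm8 & 1];           /* imm8 bit 0 selects the source of lane 0 */
        tmp[1] = a[(imm8 >> 1) & 1];    /* imm8 bit 1 selects the source of lane 1 */
        for (int j = 0; j < 2; j++) {
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }
    }

With ``a = {5.0, 6.0}``, ``src = {-1.0, -2.0}``, ``imm8 = 0x1`` and ``k = 0x1``, the model returns ``{6.0, -2.0}``.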
        	

_mm_mask_permutevar_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128i b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    UI64 b

.. code-block:: C

    __m128d _mm_mask_permutevar_pd(__m128d src, __mmask8 k,
                                   __m128d a, __m128i b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI
        IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI
        IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI
        IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
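
Note that the pseudo-code tests bit 1, not bit 0, of each 64-bit control element (bits 1 and 65 of "b"). A scalar sketch makes the consequence visible; ``ref_mask_permutevar_pd`` and the ``ctrl`` array (the two 64-bit lanes of "b") are assumptions made for illustration, and the intrinsic itself is not called.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the write-masked variable permute. */
    void ref_mask_permutevar_pd(double dst[2], const double src[2], uint8_t k,
                                const double a[2], const uint64_t ctrl[2])
    {
        double tmp[2];
        tmp[0] = a[(ctrl[0] >> 1) & 1];   /* bit 1 of the low control lane  */
        tmp[1] = a[(ctrl[1] >> 1) & 1];   /* bit 1 of the high control lane */
        for (int j = 0; j < 2; j++) {
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }
    }

With ``a = {5.0, 6.0}``, ``ctrl = {2, 1}`` and ``k = 0x3``, the model returns ``{6.0, 5.0}``: a control value of 1 selects the low element because its bit 1 is clear.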
        	

_mm_maskz_permute_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm_maskz_permute_pd(__mmask8 k, __m128d a,
                                 const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI
        IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI
        IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI
        IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_permutevar_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128i b
:Param ETypes:
    MASK k, 
    FP64 a, 
    UI64 b

.. code-block:: C

    __m128d _mm_maskz_permutevar_pd(__mmask8 k, __m128d a,
                                    __m128i b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI
        IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI
        IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI
        IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_permute_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    const int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm_mask_permute_ps(__m128 src, __mmask8 k, __m128 a,
                               const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
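
``SELECT4`` amounts to indexing "a" with a two-bit field of "imm8", one field per destination lane. The sketch below restates the pseudo-code in scalar C; ``ref_mask_permute_ps`` is a hypothetical helper, not the intrinsic, and arrays model the four 32-bit lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the write-masked immediate permute. */
    void ref_mask_permute_ps(float dst[4], const float src[4], uint8_t k,
                             const float a[4], int imm8)
    {
        float tmp[4];
        for (int j = 0; j < 4; j++) {
            tmp[j] = a[(imm8 >> (2 * j)) & 3];   /* SELECT4 on imm8[2j+1:2j] */
        }
        for (int j = 0; j < 4; j++) {
            dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }
    }

With ``a = {0, 1, 2, 3}``, ``imm8 = 0x1B`` (which reverses the lanes), ``src = {9, 9, 9, 9}`` and ``k = 0x5``, the model returns ``{3, 9, 1, 9}``.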
        	

_mm_mask_permutevar_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128i b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    UI32 b

.. code-block:: C

    __m128 _mm_mask_permutevar_ps(__m128 src, __mmask8 k,
                                  __m128 a, __m128i b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], b[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], b[33:32])
        tmp_dst[95:64] := SELECT4(a[127:0], b[65:64])
        tmp_dst[127:96] := SELECT4(a[127:0], b[97:96])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_permute_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm_maskz_permute_ps(__mmask8 k, __m128 a,
                                const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_permutevar_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128i b
:Param ETypes:
    MASK k, 
    FP32 a, 
    UI32 b

.. code-block:: C

    __m128 _mm_maskz_permutevar_ps(__mmask8 k, __m128 a,
                                   __m128i b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], b[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], b[33:32])
        tmp_dst[95:64] := SELECT4(a[127:0], b[65:64])
        tmp_dst[127:96] := SELECT4(a[127:0], b[97:96])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
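
Here the control comes from the low two bits of each 32-bit element of "b" instead of from an immediate. A scalar sketch of the pseudo-code follows; ``ref_maskz_permutevar_ps`` and the ``ctrl`` array (the four 32-bit lanes of "b") are assumptions made for illustration, and the intrinsic is not called.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of the zero-masked variable permute. */
    void ref_maskz_permutevar_ps(float dst[4], uint8_t k, const float a[4],
                                 const uint32_t ctrl[4])
    {
        float tmp[4];
        for (int j = 0; j < 4; j++) {
            tmp[j] = a[ctrl[j] & 3];     /* only control bits 1:0 are used */
        }
        for (int j = 0; j < 4; j++) {
            dst[j] = ((k >> j) & 1) ? tmp[j] : 0.0f;
        }
    }

With ``a = {10, 11, 12, 13}``, ``ctrl = {3, 3, 0, 6}`` and ``k = 0xB``, the model returns ``{13, 13, 0, 12}``: lane 2 is zeroed, and the control value 6 acts like 2.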
        	

_mm_mask_expand_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_mask_expand_epi32(__m128i src, __mmask8 k,
                                  __m128i a);

.. admonition:: Intel Description

    Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[m+31:m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
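
The expand loop above can be sketched in scalar C as follows. This is only a model on plain arrays: the name ``expand_epi32_mask_model`` and its signature are hypothetical, and the sketch assumes that "dst" does not alias "a".

```c
#include <stdint.h>

/* Hypothetical scalar model of the write-masked expand.
 * Elements of a are consumed from index 0 upward and placed only in
 * the lanes of dst whose mask bit is set. dst must not alias a. */
void expand_epi32_mask_model(uint32_t dst[4], const uint32_t src[4],
                             uint8_t k, const uint32_t a[4])
{
    int m = 0;  /* index of the next unread element of a */

    for (int j = 0; j < 4; j++) {
        if ((k >> j) & 1u)
            dst[j] = a[m++];   /* active lane: take the next element */
        else
            dst[j] = src[j];   /* inactive lane: keep the value from src */
    }
}
```

With k = 0xA, a = {1, 2, 3, 4} and src = {100, 101, 102, 103}, the result is {100, 1, 102, 2}: the two active lanes receive the first two elements of "a".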
        	

_mm_maskz_expand_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI32 a

.. code-block:: C

    __m128i _mm_maskz_expand_epi32(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Read contiguous 32-bit integers from the low end of "a" and store them, in order, in the active elements of "dst" (those with their respective bit set in zeromask "k"); elements are zeroed out when the corresponding mask bit is not set.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := a[m+31:m]
        		m := m + 32
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_expand_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_mask_expand_epi64(__m128i src, __mmask8 k,
                                  __m128i a);

.. admonition:: Intel Description

    Read contiguous 64-bit integers from the low end of "a" and store them, in order, in the active elements of "dst" (those with their respective bit set in writemask "k"); elements are copied from "src" when the corresponding mask bit is not set.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[m+63:m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_expand_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a
:Param ETypes:
    MASK k, 
    UI64 a

.. code-block:: C

    __m128i _mm_maskz_expand_epi64(__mmask8 k, __m128i a);

.. admonition:: Intel Description

    Read contiguous 64-bit integers from the low end of "a" and store them, in order, in the active elements of "dst" (those with their respective bit set in zeromask "k"); elements are zeroed out when the corresponding mask bit is not set.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        m := 0
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := a[m+63:m]
        		m := m + 64
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
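
For the 64-bit zero-masked form, a scalar sketch of the same loop might look like this. The name ``expand_epi64_maskz_model`` is hypothetical, only mask bits 0 and 1 are consulted, and "dst" is assumed not to alias "a".

```c
#include <stdint.h>

/* Hypothetical scalar model of the zero-masked 64-bit expand.
 * Only mask bits 0 and 1 matter because there are two lanes. */
void expand_epi64_maskz_model(uint64_t dst[2], uint8_t k,
                              const uint64_t a[2])
{
    int m = 0;  /* index of the next unread element of a */

    for (int j = 0; j < 2; j++) {
        if ((k >> j) & 1u)
            dst[j] = a[m++];   /* active lane: take the next element */
        else
            dst[j] = 0;        /* inactive lane: zeroed */
    }
}
```

With k = 0x2 and a = {7, 9}, lane 1 receives a[0] and lane 0 is zeroed, giving {0, 7}.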
        	

_mm_mask_shuffle_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    _MM_PERM_ENUM imm8
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_mask_shuffle_epi32(__m128i src, __mmask8 k,
                                   __m128i a,
                                   _MM_PERM_ENUM imm8);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
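
The way "imm8" is split into four 2-bit selectors can be sketched in scalar C. The name ``shuffle_epi32_mask_model`` and its array-based signature are hypothetical; the sketch follows the pseudo-code above and is not taken from any Intel header.

```c
#include <stdint.h>

/* Hypothetical scalar model of the write-masked 32-bit shuffle.
 * Selector j is imm8[2*j+1 : 2*j] and names the source lane of a
 * that feeds result lane j. */
void shuffle_epi32_mask_model(uint32_t dst[4], const uint32_t src[4],
                              uint8_t k, const uint32_t a[4],
                              uint8_t imm8)
{
    uint32_t tmp[4];

    /* Shuffle step: decode one 2-bit selector per result lane. */
    for (int j = 0; j < 4; j++)
        tmp[j] = a[(imm8 >> (2 * j)) & 3u];

    /* Writemask step: lanes whose mask bit is clear come from src. */
    for (int j = 0; j < 4; j++)
        dst[j] = ((k >> j) & 1u) ? tmp[j] : src[j];
}
```

An "imm8" of 0x1B decodes to the selectors 3, 2, 1, 0 for lanes 0 to 3, so it reverses the lane order of "a" wherever the mask bit is set.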
        	

_mm_maskz_shuffle_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    _MM_PERM_ENUM imm8
:Param ETypes:
    MASK k, 
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_maskz_shuffle_epi32(__mmask8 k, __m128i a,
                                    _MM_PERM_ENUM imm8);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_unpackhi_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_unpackhi_epi32(__m128i src, __mmask8 k,
                                    __m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
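
The interleave of the high halves followed by the writemask can be modelled in scalar C as below. The name ``unpackhi_epi32_mask_model`` is hypothetical, and array element 0 is assumed to correspond to bits [31:0].

```c
#include <stdint.h>

/* Hypothetical scalar model of the write-masked high unpack.
 * The interleaved value is built first, then merged with src
 * under control of the mask. */
void unpackhi_epi32_mask_model(uint32_t dst[4], const uint32_t src[4],
                               uint8_t k, const uint32_t a[4],
                               const uint32_t b[4])
{
    /* INTERLEAVE_HIGH_DWORDS: lanes 2 and 3 of a and b, alternating. */
    const uint32_t tmp[4] = { a[2], b[2], a[3], b[3] };

    for (int j = 0; j < 4; j++)
        dst[j] = ((k >> j) & 1u) ? tmp[j] : src[j];
}
```

With a = {0, 1, 2, 3}, b = {10, 11, 12, 13} and a full mask, the result is {2, 12, 3, 13}.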
        	

_mm_maskz_unpackhi_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_unpackhi_epi32(__mmask8 k, __m128i a,
                                     __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_unpackhi_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_unpackhi_epi64(__m128i src, __mmask8 k,
                                    __m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_unpackhi_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_unpackhi_epi64(__mmask8 k, __m128i a,
                                     __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
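
For the 64-bit zero-masked form there are only two lanes, so the model is very small. The sketch below uses the hypothetical name ``unpackhi_epi64_maskz_model`` and plain arrays in place of vector registers.

```c
#include <stdint.h>

/* Hypothetical scalar model of the zero-masked 64-bit high unpack.
 * Lane 0 takes the high element of a, lane 1 the high element of b;
 * a lane is zeroed when its mask bit is clear. */
void unpackhi_epi64_maskz_model(uint64_t dst[2], uint8_t k,
                                const uint64_t a[2],
                                const uint64_t b[2])
{
    /* INTERLEAVE_HIGH_QWORDS */
    const uint64_t tmp[2] = { a[1], b[1] };

    for (int j = 0; j < 2; j++)
        dst[j] = ((k >> j) & 1u) ? tmp[j] : 0;
}
```

With a = {1, 2} and b = {3, 4}, a mask of 0x3 yields {2, 4} and a mask of 0x1 yields {2, 0}.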
        	

_mm_mask_unpacklo_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI32 src, 
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_mask_unpacklo_epi32(__m128i src, __mmask8 k,
                                    __m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_unpacklo_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI32 a, 
    UI32 b

.. code-block:: C

    __m128i _mm_maskz_unpacklo_epi32(__mmask8 k, __m128i a,
                                     __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
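
The low-half interleave differs from the high-half one only in which source lanes are read. A scalar sketch of the zero-masked form, using the hypothetical name ``unpacklo_epi32_maskz_model``, might look like this:

```c
#include <stdint.h>

/* Hypothetical scalar model of the zero-masked low unpack.
 * Lanes 0 and 1 of a and b are interleaved, then lanes whose mask
 * bit is clear are zeroed. */
void unpacklo_epi32_maskz_model(uint32_t dst[4], uint8_t k,
                                const uint32_t a[4],
                                const uint32_t b[4])
{
    /* INTERLEAVE_DWORDS */
    const uint32_t tmp[4] = { a[0], b[0], a[1], b[1] };

    for (int j = 0; j < 4; j++)
        dst[j] = ((k >> j) & 1u) ? tmp[j] : 0;
}
```

With a = {5, 6, 7, 8}, b = {10, 11, 12, 13} and k = 0x9, only lanes 0 and 3 survive, giving {5, 0, 0, 11}.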
        	

_mm_mask_unpacklo_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    UI64 src, 
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_mask_unpacklo_epi64(__m128i src, __mmask8 k,
                                    __m128i a, __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_unpacklo_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __mmask8 k, 
    __m128i a, 
    __m128i b
:Param ETypes:
    MASK k, 
    UI64 a, 
    UI64 b

.. code-block:: C

    __m128i _mm_maskz_unpacklo_epi64(__mmask8 k, __m128i a,
                                     __m128i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_roundscale_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm_mask_roundscale_pd(__m128d src, __mmask8 k,
                                   __m128d a, int imm8);

.. admonition:: Intel Description

    Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8[7:4]", using the rounding control in "imm8[3:0]", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_roundscale_pd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm_maskz_roundscale_pd(__mmask8 k, __m128d a,
                                    int imm8);

.. admonition:: Intel Description

    Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8[7:4]", using the rounding control in "imm8[3:0]", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_roundscale_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm_roundscale_pd(__m128d a, int imm8);

.. admonition:: Intel Description

    Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8[7:4]", using the rounding control in "imm8[3:0]", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
        ENDFOR
        dst[MAX:128] := 0
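
The scale, round, unscale pattern in RoundScaleFP64 can be sketched for a single lane in scalar C. This is a simplified model, not the instruction's definition: the name ``roundscale_trunc_model`` is hypothetical, the rounding step is fixed to truncation toward zero (only one of the modes the "imm8[3:0]" field can request), and neither signalling NaNs nor the sign of a zero result are modelled.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of the rounding step.
 * frac_bits plays the role of imm8[7:4] (0..15). The rounding mode is
 * fixed to truncation toward zero for simplicity. */
double roundscale_trunc_model(double x, unsigned frac_bits)
{
    const double two53 = 9007199254740992.0;            /* 2^53 */
    double scale = (double)(1u << (frac_bits & 15u));   /* 2^M */
    double scaled = x * scale;

    /* At or beyond 2^53 in magnitude every double is already an
     * integer, so rounding changes nothing; infinities and NaN also
     * end up here and are returned unchanged. */
    if (!(scaled > -two53 && scaled < two53))
        return x;

    /* The cast truncates toward zero; dividing by 2^M undoes the
     * scaling and is exact because 2^M is a power of two. */
    return (double)(int64_t)scaled / scale;
}
```

For example, with two fraction bits the value 1.3 is scaled to 5.2, truncated to 5, and scaled back to 1.25.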
        	

_mm_mask_roundscale_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm_mask_roundscale_ps(__m128 src, __mmask8 k,
                                  __m128 a, int imm8);

.. admonition:: Intel Description

    Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8[7:4]", using the rounding control in "imm8[3:0]", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_roundscale_ps
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm_maskz_roundscale_ps(__mmask8 k, __m128 a,
                                   int imm8);

.. admonition:: Intel Description

    Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8[7:4]", using the rounding control in "imm8[3:0]", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_roundscale_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm_roundscale_ps(__m128 a, int imm8);

.. admonition:: Intel Description

    Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8[7:4]", using the rounding control in "imm8[3:0]", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
        ENDFOR
        dst[MAX:128] := 0
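
The pseudo-code reads two separate fields from "imm8": the fraction-bit count from bits [7:4] and the rounding control from bits [3:0]. The helpers below sketch that layout; their names are hypothetical and they are not part of ``immintrin.h``.

```c
#include <stdint.h>

/* Hypothetical helper: pack the two fields the pseudo-code reads.
 * frac_bits goes to imm8[7:4], rounding to imm8[3:0]. */
uint8_t roundscale_imm8(unsigned frac_bits, unsigned rounding)
{
    return (uint8_t)(((frac_bits & 0xFu) << 4) | (rounding & 0xFu));
}

/* Hypothetical helpers: recover each field from a packed imm8. */
unsigned roundscale_frac_bits(uint8_t imm8) { return imm8 >> 4; }
unsigned roundscale_rounding(uint8_t imm8)  { return imm8 & 0xFu; }
```

Because the intrinsic takes "imm8" as an immediate, a helper like this is mainly useful for working out a constant by hand or for decoding one when reading existing code.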
        	

_mm_mask_scalef_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_scalef_pd(__m128d src, __mmask8 k,
                               __m128d a, __m128d b);

.. admonition:: Intel Description

    Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_scalef_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_scalef_pd(__mmask8 k, __m128d a,
                                __m128d b);

.. admonition:: Intel Description

    Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_scalef_pd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_scalef_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:128] := 0
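
The core of SCALE, multiplying by two raised to the floor of the second operand, can be sketched for one lane in scalar C. This is a simplified model under stated assumptions: the name ``scalef_model`` is hypothetical, inputs are assumed to be ordinary finite values, and the SNaN/QNaN, infinity and MXCSR.DAZ handling of the pseudo-code is not modelled.

```c
/* Hypothetical scalar model of one lane: a * 2^floor(b).
 * Assumes ordinary finite inputs; special-value handling from the
 * pseudo-code is left out. */
double scalef_model(double a, double b)
{
    if (b != b)
        return b;                   /* NaN scale: just propagate it */

    /* Clamp so the cast below stays in range; well before +/-4096 a
     * nonzero finite double has saturated to infinity or zero. */
    if (b > 4096.0)
        b = 4096.0;
    if (b < -4096.0)
        b = -4096.0;

    int e = (int)b;                 /* truncates toward zero */
    if ((double)e > b)
        e--;                        /* turn truncation into floor */

    double r = a;
    for (; e > 0; e--)
        r *= 2.0;                   /* scale up by 2^e */
    for (; e < 0; e++)
        r *= 0.5;                   /* scale down by 2^|e| */
    return r;
}
```

For example, scaling 3.0 by 2.7 uses floor(2.7) = 2 and gives 12.0, while scaling 3.0 by -1.5 uses floor(-1.5) = -2 and gives 0.75. The repeated multiplication keeps the sketch free of library calls, but results that pass through the subnormal range may round differently from a single scaling step.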
        	

_mm_mask_scalef_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_scalef_ps(__m128 src, __mmask8 k, __m128 a,
                              __m128 b);

.. admonition:: Intel Description

    Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_scalef_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_scalef_ps(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_scalef_ps
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_scalef_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:128] := 0
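
.. admonition:: Example Sketch

    The SCALE pseudo-code and its writemask and zeromask wrappers can be modelled in plain C. This is a hedged scalar sketch, not the intrinsic itself: the names ``scale_lane``, ``mask_scalef_model`` and ``maskz_scalef_model`` are hypothetical, plain arrays stand in for the 128-bit registers, and the NaN and MXCSR.DAZ branches are omitted. It assumes finite inputs where FLOOR(b) fits in an ``int`` and the result neither overflows nor underflows.

    .. code-block:: C

        #include <stdint.h>

        /* One SCALE lane for ordinary finite inputs: a * 2^FLOOR(b).
           Assumption: b fits in an int and no overflow/underflow occurs. */
        float scale_lane(float a, float b)
        {
            int e = (int)b;          /* conversion truncates toward zero */
            if ((float)e > b)
                e--;                 /* step down to FLOOR(b) for negative non-integers */
            float r = a;
            for (; e > 0; e--)
                r *= 2.0f;           /* multiplying by 2 is exact in this range */
            for (; e < 0; e++)
                r *= 0.5f;
            return r;
        }

        /* Writemask form: lanes whose mask bit is clear are copied from src. */
        void mask_scalef_model(float dst[4], const float src[4], uint8_t k,
                               const float a[4], const float b[4])
        {
            for (int j = 0; j < 4; j++)
                dst[j] = ((k >> j) & 1) ? scale_lane(a[j], b[j]) : src[j];
        }

        /* Zeromask form: lanes whose mask bit is clear are set to zero. */
        void maskz_scalef_model(float dst[4], uint8_t k,
                                const float a[4], const float b[4])
        {
            for (int j = 0; j < 4; j++)
                dst[j] = ((k >> j) & 1) ? scale_lane(a[j], b[j]) : 0.0f;
        }

    With a = {1, 3, -2, 0.5} and b = {2.5, -1.2, 0, 3}, the selected lanes evaluate to 4, 0.75, -2 and 4, because FLOOR(b) is 2, -2, 0 and 3.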
        	

_mm_mask_shuffle_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_mask_shuffle_pd(__m128d src, __mmask8 k,
                                __m128d a, __m128d b,
                                const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
        tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_shuffle_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_maskz_shuffle_pd(__mmask8 k, __m128d a,
                                 __m128d b, const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
        tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
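
.. admonition:: Example Sketch

    The two-bit control of the double-precision shuffle is easy to misread, so a scalar model may help. The sketch below is illustrative only: ``shuffle_pd_model``, ``mask_shuffle_pd_model`` and ``maskz_shuffle_pd_model`` are hypothetical names, and two-element arrays stand in for the registers. It follows the pseudo-code directly: bit 0 of "imm8" picks the element taken from "a", bit 1 picks the element taken from "b".

    .. code-block:: C

        #include <stdint.h>

        /* Unmasked selection: tmp[0] comes from a, tmp[1] comes from b. */
        void shuffle_pd_model(double tmp[2], const double a[2],
                              const double b[2], int imm8)
        {
            tmp[0] = (imm8 & 1) ? a[1] : a[0];   /* imm8[0] */
            tmp[1] = (imm8 & 2) ? b[1] : b[0];   /* imm8[1] */
        }

        /* Writemask form: unselected lanes keep the value from src. */
        void mask_shuffle_pd_model(double dst[2], const double src[2], uint8_t k,
                                   const double a[2], const double b[2], int imm8)
        {
            double tmp[2];
            shuffle_pd_model(tmp, a, b, imm8);
            for (int j = 0; j < 2; j++)
                dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }

        /* Zeromask form: unselected lanes become zero. */
        void maskz_shuffle_pd_model(double dst[2], uint8_t k,
                                    const double a[2], const double b[2], int imm8)
        {
            double tmp[2];
            shuffle_pd_model(tmp, a, b, imm8);
            for (int j = 0; j < 2; j++)
                dst[j] = ((k >> j) & 1) ? tmp[j] : 0.0;
        }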
        	

_mm_mask_shuffle_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_mask_shuffle_ps(__m128 src, __mmask8 k, __m128 a,
                               __m128 b, const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements from "a" and "b" using the control in "imm8" (the lower two result elements are selected from "a", the upper two from "b"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_shuffle_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_maskz_shuffle_ps(__mmask8 k, __m128 a, __m128 b,
                                const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements from "a" and "b" using the control in "imm8" (the lower two result elements are selected from "a", the upper two from "b"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4])
        tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
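
.. admonition:: Example Sketch

    The SELECT4 pseudo-code decodes "imm8" as four 2-bit indices. The following scalar sketch mirrors that decoding under stated assumptions: ``shuffle_ps_model``, ``mask_shuffle_ps_model`` and ``maskz_shuffle_ps_model`` are hypothetical helper names, and four-element arrays stand in for the 128-bit registers.

    .. code-block:: C

        #include <stdint.h>

        /* Unmasked selection: lanes 0-1 index into a, lanes 2-3 index into b. */
        void shuffle_ps_model(float tmp[4], const float a[4],
                              const float b[4], int imm8)
        {
            tmp[0] = a[imm8 & 3];          /* imm8[1:0] */
            tmp[1] = a[(imm8 >> 2) & 3];   /* imm8[3:2] */
            tmp[2] = b[(imm8 >> 4) & 3];   /* imm8[5:4] */
            tmp[3] = b[(imm8 >> 6) & 3];   /* imm8[7:6] */
        }

        /* Writemask form: unselected lanes keep the value from src. */
        void mask_shuffle_ps_model(float dst[4], const float src[4], uint8_t k,
                                   const float a[4], const float b[4], int imm8)
        {
            float tmp[4];
            shuffle_ps_model(tmp, a, b, imm8);
            for (int j = 0; j < 4; j++)
                dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }

        /* Zeromask form: unselected lanes become zero. */
        void maskz_shuffle_ps_model(float dst[4], uint8_t k,
                                    const float a[4], const float b[4], int imm8)
        {
            float tmp[4];
            shuffle_ps_model(tmp, a, b, imm8);
            for (int j = 0; j < 4; j++)
                dst[j] = ((k >> j) & 1) ? tmp[j] : 0.0f;
        }

    For instance, "imm8" = 0x1B decodes to the indices 3, 2, 1, 0, so the unmasked result is {a[3], a[2], b[1], b[0]}.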
        	

_mm_mask_unpackhi_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_unpackhi_pd(__m128d src, __mmask8 k,
                                 __m128d a, __m128d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_unpackhi_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_unpackhi_pd(__mmask8 k, __m128d a,
                                  __m128d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_unpackhi_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_unpackhi_ps(__m128 src, __mmask8 k,
                                __m128 a, __m128 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_unpackhi_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_unpackhi_ps(__mmask8 k, __m128 a,
                                 __m128 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
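
.. admonition:: Example Sketch

    The INTERLEAVE_HIGH_DWORDS and INTERLEAVE_HIGH_QWORDS pseudo-code can be checked with a small scalar model. This is a sketch under stated assumptions rather than the intrinsics themselves: ``maskz_unpackhi_ps_model`` and ``maskz_unpackhi_pd_model`` are hypothetical names, and plain arrays stand in for the registers. Only the zeromask forms are shown; the writemask forms differ only in copying from "src" instead of writing zero.

    .. code-block:: C

        #include <stdint.h>

        /* Single precision: interleave the upper two elements of a and b. */
        void maskz_unpackhi_ps_model(float dst[4], uint8_t k,
                                     const float a[4], const float b[4])
        {
            /* Copy first so that dst may alias a or b. */
            const float tmp[4] = { a[2], b[2], a[3], b[3] };
            for (int j = 0; j < 4; j++)
                dst[j] = ((k >> j) & 1) ? tmp[j] : 0.0f;
        }

        /* Double precision: take the upper element of a, then the upper element of b. */
        void maskz_unpackhi_pd_model(double dst[2], uint8_t k,
                                     const double a[2], const double b[2])
        {
            const double tmp[2] = { a[1], b[1] };
            for (int j = 0; j < 2; j++)
                dst[j] = ((k >> j) & 1) ? tmp[j] : 0.0;
        }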
        	

_mm_mask_unpacklo_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_unpacklo_pd(__m128d src, __mmask8 k,
                                 __m128d a, __m128d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_unpacklo_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_unpacklo_pd(__mmask8 k, __m128d a,
                                  __m128d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        FOR j := 0 to 1
        	i := j*64
        	IF k[j]
        		dst[i+63:i] := tmp_dst[i+63:i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_unpacklo_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_unpacklo_ps(__m128 src, __mmask8 k,
                                __m128 a, __m128 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_unpacklo_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_unpacklo_ps(__mmask8 k, __m128 a,
                                 __m128 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]
        }
        tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        FOR j := 0 to 3
        	i := j*32
        	IF k[j]
        		dst[i+31:i] := tmp_dst[i+31:i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
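
.. admonition:: Example Sketch

    The low-half interleave follows the same pattern as the high-half one, reading elements 0 and 1 instead of 2 and 3. The scalar sketch below shows the writemask forms; ``mask_unpacklo_ps_model`` and ``mask_unpacklo_pd_model`` are hypothetical names and arrays stand in for the registers, so treat it as an illustration of INTERLEAVE_DWORDS and INTERLEAVE_QWORDS rather than the intrinsics themselves.

    .. code-block:: C

        #include <stdint.h>

        /* Single precision: interleave the lower two elements of a and b;
           lanes whose mask bit is clear are copied from src. */
        void mask_unpacklo_ps_model(float dst[4], const float src[4], uint8_t k,
                                    const float a[4], const float b[4])
        {
            /* Copy first so that dst may alias a or b. */
            const float tmp[4] = { a[0], b[0], a[1], b[1] };
            for (int j = 0; j < 4; j++)
                dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }

        /* Double precision: take the lower element of a, then the lower element of b. */
        void mask_unpacklo_pd_model(double dst[2], const double src[2], uint8_t k,
                                    const double a[2], const double b[2])
        {
            const double tmp[2] = { a[0], b[0] };
            for (int j = 0; j < 2; j++)
                dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
        }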
        	

_mm_fixupimm_round_sd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128i c, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    UI64 c, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128d _mm_fixupimm_round_sd(__m128d a, __m128d b,
                                  __m128i c, int imm8, int sae);

.. admonition:: Intel Description

    Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst", and copy the upper element from "b" to the upper element of "dst". "imm8" is used to set the required flags reporting. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction or flagging. For example, make a number NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0])
        dst[127:64] := b[127:64]
        dst[MAX:128] := 0
        	

_mm_fixupimm_sd
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128i c, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m128d _mm_fixupimm_sd(__m128d a, __m128d b, __m128i c,
                            int imm8);

.. admonition:: Intel Description

    Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst", and copy the upper element from "b" to the upper element of "dst". "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction or flagging. For example, make a number NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0])
        dst[127:64] := b[127:64]
        dst[MAX:128] := 0
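
.. admonition:: Example Sketch

    The first half of FIXUPIMMPD is a table lookup: the input is classified into one of eight tokens, and the token index selects a 4-bit response from "c". The sketch below illustrates only that lookup. It rests on assumptions the pseudo-code does not spell out: the classification rules are written from the IEEE 754 binary64 layout, the quiet bit is taken to be fraction bit 51, ONE_VALUE_TOKEN is assumed to mean exactly +1.0, and the MXCSR.DAZ step is omitted. The function names ``classify_token`` and ``token_response`` are hypothetical.

    .. code-block:: C

        #include <stdint.h>
        #include <string.h>

        enum token_type {
            QNAN_TOKEN = 0, SNAN_TOKEN = 1, ZERO_VALUE_TOKEN = 2, ONE_VALUE_TOKEN = 3,
            NEG_INF_TOKEN = 4, POS_INF_TOKEN = 5, NEG_VALUE_TOKEN = 6, POS_VALUE_TOKEN = 7
        };

        /* Classify a double into a token index (assumed rules, see the text above). */
        int classify_token(double x)
        {
            uint64_t bits;
            memcpy(&bits, &x, sizeof bits);            /* read the raw encoding */
            uint64_t sign = bits >> 63;
            uint64_t exp  = (bits >> 52) & 0x7FF;
            uint64_t frac = bits & 0x000FFFFFFFFFFFFFULL;

            if (exp == 0x7FF && frac != 0)             /* NaN: quiet bit decides */
                return ((frac >> 51) & 1) ? QNAN_TOKEN : SNAN_TOKEN;
            if (exp == 0x7FF)
                return sign ? NEG_INF_TOKEN : POS_INF_TOKEN;
            if (exp == 0 && frac == 0)                 /* +0 and -0 */
                return ZERO_VALUE_TOKEN;
            if (x == 1.0)
                return ONE_VALUE_TOKEN;
            return sign ? NEG_VALUE_TOKEN : POS_VALUE_TOKEN;
        }

        /* token_response[3:0] := src3[3+4*j:4*j] */
        unsigned token_response(uint64_t table, int token)
        {
            return (unsigned)((table >> (4 * token)) & 0xF);
        }

    A table of 0x800 stores response 8 (+0) in the nibble for ZERO_VALUE_TOKEN and response 0 (keep "a") everywhere else.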
        	

_mm_mask_fixupimm_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128i c, 
    int imm8, 
    int sae
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    UI64 c, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128d _mm_mask_fixupimm_round_sd(__m128d a, __mmask8 k,
                                       __m128d b, __m128i c,
                                       int imm8, int sae);

.. admonition:: Intel Description

    Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "b" to the upper element of "dst". "imm8" is used to set the required flags reporting. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction or flagging. For example, make a number NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        IF k[0]
        	dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0])
        ELSE
        	dst[63:0] := a[63:0]
        FI
        dst[127:64] := b[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_fixupimm_sd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __mmask8 k, 
    __m128d b, 
    __m128i c, 
    int imm8
:Param ETypes:
    FP64 a, 
    MASK k, 
    FP64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m128d _mm_mask_fixupimm_sd(__m128d a, __mmask8 k,
                                 __m128d b, __m128i c,
                                 int imm8);

.. admonition:: Intel Description

    Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "b" to the upper element of "dst". "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction or flagging. For example, make a number NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[63:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        IF k[0]
        	dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0])
        ELSE
        	dst[63:0] := a[63:0]
        FI
        dst[127:64] := b[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_fixupimm_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128i c, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    UI64 c, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128d _mm_maskz_fixupimm_round_sd(__mmask8 k, __m128d a,
                                        __m128d b, __m128i c,
                                        int imm8, int sae);

.. admonition:: Intel Description

    Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "b" to the upper element of "dst". "imm8" is used to set the required flags reporting.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction, or flagging. For example, replace a number with NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        IF k[0]
        	dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := b[127:64]
        dst[MAX:128] := 0
        	
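.. admonition:: Community Note [Token Classification Sketch]

    The first CASE block of the pseudo-code sorts the input into one of eight tokens. The following is a minimal scalar sketch of that step for a 64-bit element, not the hardware implementation. The name ``classify_token_f64`` is hypothetical, and the sketch assumes that DAZ is off, that ONE_VALUE_TOKEN means exactly +1.0, and that ZERO_VALUE_TOKEN covers both +0 and -0.

    .. code-block:: C

        #include <stdint.h>
        #include <string.h>

        /* Token numbering copied from the TOKEN_TYPE enum in the pseudo-code. */
        enum token_type {
            QNAN_TOKEN = 0, SNAN_TOKEN = 1, ZERO_VALUE_TOKEN = 2,
            ONE_VALUE_TOKEN = 3, NEG_INF_TOKEN = 4, POS_INF_TOKEN = 5,
            NEG_VALUE_TOKEN = 6, POS_VALUE_TOKEN = 7
        };

        /* Hypothetical helper: classify a double by its IEEE 754 bit pattern.
           Assumptions: DAZ off, ONE is exactly +1.0, ZERO covers +0 and -0. */
        enum token_type classify_token_f64(double x)
        {
            uint64_t bits;
            memcpy(&bits, &x, sizeof bits);        /* well-defined type pun */
            uint64_t sign = bits >> 63;
            uint64_t exp  = (bits >> 52) & 0x7FF;
            uint64_t mant = bits & 0x000FFFFFFFFFFFFFULL;

            if (exp == 0x7FF) {
                if (mant == 0)
                    return sign ? NEG_INF_TOKEN : POS_INF_TOKEN;
                /* The top mantissa bit separates quiet from signaling NaNs. */
                return (mant >> 51) ? QNAN_TOKEN : SNAN_TOKEN;
            }
            if (exp == 0 && mant == 0)
                return ZERO_VALUE_TOKEN;
            if (bits == 0x3FF0000000000000ULL)     /* bit pattern of +1.0 */
                return ONE_VALUE_TOKEN;
            return sign ? NEG_VALUE_TOKEN : POS_VALUE_TOKEN;
        }

    The returned token number is the index ``j`` that the pseudo-code uses to select a 4-bit response from "c".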

_mm_maskz_fixupimm_sd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    __m128i c, 
    int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    UI64 c, 
    IMM imm8

.. code-block:: C

    __m128d _mm_maskz_fixupimm_sd(__mmask8 k, __m128d a,
                                  __m128d b, __m128i c,
                                  int imm8);

.. admonition:: Intel Description

    Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "b" to the upper element of "dst". "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction, or flagging. For example, replace a number with NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
        	tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
        	CASE(tsrc[63:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[63:0] := src1[63:0]
        	1 : dest[63:0] := tsrc[63:0]
        	2 : dest[63:0] := QNaN(tsrc[63:0])
        	3 : dest[63:0] := QNAN_Indefinite
        	4 : dest[63:0] := -INF
        	5 : dest[63:0] := +INF
        	6 : dest[63:0] := tsrc.sign? -INF : +INF
        	7 : dest[63:0] := -0
        	8 : dest[63:0] := +0
        	9 : dest[63:0] := -1
        	10: dest[63:0] := +1
        	11: dest[63:0] := 1/2
        	12: dest[63:0] := 90.0
        	13: dest[63:0] := PI/2
        	14: dest[63:0] := MAX_FLOAT
        	15: dest[63:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[63:0]
        }
        IF k[0]
        	dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := b[127:64]
        dst[MAX:128] := 0
        	
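.. admonition:: Community Note [Response Table Sketch]

    The pseudo-code reads one 4-bit response per token from the integer in "c" (``token_response[3:0] := src3[3+4*j:4*j]``), so eight tokens occupy the low 32 bits. The sketch below shows one way such a table could be packed and read back in plain C. The helper names ``pack_fixup_table`` and ``token_response`` are hypothetical and are not part of any Intel header.

    .. code-block:: C

        #include <stdint.h>

        /* Hypothetical helper: pack eight 4-bit response codes, one per token
           (index 0 = QNAN_TOKEN ... index 7 = POS_VALUE_TOKEN), into the low
           32 bits of the table that the pseudo-code calls src3. */
        uint32_t pack_fixup_table(const uint8_t response[8])
        {
            uint32_t table = 0;
            for (int j = 0; j < 8; j++)
                table |= (uint32_t)(response[j] & 0xF) << (4 * j);
            return table;
        }

        /* Mirrors token_response[3:0] := src3[3+4*j:4*j] for token index j. */
        unsigned token_response(uint32_t table, int j)
        {
            return (table >> (4 * j)) & 0xF;
        }

    For example, the responses ``{8, 8, 0, 0, 0, 0, 0, 0}`` pack to ``0x00000088``: quiet and signaling NaNs map to response 8 (+0), and every other token maps to response 0 (keep the value from "a").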

_mm_fixupimm_round_ss
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128i c, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    UI32 c, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128 _mm_fixupimm_round_ss(__m128 a, __m128 b, __m128i c,
                                 int imm8, int sae);

.. admonition:: Intel Description

    Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst", and copy the upper 3 packed elements from "b" to the upper elements of "dst". "imm8" is used to set the required flags reporting.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction, or flagging. For example, replace a number with NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0])
        dst[127:32] := b[127:32]
        dst[MAX:128] := 0
        	
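.. admonition:: Community Note [Response Code Sketch]

    The second CASE block of the pseudo-code turns a 4-bit response code into the output value. The following scalar sketch models that mapping for one 32-bit lane. The names ``f32_from_bits`` and ``apply_response_f32`` are hypothetical, MAX_FLOAT is assumed to correspond to ``FLT_MAX``, and QNAN_Indefinite is assumed to be the bit pattern ``0xFFC00000``.

    .. code-block:: C

        #include <float.h>    /* FLT_MAX */
        #include <math.h>     /* INFINITY */
        #include <stdint.h>
        #include <string.h>

        /* Hypothetical helper: build a float from a raw 32-bit pattern. */
        float f32_from_bits(uint32_t bits)
        {
            float f;
            memcpy(&f, &bits, sizeof f);
            return f;
        }

        /* Scalar model of the token_response CASE block for one 32-bit lane. */
        float apply_response_f32(unsigned response, float src1, float tsrc)
        {
            uint32_t tbits;
            memcpy(&tbits, &tsrc, sizeof tbits);

            switch (response & 0xF) {
            case 0:  return src1;
            case 1:  return tsrc;
            /* Set the quiet bit; only meaningful when tsrc is a NaN. */
            case 2:  return f32_from_bits(tbits | 0x00400000u);
            case 3:  return f32_from_bits(0xFFC00000u);  /* assumed QNaN indefinite */
            case 4:  return -INFINITY;
            case 5:  return INFINITY;
            case 6:  return (tbits >> 31) ? -INFINITY : INFINITY;
            case 7:  return -0.0f;
            case 8:  return 0.0f;
            case 9:  return -1.0f;
            case 10: return 1.0f;
            case 11: return 0.5f;
            case 12: return 90.0f;
            case 13: return 1.57079632679489661923f;     /* pi/2 */
            case 14: return FLT_MAX;
            default: return -FLT_MAX;                    /* response 15 */
            }
        }

    Response 0 keeps the value from "a" (``src1``) and response 1 passes through the value from "b" (``tsrc``); the remaining codes substitute constants.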

_mm_fixupimm_ss
^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128i c, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m128 _mm_fixupimm_ss(__m128 a, __m128 b, __m128i c,
                           int imm8);

.. admonition:: Intel Description

    Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst", and copy the upper 3 packed elements from "b" to the upper elements of "dst". "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction, or flagging. For example, replace a number with NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0])
        dst[127:32] := b[127:32]
        dst[MAX:128] := 0
        	
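.. admonition:: Community Note [imm8 Flag Sketch]

    The last CASE block of the pseudo-code uses the bits of "imm8" to decide which exception flag is reported for which token. The sketch below is one reading of that block, treating the duplicated case labels as two independent checks on the same token. The function ``fixup_flags`` and the constants ``FLAG_IE`` and ``FLAG_ZE`` are hypothetical and do not correspond to MXCSR bit positions.

    .. code-block:: C

        /* Token numbers from the TOKEN_TYPE enum in the pseudo-code. */
        enum { QNAN_TOKEN, SNAN_TOKEN, ZERO_VALUE_TOKEN, ONE_VALUE_TOKEN,
               NEG_INF_TOKEN, POS_INF_TOKEN, NEG_VALUE_TOKEN, POS_VALUE_TOKEN };

        /* Hypothetical flag bits for this sketch only (not MXCSR positions). */
        enum { FLAG_IE = 1, FLAG_ZE = 2 };

        /* Return the set of flags the pseudo-code would report for a token. */
        unsigned fixup_flags(int token, unsigned imm8)
        {
            unsigned flags = 0;
            switch (token) {
            case ZERO_VALUE_TOKEN:
                if (imm8 & 0x01) flags |= FLAG_ZE;   /* imm8[0] */
                if (imm8 & 0x02) flags |= FLAG_IE;   /* imm8[1] */
                break;
            case ONE_VALUE_TOKEN:
                if (imm8 & 0x04) flags |= FLAG_ZE;   /* imm8[2] */
                if (imm8 & 0x08) flags |= FLAG_IE;   /* imm8[3] */
                break;
            case SNAN_TOKEN:      if (imm8 & 0x10) flags |= FLAG_IE; break;
            case NEG_INF_TOKEN:   if (imm8 & 0x20) flags |= FLAG_IE; break;
            case NEG_VALUE_TOKEN: if (imm8 & 0x40) flags |= FLAG_IE; break;
            case POS_INF_TOKEN:   if (imm8 & 0x80) flags |= FLAG_IE; break;
            default: break;  /* QNAN_TOKEN and POS_VALUE_TOKEN report nothing */
            }
            return flags;
        }

    Passing ``imm8 = 0`` reports no flags for any token, which is a common choice when only the value substitution is wanted.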

_mm_mask_fixupimm_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128i c, 
    int imm8, 
    int sae
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    UI32 c, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128 _mm_mask_fixupimm_round_ss(__m128 a, __mmask8 k,
                                      __m128 b, __m128i c,
                                      int imm8, int sae);

.. admonition:: Intel Description

    Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "b" to the upper elements of "dst". "imm8" is used to set the required flags reporting.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction, or flagging. For example, replace a number with NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        IF k[0]
        	dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0])
        ELSE
        	dst[31:0] := a[31:0]
        FI
        dst[127:32] := b[127:32]
        dst[MAX:128] := 0
        	
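.. admonition:: Community Note [Writemask Sketch]

    The closing lines of the pseudo-code select the lower lane according to bit 0 of "k" and copy the upper lanes unchanged. The following scalar sketch models only that selection step on a four-element ``float`` array. The function ``mask_merge_ss`` is hypothetical, and its ``fixed`` parameter stands in for the value that FIXUPIMMPD would return.

    .. code-block:: C

        /* Hypothetical scalar model of the final selection step.
           `fixed` stands in for FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8). */
        void mask_merge_ss(float dst[4], const float a[4], const float b[4],
                           unsigned k, float fixed)
        {
            /* Writemask: keep a[31:0] when k[0] is clear. */
            dst[0] = (k & 1u) ? fixed : a[0];

            /* dst[127:32] := b[127:32], as written in the pseudo-code. */
            for (int i = 1; i < 4; i++)
                dst[i] = b[i];
        }

    Only bit 0 of the mask matters for this scalar form; the other seven bits of the ``__mmask8`` value are ignored.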

_mm_mask_fixupimm_ss
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __mmask8 k, 
    __m128 b, 
    __m128i c, 
    int imm8
:Param ETypes:
    FP32 a, 
    MASK k, 
    FP32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m128 _mm_mask_fixupimm_ss(__m128 a, __mmask8 k, __m128 b,
                                __m128i c, int imm8);

.. admonition:: Intel Description

    Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "b" to the upper elements of "dst". "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction, or flagging. For example, replace a number with NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        IF k[0]
        	dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0])
        ELSE
        	dst[31:0] := a[31:0]
        FI
        dst[127:32] := b[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_fixupimm_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128i c, 
    int imm8, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    UI32 c, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128 _mm_maskz_fixupimm_round_ss(__mmask8 k, __m128 a,
                                       __m128 b, __m128i c,
                                       int imm8, int sae);

.. admonition:: Intel Description

    Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "b" to the upper elements of "dst". "imm8" is used to set the required flags reporting.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction, or flagging. For example, replace a number with NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        IF k[0]
        	dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := b[127:32]
        dst[MAX:128] := 0
        	
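.. admonition:: Community Note [Zeromask Sketch]

    With a zeromask, the lower lane is forced to zero when bit 0 of "k" is clear, instead of being taken from an input. The scalar sketch below models that selection on a four-element ``float`` array. The function ``maskz_merge_ss`` is hypothetical, and ``fixed`` stands in for the value that FIXUPIMMPD would return.

    .. code-block:: C

        /* Hypothetical scalar model of the zeromask selection step.
           `fixed` stands in for FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8). */
        void maskz_merge_ss(float dst[4], const float b[4],
                            unsigned k, float fixed)
        {
            /* Zeromask: the lower lane becomes +0.0 when k[0] is clear. */
            dst[0] = (k & 1u) ? fixed : 0.0f;

            /* dst[127:32] := b[127:32], as written in the pseudo-code. */
            for (int i = 1; i < 4; i++)
                dst[i] = b[i];
        }

    Unlike the writemask form, no previous value of the lower lane survives when the mask bit is clear.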

_mm_maskz_fixupimm_ss
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    __m128i c, 
    int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    UI32 c, 
    IMM imm8

.. code-block:: C

    __m128 _mm_maskz_fixupimm_ss(__mmask8 k, __m128 a, __m128 b,
                                 __m128i c, int imm8);

.. admonition:: Intel Description

    Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "b" to the upper elements of "dst". "imm8" is used to set the required flags reporting.

.. admonition:: Community Note [Fix up Notes]

    The phrase 'Fix Up' in this context means to apply your desired method of error detection and correction, or flagging. For example, replace a number with NaN if it meets a certain criterion.

.. admonition:: See Also [Fix up Notes]

    `A Stack Overflow explanation of Fix Up <https://stackoverflow.com/questions/30213615/what-is-meant-by-fixing-up-floats>`_

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        enum TOKEN_TYPE {
        	QNAN_TOKEN := 0, \
        	SNAN_TOKEN := 1, \
        	ZERO_VALUE_TOKEN := 2, \
        	ONE_VALUE_TOKEN := 3, \
        	NEG_INF_TOKEN := 4, \
        	POS_INF_TOKEN := 5, \
        	NEG_VALUE_TOKEN := 6, \
        	POS_VALUE_TOKEN := 7
        }
        DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
        	tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
        	CASE(tsrc[31:0]) OF
        	QNAN_TOKEN:j := 0
        	SNAN_TOKEN:j := 1
        	ZERO_VALUE_TOKEN: j := 2
        	ONE_VALUE_TOKEN: j := 3
        	NEG_INF_TOKEN: j := 4
        	POS_INF_TOKEN: j := 5
        	NEG_VALUE_TOKEN: j := 6
        	POS_VALUE_TOKEN: j := 7
        	ESAC
        	
        	token_response[3:0] := src3[3+4*j:4*j]
        	
        	CASE(token_response[3:0]) OF
        	0 : dest[31:0] := src1[31:0]
        	1 : dest[31:0] := tsrc[31:0]
        	2 : dest[31:0] := QNaN(tsrc[31:0])
        	3 : dest[31:0] := QNAN_Indefinite
        	4 : dest[31:0] := -INF
        	5 : dest[31:0] := +INF
        	6 : dest[31:0] := tsrc.sign? -INF : +INF
        	7 : dest[31:0] := -0
        	8 : dest[31:0] := +0
        	9 : dest[31:0] := -1
        	10: dest[31:0] := +1
        	11: dest[31:0] := 1/2
        	12: dest[31:0] := 90.0
        	13: dest[31:0] := PI/2
        	14: dest[31:0] := MAX_FLOAT
        	15: dest[31:0] := -MAX_FLOAT
        	ESAC
        	
        	CASE(tsrc[31:0]) OF
        	ZERO_VALUE_TOKEN:
        		IF (imm8[0]) #ZE; FI
        	ZERO_VALUE_TOKEN:
        		IF (imm8[1]) #IE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[2]) #ZE; FI
        	ONE_VALUE_TOKEN:
        		IF (imm8[3]) #IE; FI
        	SNAN_TOKEN:
        		IF (imm8[4]) #IE; FI
        	NEG_INF_TOKEN:
        		IF (imm8[5]) #IE; FI
        	NEG_VALUE_TOKEN:
        		IF (imm8[6]) #IE; FI
        	POS_INF_TOKEN:
        		IF (imm8[7]) #IE; FI
        	ESAC
        	RETURN dest[31:0]
        }
        IF k[0]
        	dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := b[127:32]
        dst[MAX:128] := 0
        	

_mm_getexp_round_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_getexp_round_sd(__m128d a, __m128d b, int sae);

.. admonition:: Intel Description

    Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[63:0] := ConvertExpFP64(b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	
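.. admonition:: Community Note [getexp Scalar Sketch]

    The description says the lower element receives roughly ``floor(log2(x))``. The following scalar sketch shows what that means at the bit level for a 64-bit element. It is an approximation of ConvertExpFP64 written for illustration, not the hardware definition: the name ``getexp_f64`` is hypothetical, DAZ is assumed to be off so subnormal inputs are normalized first, and any NaN input simply yields a NaN.

    .. code-block:: C

        #include <math.h>     /* INFINITY and NAN macros only */
        #include <stdint.h>
        #include <string.h>

        /* Hypothetical scalar model: unbiased exponent of |x| as a double. */
        double getexp_f64(double x)
        {
            uint64_t bits;
            memcpy(&bits, &x, sizeof bits);
            uint64_t mant   = bits & 0x000FFFFFFFFFFFFFULL;
            int      biased = (int)((bits >> 52) & 0x7FF);

            if (biased == 0x7FF)             /* NaN stays NaN, +/-Inf gives +Inf */
                return mant ? NAN : INFINITY;

            if (biased == 0) {
                if (mant == 0)
                    return -INFINITY;        /* +/-0 gives -Inf */
                /* Subnormal: shift until the implicit-one position is set. */
                int e = -1022;
                while ((mant & 0x0010000000000000ULL) == 0) {
                    mant <<= 1;
                    e--;
                }
                return (double)e;
            }
            return (double)(biased - 1023);  /* remove the IEEE 754 bias */
        }

    The sign of the input is ignored, so ``getexp_f64(8.0)`` and ``getexp_f64(-8.0)`` both return ``3.0``, and ``getexp_f64(0.75)`` returns ``-1.0``.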

_mm_getexp_sd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_getexp_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[63:0] := ConvertExpFP64(b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_getexp_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_mask_getexp_round_sd(__m128d src, __mmask8 k,
                                     __m128d a, __m128d b,
                                     int sae);

.. admonition:: Intel Description

    Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst[63:0] := ConvertExpFP64(b[63:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_getexp_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_getexp_sd(__m128d src, __mmask8 k,
                               __m128d a, __m128d b);

.. admonition:: Intel Description

    Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst[63:0] := ConvertExpFP64(b[63:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_getexp_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM sae

.. code-block:: C

    __m128d _mm_maskz_getexp_round_sd(__mmask8 k, __m128d a,
                                      __m128d b, int sae);

.. admonition:: Intel Description

    Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst[63:0] := ConvertExpFP64(b[63:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_getexp_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_getexp_sd(__mmask8 k, __m128d a,
                                __m128d b);

.. admonition:: Intel Description

    Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst[63:0] := ConvertExpFP64(b[63:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_getexp_round_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m128 _mm_getexp_round_ss(__m128 a, __m128 b, int sae);

.. admonition:: Intel Description

    Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[31:0] := ConvertExpFP32(b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_getexp_ss
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_getexp_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst[31:0] := ConvertExpFP32(b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	
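The lane layout in the pseudo-code above (lane 0 derived from "b", lanes 1 to 3 copied from "a") can be sketched in plain C. This is an illustrative model under stated assumptions, not the hardware behaviour for every input: ``vec4f``, ``convert_exp_fp32``, and ``getexp_ss_model`` are hypothetical names, and only normal inputs are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __m128: four FP32 lanes, f[0] is the lowest. */
typedef struct { float f[4]; } vec4f;

/* Unbiased exponent of a normal float, returned as a float.
   Assumption: x is finite, nonzero, and not denormal. */
float convert_exp_fp32(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    int biased = (int)((bits >> 23) & 0xFF);    /* 8-bit exponent field */
    assert(biased != 0 && biased != 0xFF);      /* reject zero/denormal/inf/NaN */
    return (float)(biased - 127);
}

/* Scalar model of the unmasked form. */
vec4f getexp_ss_model(vec4f a, vec4f b) {
    vec4f dst = a;                              /* lanes 1..3 are copied from a */
    dst.f[0] = convert_exp_fp32(b.f[0]);        /* lane 0 depends on b only */
    return dst;
}
```

For ``b.f[0] = 0.3f`` the lower lane becomes -2.0f, because 0.3 lies in [2^-2, 2^-1).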

_mm_mask_getexp_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m128 _mm_mask_getexp_round_ss(__m128 src, __mmask8 k,
                                    __m128 a, __m128 b,
                                    int sae);

.. admonition:: Intel Description

    Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst[31:0] := ConvertExpFP32(b[31:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_getexp_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_getexp_ss(__m128 src, __mmask8 k, __m128 a,
                              __m128 b);

.. admonition:: Intel Description

    Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst[31:0] := ConvertExpFP32(b[31:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_getexp_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM sae

.. code-block:: C

    __m128 _mm_maskz_getexp_round_ss(__mmask8 k, __m128 a,
                                     __m128 b, int sae);

.. admonition:: Intel Description

    Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
    Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst[31:0] := ConvertExpFP32(b[31:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_getexp_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_getexp_ss(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst[31:0] := ConvertExpFP32(b[31:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	
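The writemask and zeromask forms differ only in what the lower lane receives when bit 0 of "k" is clear. A small sketch of that selection step follows; the helper names ``select_writemask`` and ``select_zeromask`` are hypothetical, and ``uint8_t`` stands in for ``__mmask8``.

```c
#include <stdint.h>

/* Writemask: fall back to the lane from src when bit 0 of k is clear. */
float select_writemask(uint8_t k, float computed, float src_lane) {
    return (k & 1u) ? computed : src_lane;
}

/* Zeromask: produce +0.0f when bit 0 of k is clear. */
float select_zeromask(uint8_t k, float computed) {
    return (k & 1u) ? computed : 0.0f;
}
```

Only bit 0 matters for these scalar intrinsics, so a mask such as ``0xFE`` behaves like an all-zero mask here.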

_mm_getmant_round_sd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc, 
    int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM interv, 
    IMM sc, 
    IMM sae

.. code-block:: C

    __m128d _mm_getmant_round_sd(__m128d a, __m128d b,
                                 _MM_MANTISSA_NORM_ENUM interv,
                                 _MM_MANTISSA_SIGN_ENUM sc,
                                 int sae);

.. admonition:: Intel Description

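    Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    The "interv" parameter selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). The "sc" parameter selects the sign: _MM_MANT_SIGN_src uses the sign of the source, _MM_MANT_SIGN_zero forces a positive result, and _MM_MANT_SIGN_nan returns NaN when the source is negative. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code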

    .. code-block:: text

        dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv)
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_getmant_sd
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m128d _mm_getmant_sd(__m128d a, __m128d b,
                           _MM_MANTISSA_NORM_ENUM interv,
                           _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

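    Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    The "interv" parameter selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). The "sc" parameter selects the sign: _MM_MANT_SIGN_src uses the sign of the source, _MM_MANT_SIGN_zero forces a positive result, and _MM_MANT_SIGN_nan returns NaN when the source is negative.

.. admonition:: Intel Implementation Pseudo-Code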

    .. code-block:: text

        dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv)
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	
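``GetNormalizedMantissa`` is not spelled out in the pseudo-code above. One way to picture it is as an exponent-field rewrite on the IEEE 754 bit pattern. The sketch below is an assumption-laden illustration, not Intel's definition: the enum and function names are hypothetical, it covers only the [1, 2), [0.5, 1), and [0.75, 1.5) intervals and the "source sign" and "zero sign" controls, and it accepts only normal inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical enums mirroring a subset of the "interv" and "sc" choices. */
typedef enum { NORM_1_2, NORM_P5_1, NORM_P75_1P5 } norm_interval;
typedef enum { SIGN_SRC, SIGN_ZERO } sign_control;

/* Keep the fraction bits, replace the exponent field, and pick the sign.
   Assumption: x is finite, nonzero, and not denormal. */
double get_normalized_mantissa_fp64(double x, sign_control sc,
                                    norm_interval interv) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint64_t exp_field = (bits >> 52) & 0x7FF;
    assert(exp_field != 0 && exp_field != 0x7FF);   /* normal inputs only */

    uint64_t fraction = bits & 0x000FFFFFFFFFFFFFull;
    uint64_t sign = (sc == SIGN_SRC) ? (bits & 0x8000000000000000ull) : 0;

    uint64_t new_exp = 0x3FF;                       /* bias: result in [1, 2) */
    if (interv == NORM_P5_1) {
        new_exp = 0x3FE;                            /* bias - 1: result in [0.5, 1) */
    } else if (interv == NORM_P75_1P5 && (fraction >> 51)) {
        new_exp = 0x3FE;                            /* halve [1.5, 2) into [0.75, 1) */
    }

    uint64_t out = sign | (new_exp << 52) | fraction;
    double result;
    memcpy(&result, &out, sizeof result);
    return result;
}
```

For example, 10.0 = 1.25 * 2^3 normalizes to 1.25 in [1, 2), and 12.0 = 1.5 * 2^3 normalizes to 0.75 in [0.75, 1.5).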

_mm_mask_getmant_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc, 
    int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM interv, 
    IMM sc, 
    IMM sae

.. code-block:: C

    __m128d _mm_mask_getmant_round_sd(
        __m128d src, __mmask8 k, __m128d a, __m128d b,
        _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc, int sae);

.. admonition:: Intel Description

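    Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    The "interv" parameter selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). The "sc" parameter selects the sign: _MM_MANT_SIGN_src uses the sign of the source, _MM_MANT_SIGN_zero forces a positive result, and _MM_MANT_SIGN_nan returns NaN when the source is negative. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code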

    .. code-block:: text

        IF k[0]
        	dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv)
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_getmant_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m128d _mm_mask_getmant_sd(__m128d src, __mmask8 k,
                                __m128d a, __m128d b,
                                _MM_MANTISSA_NORM_ENUM interv,
                                _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

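    Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    The "interv" parameter selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). The "sc" parameter selects the sign: _MM_MANT_SIGN_src uses the sign of the source, _MM_MANT_SIGN_zero forces a positive result, and _MM_MANT_SIGN_nan returns NaN when the source is negative.

.. admonition:: Intel Implementation Pseudo-Code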

    .. code-block:: text

        IF k[0]
        	dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv)
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_getmant_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc, 
    int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM interv, 
    IMM sc, 
    IMM sae

.. code-block:: C

    __m128d _mm_maskz_getmant_round_sd(
        __mmask8 k, __m128d a, __m128d b,
        _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc, int sae);

.. admonition:: Intel Description

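    Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    The "interv" parameter selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). The "sc" parameter selects the sign: _MM_MANT_SIGN_src uses the sign of the source, _MM_MANT_SIGN_zero forces a positive result, and _MM_MANT_SIGN_nan returns NaN when the source is negative. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code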

    .. code-block:: text

        IF k[0]
        	dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv)
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_getmant_sd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m128d _mm_maskz_getmant_sd(__mmask8 k, __m128d a,
                                 __m128d b,
                                 _MM_MANTISSA_NORM_ENUM interv,
                                 _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

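    Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    The "interv" parameter selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). The "sc" parameter selects the sign: _MM_MANT_SIGN_src uses the sign of the source, _MM_MANT_SIGN_zero forces a positive result, and _MM_MANT_SIGN_nan returns NaN when the source is negative.

.. admonition:: Intel Implementation Pseudo-Code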

    .. code-block:: text

        IF k[0]
        	dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv)
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_getmant_round_ss
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc, 
    int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM interv, 
    IMM sc, 
    IMM sae

.. code-block:: C

    __m128 _mm_getmant_round_ss(__m128 a, __m128 b,
                                _MM_MANTISSA_NORM_ENUM interv,
                                _MM_MANTISSA_SIGN_ENUM sc,
                                int sae);

.. admonition:: Intel Description

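    Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    The "interv" parameter selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). The "sc" parameter selects the sign: _MM_MANT_SIGN_src uses the sign of the source, _MM_MANT_SIGN_zero forces a positive result, and _MM_MANT_SIGN_nan returns NaN when the source is negative. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code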

    .. code-block:: text

        dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv)
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_getmant_ss
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m128 _mm_getmant_ss(__m128 a, __m128 b,
                          _MM_MANTISSA_NORM_ENUM interv,
                          _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

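    Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    The "interv" parameter selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). The "sc" parameter selects the sign: _MM_MANT_SIGN_src uses the sign of the source, _MM_MANT_SIGN_zero forces a positive result, and _MM_MANT_SIGN_nan returns NaN when the source is negative.

.. admonition:: Intel Implementation Pseudo-Code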

    .. code-block:: text

        dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv)
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_getmant_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc, 
    int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM interv, 
    IMM sc, 
    IMM sae

.. code-block:: C

    __m128 _mm_mask_getmant_round_ss(
        __m128 src, __mmask8 k, __m128 a, __m128 b,
        _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc, int sae);

.. admonition:: Intel Description

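    Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    The "interv" parameter selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). The "sc" parameter selects the sign: _MM_MANT_SIGN_src uses the sign of the source, _MM_MANT_SIGN_zero forces a positive result, and _MM_MANT_SIGN_nan returns NaN when the source is negative. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code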

    .. code-block:: text

        IF k[0]
        	dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv)
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_getmant_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m128 _mm_mask_getmant_ss(__m128 src, __mmask8 k, __m128 a,
                               __m128 b,
                               _MM_MANTISSA_NORM_ENUM interv,
                               _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

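    Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    The "interv" parameter selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). The "sc" parameter selects the sign: _MM_MANT_SIGN_src uses the sign of the source, _MM_MANT_SIGN_zero forces a positive result, and _MM_MANT_SIGN_nan returns NaN when the source is negative.

.. admonition:: Intel Implementation Pseudo-Code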

    .. code-block:: text

        IF k[0]
        	dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv)
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_getmant_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc, 
    int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM interv, 
    IMM sc, 
    IMM sae

.. code-block:: C

    __m128 _mm_maskz_getmant_round_ss(
        __mmask8 k, __m128 a, __m128 b,
        _MM_MANTISSA_NORM_ENUM interv,
        _MM_MANTISSA_SIGN_ENUM sc, int sae);

.. admonition:: Intel Description

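    Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    The "interv" parameter selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). The "sc" parameter selects the sign: _MM_MANT_SIGN_src uses the sign of the source, _MM_MANT_SIGN_zero forces a positive result, and _MM_MANT_SIGN_nan returns NaN when the source is negative. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code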

    .. code-block:: text

        IF k[0]
        	dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv)
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_getmant_ss
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    _MM_MANTISSA_NORM_ENUM interv, 
    _MM_MANTISSA_SIGN_ENUM sc
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM interv, 
    IMM sc

.. code-block:: C

    __m128 _mm_maskz_getmant_ss(__mmask8 k, __m128 a, __m128 b,
                                _MM_MANTISSA_NORM_ENUM interv,
                                _MM_MANTISSA_SIGN_ENUM sc);

.. admonition:: Intel Description

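    Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
    The "interv" parameter selects the normalization interval: _MM_MANT_NORM_1_2 for [1, 2), _MM_MANT_NORM_p5_2 for [0.5, 2), _MM_MANT_NORM_p5_1 for [0.5, 1), or _MM_MANT_NORM_p75_1p5 for [0.75, 1.5). The "sc" parameter selects the sign: _MM_MANT_SIGN_src uses the sign of the source, _MM_MANT_SIGN_zero forces a positive result, and _MM_MANT_SIGN_nan returns NaN when the source is negative.

.. admonition:: Intel Implementation Pseudo-Code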

    .. code-block:: text

        IF k[0]
        	dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv)
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_roundscale_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128d _mm_mask_roundscale_round_sd(__m128d src,
                                         __mmask8 k, __m128d a,
                                         __m128d b,
                                         const int imm8,
                                         const int sae);

.. admonition:: Intel Description

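    Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The upper four bits of "imm8" give the number of fraction bits to keep, and the lower bits select the rounding mode: _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_ZERO, or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code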

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        IF k[0]
        	dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	
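The ``RoundScaleFP64`` helper above scales by 2^m, rounds to an integer, and scales back. A portable sketch of that arithmetic follows. It is an approximation under stated assumptions, not the hardware definition: the function names are hypothetical, bit 2 of "imm8" (use MXCSR.RC) is not supported, the scaled value must satisfy \|x * 2^m\| < 2^52 so the integer cast is exact, the ``IsInf`` fallback is therefore unreachable and omitted, and the sign of a zero result is not preserved.

```c
#include <assert.h>
#include <stdint.h>

/* Round y to an integer using the 2-bit mode in imm8[1:0]:
   0 = nearest (ties to even), 1 = toward -inf, 2 = toward +inf, 3 = toward zero.
   Assumption: |y| < 2^52, so the cast to int64_t is exact. */
double round_by_mode(double y, int mode) {
    assert(y > -4503599627370496.0 && y < 4503599627370496.0);
    double t = (double)(int64_t)y;              /* truncate toward zero */
    switch (mode & 3) {
    case 1: return (t > y) ? t - 1.0 : t;       /* floor */
    case 2: return (t < y) ? t + 1.0 : t;       /* ceiling */
    case 3: return t;                           /* truncate */
    default: {
        double diff = y - t;                    /* exact, in (-1, 1) */
        double mag = (diff < 0) ? -diff : diff;
        double step = (diff < 0) ? -1.0 : 1.0;
        if (mag > 0.5) return t + step;
        if (mag == 0.5 && ((int64_t)t & 1)) return t + step; /* tie: pick the even neighbour */
        return t;
    }
    }
}

/* Scalar model of RoundScaleFP64 for imm8[2] == 0 (explicit rounding mode). */
double round_scale_fp64(double x, int imm8) {
    assert((imm8 & 4) == 0);                    /* MXCSR-driven rounding not modelled */
    int m = (imm8 >> 4) & 0xF;                  /* fraction bits to keep */
    double scale = (double)(1 << m);            /* 2^m, exactly representable */
    return round_by_mode(x * scale, imm8) / scale;
}
```

With ``imm8 = 0x10`` (one fraction bit, round to nearest) the value 2.7 becomes 2.5, and with ``imm8 = 0x12`` (one fraction bit, round up) it becomes 3.0.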

_mm_mask_roundscale_sd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_mask_roundscale_sd(__m128d src, __mmask8 k,
                                   __m128d a, __m128d b,
                                   const int imm8);

.. admonition:: Intel Description

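    Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The upper four bits of "imm8" give the number of fraction bits to keep, and the lower bits select the rounding mode: _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_ZERO, or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).

.. admonition:: Intel Implementation Pseudo-Code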

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        IF k[0]
        	dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_roundscale_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    const int imm8, 
    const int sae
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128d _mm_maskz_roundscale_round_sd(__mmask8 k, __m128d a,
                                          __m128d b,
                                          const int imm8,
                                          const int sae);

.. admonition:: Intel Description

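    Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The upper four bits of "imm8" give the number of fraction bits to keep, and the lower bits select the rounding mode: _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_ZERO, or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.

.. admonition:: Intel Implementation Pseudo-Code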

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        IF k[0]
        	dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_roundscale_sd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_maskz_roundscale_sd(__mmask8 k, __m128d a,
                                    __m128d b, const int imm8);

.. admonition:: Intel Description

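    Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The upper four bits of "imm8" give the number of fraction bits to keep, and the lower bits select the rounding mode: _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_ZERO, or _MM_FROUND_CUR_DIRECTION (use MXCSR.RC).

.. admonition:: Intel Implementation Pseudo-Code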

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        IF k[0]
        	dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_roundscale_round_sd
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128d _mm_roundscale_round_sd(__m128d a, __m128d b,
                                    const int imm8,
                                    const int sae);

.. admonition:: Intel Description

    Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_roundscale_sd
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_roundscale_sd(__m128d a, __m128d b,
                              const int imm8);

.. admonition:: Intel Description

    Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
        	m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
        	IF IsInf(tmp[63:0])
        		tmp[63:0] := src1[63:0]
        	FI
        	RETURN tmp[63:0]
        }
        dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
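
The pseudo-code above can be mirrored in portable C to experiment with ``imm8`` values without AVX-512 hardware. The following is a minimal sketch rather than Intel's implementation: ``roundscale_f64_model`` and ``roundscale_sd_model`` are hypothetical helper names, the 128-bit vectors are modelled as two-element ``double`` arrays, ``imm8[2]`` is assumed to be clear so that ``imm8[1:0]`` selects the rounding mode directly, the C floating-point environment is assumed to be in its default round-to-nearest mode, and exception flags are ignored.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model of RoundScaleFP64 from the pseudo-code.
       Assumes imm8[2] == 0, so imm8[1:0] picks the rounding mode. */
    static double roundscale_f64_model(double x, unsigned imm8)
    {
        int m = (int)((imm8 >> 4) & 0xFu);  /* imm8[7:4]: fraction bits to keep */
        double scaled = ldexp(x, m);        /* 2^m * x */
        double rounded;

        switch (imm8 & 0x3u) {
        case 0u: rounded = nearbyint(scaled); break; /* nearest-even under default fenv */
        case 1u: rounded = floor(scaled);     break; /* toward negative infinity */
        case 2u: rounded = ceil(scaled);      break; /* toward positive infinity */
        default: rounded = trunc(scaled);     break; /* toward zero */
        }

        double tmp = ldexp(rounded, -m);    /* 2^-m * ROUND(...) */
        if (isinf(tmp))
            tmp = x;                        /* IsInf guard from the pseudo-code */
        return tmp;
    }

    /* Hypothetical model of _mm_roundscale_sd on two-element arrays:
       lane 0 is rounded from b, lane 1 is copied from a. */
    static void roundscale_sd_model(double dst[2], const double a[2],
                                    const double b[2], unsigned imm8)
    {
        dst[0] = roundscale_f64_model(b[0], imm8);
        dst[1] = a[1];
    }

Under these assumptions, ``roundscale_f64_model(2.71828, 0x10)`` keeps one fraction bit and rounds to nearest, which yields ``2.5``.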
        	

_mm_mask_roundscale_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128 _mm_mask_roundscale_round_ss(__m128 src, __mmask8 k,
                                        __m128 a, __m128 b,
                                        const int imm8,
                                        const int sae);

.. admonition:: Intel Description

    Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        IF k[0]
        	dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_roundscale_ss
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_mask_roundscale_ss(__m128 src, __mmask8 k,
                                  __m128 a, __m128 b,
                                  const int imm8);

.. admonition:: Intel Description

    Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        IF k[0]
        	dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
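
The writemask behaviour described above can be illustrated with a small scalar model. This is a sketch under stated assumptions, not Intel's implementation: ``roundscale_f32_model`` and ``mask_roundscale_ss_model`` are hypothetical names, each 128-bit vector is modelled as a four-element ``float`` array, ``imm8[2]`` is assumed to be clear, and the default round-to-nearest floating-point environment is assumed.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model of RoundScaleFP32; assumes imm8[2] == 0. */
    static float roundscale_f32_model(float x, unsigned imm8)
    {
        int m = (int)((imm8 >> 4) & 0xFu);  /* imm8[7:4]: fraction bits to keep */
        float scaled = ldexpf(x, m);        /* 2^m * x */
        float rounded;

        switch (imm8 & 0x3u) {
        case 0u: rounded = nearbyintf(scaled); break; /* nearest-even under default fenv */
        case 1u: rounded = floorf(scaled);     break;
        case 2u: rounded = ceilf(scaled);      break;
        default: rounded = truncf(scaled);     break;
        }

        float tmp = ldexpf(rounded, -m);    /* 2^-m * ROUND(...) */
        return isinf(tmp) ? x : tmp;        /* IsInf guard from the pseudo-code */
    }

    /* Hypothetical model of _mm_mask_roundscale_ss: only mask bit 0 matters.
       Lane 0 comes from the rounded b[0] or from src[0]; lanes 1..3 come from a. */
    static void mask_roundscale_ss_model(float dst[4], const float src[4],
                                         unsigned char k, const float a[4],
                                         const float b[4], unsigned imm8)
    {
        dst[0] = (k & 1u) ? roundscale_f32_model(b[0], imm8) : src[0];
        for (int i = 1; i < 4; ++i)
            dst[i] = a[i];
    }

Note that the upper three lanes always come from ``a``, regardless of the mask; ``src`` only ever supplies lane 0.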
        	

_mm_maskz_roundscale_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    const int imm8, 
    const int sae
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128 _mm_maskz_roundscale_round_ss(__mmask8 k, __m128 a,
                                         __m128 b,
                                         const int imm8,
                                         const int sae);

.. admonition:: Intel Description

    Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        IF k[0]
        	dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_roundscale_ss
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_maskz_roundscale_ss(__mmask8 k, __m128 a,
                                   __m128 b, const int imm8);

.. admonition:: Intel Description

    Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        IF k[0]
        	dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_roundscale_round_ss
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    const int imm8, 
    const int sae
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128 _mm_roundscale_round_ss(__m128 a, __m128 b,
                                   const int imm8,
                                   const int sae);

.. admonition:: Intel Description

    Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_roundscale_ss
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_roundscale_ss(__m128 a, __m128 b,
                             const int imm8);

.. admonition:: Intel Description

    Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
        	m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
        	IF IsInf(tmp[31:0])
        		tmp[31:0] := src1[31:0]
        	FI
        	RETURN tmp[31:0]
        }
        dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_scalef_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_mask_scalef_round_sd(__m128d src, __mmask8 k,
                                     __m128d a, __m128d b,
                                     int rounding);

.. admonition:: Intel Description

    Scale the lower double-precision (64-bit) floating-point element in "a" by 2 raised to the floor of the lower element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        IF k[0]
        	dst[63:0] := SCALE(a[63:0], b[63:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_mask_scalef_sd
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 src, 
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_mask_scalef_sd(__m128d src, __mmask8 k,
                               __m128d a, __m128d b);

.. admonition:: Intel Description

    Scale the lower double-precision (64-bit) floating-point element in "a" by 2 raised to the floor of the lower element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        IF k[0]
        	dst[63:0] := SCALE(a[63:0], b[63:0])
        ELSE
        	dst[63:0] := src[63:0]
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_scalef_round_sd
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_maskz_scalef_round_sd(__mmask8 k, __m128d a,
                                      __m128d b, int rounding);

.. admonition:: Intel Description

    Scale the lower double-precision (64-bit) floating-point element in "a" by 2 raised to the floor of the lower element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        IF k[0]
        	dst[63:0] := SCALE(a[63:0], b[63:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_maskz_scalef_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __mmask8 k, 
    __m128d a, 
    __m128d b
:Param ETypes:
    MASK k, 
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_maskz_scalef_sd(__mmask8 k, __m128d a,
                                __m128d b);

.. admonition:: Intel Description

    Scale the lower double-precision (64-bit) floating-point element in "a" by 2 raised to the floor of the lower element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        IF k[0]
        	dst[63:0] := SCALE(a[63:0], b[63:0])
        ELSE
        	dst[63:0] := 0
        FI
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_scalef_round_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    int rounding
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM rounding

.. code-block:: C

    __m128d _mm_scalef_round_sd(__m128d a, __m128d b,
                                int rounding);

.. admonition:: Intel Description

    Scale the lower double-precision (64-bit) floating-point element in "a" by 2 raised to the floor of the lower element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        dst[63:0] := SCALE(a[63:0], b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_scalef_sd
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m128d _mm_scalef_sd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Scale the lower double-precision (64-bit) floating-point element in "a" by 2 raised to the floor of the lower element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
        	RETURN dst[63:0]
        }
        dst[63:0] := SCALE(a[63:0], b[63:0])
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
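
The core of ``SCALE`` is a multiplication by a power of two whose exponent is the floor of the second operand. The following portable C sketch illustrates that arithmetic; it is not Intel's implementation. ``scalef_f64_model`` and ``scalef_sd_model`` are hypothetical names, and the model makes several simplifying assumptions that go beyond the pseudo-code: any NaN input simply produces a NaN, an infinite ``b`` is not modelled faithfully, ``MXCSR.DAZ`` is treated as clear, and the default rounding mode is used.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model of SCALE for FP64: a * 2^floor(b).
       Simplified: NaN inputs give a NaN, infinite b is not modelled
       faithfully, and MXCSR.DAZ is assumed to be clear. */
    static double scalef_f64_model(double a, double b)
    {
        if (isnan(a) || isnan(b))
            return a + b;               /* propagates a quiet NaN */

        double e = floor(b);
        /* Clamp so the conversion to int is well defined. Beyond this range
           any nonzero finite a has already overflowed or underflowed. */
        if (e > 4096.0)
            e = 4096.0;
        if (e < -4096.0)
            e = -4096.0;
        return scalbn(a, (int)e);       /* a * 2^e */
    }

    /* Hypothetical model of _mm_scalef_sd on two-element arrays:
       lane 0 is scaled, lane 1 is copied from a. */
    static void scalef_sd_model(double dst[2], const double a[2],
                                const double b[2])
    {
        dst[0] = scalef_f64_model(a[0], b[0]);
        dst[1] = a[1];
    }

For instance, ``scalef_f64_model(3.0, 2.5)`` uses ``floor(2.5) = 2`` and returns ``12.0``, while ``scalef_f64_model(3.0, -1.5)`` uses ``floor(-1.5) = -2`` and returns ``0.75``.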
        	

_mm_mask_scalef_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_mask_scalef_round_ss(__m128 src, __mmask8 k,
                                    __m128 a, __m128 b,
                                    int rounding);

.. admonition:: Intel Description

    Scale the lower single-precision (32-bit) floating-point element in "a" by 2 raised to the floor of the lower element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        IF k[0]
        	dst[31:0] := SCALE(a[31:0], b[31:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_mask_scalef_ss
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 src, 
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_mask_scalef_ss(__m128 src, __mmask8 k, __m128 a,
                              __m128 b);

.. admonition:: Intel Description

    Scale the lower single-precision (32-bit) floating-point element in "a" by 2 raised to the floor of the lower element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        IF k[0]
        	dst[31:0] := SCALE(a[31:0], b[31:0])
        ELSE
        	dst[31:0] := src[31:0]
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_scalef_round_ss
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_maskz_scalef_round_ss(__mmask8 k, __m128 a,
                                     __m128 b, int rounding);

.. admonition:: Intel Description

    Scale the lower single-precision (32-bit) floating-point element in "a" by 2 raised to the floor of the lower element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        IF k[0]
        	dst[31:0] := SCALE(a[31:0], b[31:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_maskz_scalef_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __mmask8 k, 
    __m128 a, 
    __m128 b
:Param ETypes:
    MASK k, 
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_maskz_scalef_ss(__mmask8 k, __m128 a, __m128 b);

.. admonition:: Intel Description

    Scale the lower single-precision (32-bit) floating-point element in "a" by 2 raised to the floor of the lower element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        IF k[0]
        	dst[31:0] := SCALE(a[31:0], b[31:0])
        ELSE
        	dst[31:0] := 0
        FI
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
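
The zeromask variant can be sketched the same way in portable C. This is an illustrative model under stated assumptions, not Intel's implementation: ``scalef_f32_model`` and ``maskz_scalef_ss_model`` are hypothetical names, each 128-bit vector is modelled as a four-element ``float`` array, NaN inputs simply yield a NaN, an infinite ``b`` is not modelled faithfully, and ``MXCSR.DAZ`` is treated as clear.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model of SCALE for FP32: a * 2^floor(b).
       Simplified NaN handling; MXCSR.DAZ assumed clear. */
    static float scalef_f32_model(float a, float b)
    {
        if (isnan(a) || isnan(b))
            return a + b;               /* propagates a quiet NaN */

        float e = floorf(b);
        /* Clamp so the conversion to int is well defined; the float range
           is already exhausted well inside +/-512. */
        if (e > 512.0f)
            e = 512.0f;
        if (e < -512.0f)
            e = -512.0f;
        return scalbnf(a, (int)e);      /* a * 2^e */
    }

    /* Hypothetical model of _mm_maskz_scalef_ss: lane 0 is the scaled value
       when mask bit 0 is set and zero otherwise; lanes 1..3 come from a. */
    static void maskz_scalef_ss_model(float dst[4], unsigned char k,
                                      const float a[4], const float b[4])
    {
        dst[0] = (k & 1u) ? scalef_f32_model(a[0], b[0]) : 0.0f;
        for (int i = 1; i < 4; ++i)
            dst[i] = a[i];
    }

Only bit 0 of ``k`` is consulted; the remaining mask bits have no effect on the scalar forms.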
        	

_mm_scalef_round_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    int rounding
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM rounding

.. code-block:: C

    __m128 _mm_scalef_round_ss(__m128 a, __m128 b,
                               int rounding);

.. admonition:: Intel Description

    Scale the lower single-precision (32-bit) floating-point element in "a" by 2 raised to the floor of the lower element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        dst[31:0] := SCALE(a[31:0], b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_scalef_ss
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m128 _mm_scalef_ss(__m128 a, __m128 b);

.. admonition:: Intel Description

    Scale the lower single-precision (32-bit) floating-point element in "a" by 2 raised to the floor of the lower element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE SCALE(src1, src2) {
        	IF (src2 == NaN)
        		IF (src2 == SNaN)
        			RETURN QNAN(src2)
        		FI
        	ELSE IF (src1 == NaN)
        		IF (src1 == SNaN)
        			RETURN QNAN(src1)
        		FI
        		IF (src2 != INF)
        			RETURN QNAN(src1)
        		FI
        	ELSE
        		tmp_src2 := src2
        		tmp_src1 := src1
        		IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
        			tmp_src2 := 0
        		FI
        		IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
        			tmp_src1 := 0
        		FI
        	FI
        	dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
        	RETURN dst[31:0]
        }
        dst[31:0] := SCALE(a[31:0], b[31:0])
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_roundscale_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    int imm8
:Param ETypes:
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m128h _mm_roundscale_ph(__m128h a, int imm8);

.. admonition:: Intel Description

    Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        FOR i := 0 to 7
        	dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
        ENDFOR
        dst[MAX:128] := 0
        	
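The RoundScaleFP16 helper above scales by 2^M, rounds to an integer, and scales back, where M is "imm8[7:4]". A minimal sketch on float values follows. The name "roundscale_model" is hypothetical, and the sketch assumes the low bits of "imm8" select round-to-nearest-even; the other rounding controls are not modeled.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model of RoundScaleFP16 using float instead of FP16.
       Assumes round-to-nearest-even (the default C rounding mode). */
    static float roundscale_model(float x, int imm8)
    {
        int m = (imm8 >> 4) & 0xF;            /* fraction bits to preserve */
        float scaled  = ldexpf(x, m);         /* 2^m * x */
        float rounded = nearbyintf(scaled);   /* round to an integer */
        return ldexpf(rounded, -m);           /* 2^-m * rounded */
    }

With "imm8" equal to 0x10 (M = 1), the input 1.3 is rounded to the nearest multiple of 0.5.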

_mm_mask_roundscale_ph
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    int imm8
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m128h _mm_mask_roundscale_ph(__m128h src, __mmask8 k,
                                   __m128h a, int imm8);

.. admonition:: Intel Description

    Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_roundscale_ph
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m128h _mm_maskz_roundscale_ph(__mmask8 k, __m128h a,
                                    int imm8);

.. admonition:: Intel Description

    Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
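The "mask" and "maskz" forms differ only in what an inactive lane receives: the writemask form keeps the lane from "src", the zeromask form writes zero. The sketch below shows that selection over 8 lanes. The function names are hypothetical, and float arrays stand in for the FP16 vector types.

.. code-block:: C

    #include <stdint.h>

    #define LANES 8

    /* Writemask (merge) form: lanes whose mask bit is clear keep src. */
    static void mask_merge_model(float dst[LANES], const float src[LANES],
                                 uint8_t k, const float result[LANES])
    {
        for (int i = 0; i < LANES; i++)
            dst[i] = ((k >> i) & 1) ? result[i] : src[i];
    }

    /* Zeromask form: lanes whose mask bit is clear are set to zero. */
    static void mask_zero_model(float dst[LANES], uint8_t k,
                                const float result[LANES])
    {
        for (int i = 0; i < LANES; i++)
            dst[i] = ((k >> i) & 1) ? result[i] : 0.0f;
    }

Here "result" holds the per-lane values the unmasked operation would have produced.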

_mm_getexp_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128h _mm_getexp_ph(__m128h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 7
        	dst.fp16[i] := ConvertExpFP16(a.fp16[i])
        ENDFOR
        dst[MAX:128] := 0
        	
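The description above characterizes the result as "floor(log2(x))". A minimal scalar stand-in is the C library function "logbf", which returns the unbiased exponent of the magnitude of its argument as a floating-point value. The name "getexp_model" is hypothetical, and treating ConvertExpFP16 as equivalent to "logbf" is an assumption, not something this listing states.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar stand-in for ConvertExpFP16 on float.
       logbf works on |x|, so negative inputs give the exponent of
       their magnitude. */
    static float getexp_model(float x)
    {
        return logbf(x);
    }

For example, 10.0 lies in [8, 16), so its exponent is 3.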

_mm_mask_getexp_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a

.. code-block:: C

    __m128h _mm_mask_getexp_ph(__m128h src, __mmask8 k,
                               __m128h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := ConvertExpFP16(a.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_getexp_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a
:Param ETypes:
    MASK k, 
    FP16 a

.. code-block:: C

    __m128h _mm_maskz_getexp_ph(__mmask8 k, __m128h a);

.. admonition:: Intel Description

    Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := ConvertExpFP16(a.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_getmant_ph
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign
:Param ETypes:
    FP16 a, 
    IMM norm, 
    IMM sign

.. code-block:: C

    __m128h _mm_getmant_ph(__m128h a,
                           _MM_MANTISSA_NORM_ENUM norm,
                           _MM_MANTISSA_SIGN_ENUM sign);

.. admonition:: Intel Description

    Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 TO 7
        	dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
        ENDFOR
        dst[MAX:128] := 0
        	
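To make the "±(2^k)*|x.significand|" formula concrete, the sketch below models one configuration only: a result interval of [1, 2) with the sign taken from the source. The name "getmant_1_2_model" is hypothetical, and the claim that this corresponds to one particular "norm" and "sign" setting is an assumption. Zero, infinity, and NaN inputs are not modeled.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model: normalize x so that |result| is in [1, 2),
       keeping the sign of x. Only finite, nonzero inputs are handled. */
    static float getmant_1_2_model(float x)
    {
        int e;
        float m = frexpf(x, &e);   /* x == m * 2^e with |m| in [0.5, 1) */
        (void)e;                   /* the exponent is discarded */
        return 2.0f * m;           /* rescale so |result| is in [1, 2) */
    }

For example, 10.0 equals 1.25 * 2^3, so the normalized mantissa is 1.25.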

_mm_mask_getmant_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM norm, 
    IMM sign

.. code-block:: C

    __m128h _mm_mask_getmant_ph(__m128h src, __mmask8 k,
                                __m128h a,
                                _MM_MANTISSA_NORM_ENUM norm,
                                _MM_MANTISSA_SIGN_ENUM sign);

.. admonition:: Intel Description

    Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 TO 7
        	IF k[i]
        		dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_getmant_ph
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM norm, 
    IMM sign

.. code-block:: C

    __m128h _mm_maskz_getmant_ph(__mmask8 k, __m128h a,
                                 _MM_MANTISSA_NORM_ENUM norm,
                                 _MM_MANTISSA_SIGN_ENUM sign);

.. admonition:: Intel Description

    Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign. [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 TO 7
        	IF k[i]
        		dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_reduce_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    int imm8
:Param ETypes:
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m128h _mm_reduce_ph(__m128h a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        FOR i := 0 to 7
        	dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
        ENDFOR
        dst[MAX:128] := 0
        	
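The ReduceArgumentFP16 helper above subtracts the rounded value from the input, leaving the part of the input that rounding to M fraction bits discards. A minimal sketch on float values follows. The name "reduce_model" is hypothetical, and round-to-nearest-even is assumed for the ROUND step.

.. code-block:: C

    #include <math.h>

    /* Hypothetical scalar model of ReduceArgumentFP16 using float.
       Assumes round-to-nearest-even for the ROUND step. */
    static float reduce_model(float x, int imm8)
    {
        int m = (imm8 >> 4) & 0xF;                        /* fraction bits kept */
        float rounded = ldexpf(nearbyintf(ldexpf(x, m)), -m);
        float r = x - rounded;                            /* what rounding removed */
        if (isinf(r))                                     /* mirrors the IsInf check */
            r = 0.0f;
        return r;
    }

With M = 0 the result is the signed distance from the nearest integer, so 1.75 yields -0.25.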

_mm_mask_reduce_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    int imm8
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m128h _mm_mask_reduce_ph(__m128h src, __mmask8 k,
                               __m128h a, int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_reduce_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    int imm8
:Param ETypes:
    MASK k, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __m128h _mm_maskz_reduce_ph(__mmask8 k, __m128h a,
                                int imm8);

.. admonition:: Intel Description

    Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
        	m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
        	tmp[15:0] := src[15:0] - tmp[15:0]
        	IF IsInf(tmp[15:0])
        		tmp[15:0] := FP16(0.0)
        	FI
        	RETURN tmp[15:0]
        }
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_scalef_ph
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_scalef_ph(__m128h a, __m128h b);

.. admonition:: Intel Description

    Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        FOR i := 0 to 7
        	dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_scalef_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_scalef_ph(__m128h src, __mmask8 k,
                               __m128h a, __m128h b);

.. admonition:: Intel Description

    Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
        	ELSE
        		dst.fp16[i] := src.fp16[i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskz_scalef_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_scalef_ph(__mmask8 k, __m128h a,
                                __m128h b);

.. admonition:: Intel Description

    Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        FOR i := 0 to 7
        	IF k[i]
        		dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
        	ELSE
        		dst.fp16[i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fpclass_ph_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128h a, 
    int imm8
:Param ETypes:
    FP16 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_fpclass_ph_mask(__m128h a, int imm8);

.. admonition:: Intel Description

    Test packed half-precision (16-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k". [fpclass_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 7
        	k[i] := CheckFPClass_FP16(a.fp16[i], imm8[7:0])
        ENDFOR
        k[MAX:8] := 0
        	
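The category test can be sketched as a bitwise match between the classes a value belongs to and the classes selected by "imm8". The sketch below works on the 32-bit encoding of a float rather than FP16. The name "fpclass_model" is hypothetical, and the bit assignments (0x01 QNaN, 0x02 +0, 0x04 -0, 0x08 +Inf, 0x10 -Inf, 0x20 denormal, 0x40 negative finite nonzero, 0x80 SNaN) are an assumption based on Intel's description of the fpclass instructions; this listing does not spell them out.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of CheckFPClass on a float bit pattern.
       The imm8 bit meanings are assumed, see the lead-in. */
    static int fpclass_model(uint32_t bits, int imm8)
    {
        uint32_t sign = bits >> 31;
        uint32_t exp  = (bits >> 23) & 0xFF;
        uint32_t frac = bits & 0x7FFFFF;
        int is_nan  = (exp == 0xFF) && (frac != 0);
        int is_inf  = (exp == 0xFF) && (frac == 0);
        int is_zero = (exp == 0) && (frac == 0);
        int is_den  = (exp == 0) && (frac != 0);
        int quiet   = (int)((frac >> 22) & 1);   /* top fraction bit marks QNaN */

        int cls = 0;
        if (is_nan && quiet)   cls |= 0x01;   /* QNaN */
        if (is_zero && !sign)  cls |= 0x02;   /* +0 */
        if (is_zero && sign)   cls |= 0x04;   /* -0 */
        if (is_inf && !sign)   cls |= 0x08;   /* +Inf */
        if (is_inf && sign)    cls |= 0x10;   /* -Inf */
        if (is_den)            cls |= 0x20;   /* denormal */
        if (sign && !is_nan && !is_inf && !is_zero)
            cls |= 0x40;                      /* negative finite, nonzero */
        if (is_nan && !quiet)  cls |= 0x80;   /* SNaN */

        return (cls & imm8 & 0xFF) != 0;      /* 1 if any selected class matches */
    }

Several categories can be requested at once by OR-ing their bits, for example 0x18 to test for either infinity.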

_mm_mask_fpclass_ph_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128h a, 
    int imm8
:Param ETypes:
    MASK k1, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_fpclass_ph_mask(__mmask8 k1, __m128h a,
                                      int imm8);

.. admonition:: Intel Description

    Test packed half-precision (16-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [fpclass_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        FOR i := 0 to 7
        	IF k1[i]
        		k[i] := CheckFPClass_FP16(a.fp16[i], imm8[7:0])
        	ELSE
        		k[i] := 0
        	FI
        ENDFOR
        k[MAX:8] := 0
        	

_mm_permutex2var_ph
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128i idx, 
    __m128h b
:Param ETypes:
    FP16 a, 
    UI16 idx, 
    FP16 b

.. code-block:: C

    __m128h _mm_permutex2var_ph(__m128h a, __m128i idx,
                                __m128h b);

.. admonition:: Intel Description

    Shuffle half-precision (16-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	off := idx[i+2:i]
        	dst.fp16[j] := idx[i+3] ? b.fp16[off] : a.fp16[off]
        ENDFOR
        dst[MAX:128] := 0
        	
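In the pseudo-code above, each 16-bit index supplies a 3-bit element offset in bits 2:0 and a source selector in bit 3. The sketch below reproduces that selection on arrays of 16-bit values, which stand in for the FP16 lanes since the shuffle only moves bits. The name "permutex2var_model" is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of the two-source shuffle on 8 lanes of 16 bits. */
    static void permutex2var_model(uint16_t dst[8], const uint16_t a[8],
                                   const uint16_t idx[8], const uint16_t b[8])
    {
        for (int j = 0; j < 8; j++) {
            unsigned off = idx[j] & 0x7;        /* idx[i+2:i]: element offset */
            unsigned sel = (idx[j] >> 3) & 1;   /* idx[i+3]: 0 picks a, 1 picks b */
            dst[j] = sel ? b[off] : a[off];
        }
    }

Index bits above bit 3 are ignored by this model, matching the bit ranges used in the pseudo-code.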

_mm_mask_blend_ph
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_blend_ph(__mmask8 k, __m128h a, __m128h b);

.. admonition:: Intel Description

    Blend packed half-precision (16-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	IF k[j]
        		dst.fp16[j] := b.fp16[j]
        	ELSE
        		dst.fp16[j] := a.fp16[j]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_permutexvar_ph
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128i idx, 
    __m128h a
:Param ETypes:
    UI16 idx, 
    FP16 a

.. code-block:: C

    __m128h _mm_permutexvar_ph(__m128i idx, __m128h a);

.. admonition:: Intel Description

    Shuffle half-precision (16-bit) floating-point elements in "a" using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	id := idx[i+2:i]
        	dst.fp16[j] := a.fp16[id]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_roundscale_sh
^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    int imm8
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __m128h _mm_roundscale_sh(__m128h a, __m128h b, int imm8);

.. admonition:: Intel Description

    Round the lower half-precision (16-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        dst.fp16[0] := RoundScaleFP16(b.fp16[0], imm8)
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_roundscale_round_sh
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    int imm8, 
    const int sae
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128h _mm_roundscale_round_sh(__m128h a, __m128h b,
                                    int imm8, const int sae);

.. admonition:: Intel Description

    Round the lower half-precision (16-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        dst.fp16[0] := RoundScaleFP16(b.fp16[0], imm8)
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_roundscale_sh
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int imm8
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __m128h _mm_mask_roundscale_sh(__m128h src, __mmask8 k,
                                   __m128h a, __m128h b,
                                   int imm8);

.. admonition:: Intel Description

    Round the lower half-precision (16-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        IF k[0]
        	dst.fp16[0] := RoundScaleFP16(b.fp16[0], imm8)
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	
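The scalar "_sh" forms touch only lane 0: mask bit 0 decides whether the computed value or "src" lands there, and lanes 1 through 7 always come from "a". The sketch below shows that data flow. The name "mask_scalar_model" is hypothetical, float arrays stand in for the FP16 vectors, and "computed" is the already-rounded lower element of "b".

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of a writemasked scalar (lower-lane) operation. */
    static void mask_scalar_model(float dst[8], const float src[8], uint8_t k,
                                  const float a[8], float computed)
    {
        dst[0] = (k & 1) ? computed : src[0];   /* only mask bit 0 matters */
        for (int i = 1; i < 8; i++)
            dst[i] = a[i];                      /* upper lanes copied from a */
    }

Mask bits 1 through 7 have no effect in this model, which matches the pseudo-code testing only "k[0]".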

_mm_mask_roundscale_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int imm8, 
    const int sae
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128h _mm_mask_roundscale_round_sh(__m128h src,
                                         __mmask8 k, __m128h a,
                                         __m128h b, int imm8,
                                         const int sae);

.. admonition:: Intel Description

    Round the lower half-precision (16-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        IF k[0]
        	dst.fp16[0] := RoundScaleFP16(b.fp16[0], imm8)
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_roundscale_sh
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int imm8
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM imm8

.. code-block:: C

    __m128h _mm_maskz_roundscale_sh(__mmask8 k, __m128h a,
                                    __m128h b, int imm8);

.. admonition:: Intel Description

    Round the lower half-precision (16-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        IF k[0]
        	dst.fp16[0] := RoundScaleFP16(b.fp16[0], imm8)
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_roundscale_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    int imm8, 
    const int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM imm8, 
    IMM sae

.. code-block:: C

    __m128h _mm_maskz_roundscale_round_sh(__mmask8 k, __m128h a,
                                          __m128h b, int imm8,
                                          const int sae);

.. admonition:: Intel Description

    Round the lower half-precision (16-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
        	m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
        	tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
        	RETURN tmp.fp16
        }
        IF k[0]
        	dst.fp16[0] := RoundScaleFP16(b.fp16[0], imm8)
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_getexp_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_getexp_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Convert the exponent of the lower half-precision (16-bit) floating-point element in "b" to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst.fp16[0] := ConvertExpFP16(b.fp16[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
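
To illustrate what "floor(log2(x))" means for a half-precision value, the sketch below reads the unbiased exponent straight from a raw binary16 bit pattern in portable C. The helper name ``fp16_getexp_model`` is hypothetical and not part of any Intel header. It returns a plain ``int`` rather than an FP16 value, assumes denormals are not flushed to zero (MXCSR.DAZ = 0), and does not model zero, infinity, or NaN inputs.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: floor(log2(|x|)) for a finite, nonzero
       binary16 value given as raw bits (1 sign, 5 exponent, 10 fraction). */
    static int fp16_getexp_model(uint16_t bits)
    {
        int exp_field = (bits >> 10) & 0x1F;   /* biased exponent, bias = 15 */
        int frac      = bits & 0x3FF;          /* 10-bit fraction */

        if (exp_field != 0)
            return exp_field - 15;             /* normal: 1.f * 2^(e - 15) */

        /* Subnormal: value = frac * 2^-24, so locate the top set bit. */
        int msb = -1;
        while (frac != 0) {
            frac >>= 1;
            msb++;
        }
        return msb - 24;
    }

For example, the bit pattern ``0x4800`` (8.0) yields 3 and ``0x3A00`` (0.75) yields -1. The sign bit is ignored, matching the use of the magnitude in the description.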
        	

_mm_getexp_round_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    const int sae
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128h _mm_getexp_round_sh(__m128h a, __m128h b,
                                const int sae);

.. admonition:: Intel Description

    Convert the exponent of the lower half-precision (16-bit) floating-point element in "b" to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst.fp16[0] := ConvertExpFP16(b.fp16[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_getexp_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_getexp_sh(__m128h src, __mmask8 k,
                               __m128h a, __m128h b);

.. admonition:: Intel Description

    Convert the exponent of the lower half-precision (16-bit) floating-point element in "b" to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst.fp16[0] := ConvertExpFP16(b.fp16[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_getexp_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int sae
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128h _mm_mask_getexp_round_sh(__m128h src, __mmask8 k,
                                     __m128h a, __m128h b,
                                     const int sae);

.. admonition:: Intel Description

    Convert the exponent of the lower half-precision (16-bit) floating-point element in "b" to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst.fp16[0] := ConvertExpFP16(b.fp16[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_getexp_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_getexp_sh(__mmask8 k, __m128h a,
                                __m128h b);

.. admonition:: Intel Description

    Convert the exponent of the lower half-precision (16-bit) floating-point element in "b" to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst.fp16[0] := ConvertExpFP16(b.fp16[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
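
The writemask and zeromask forms of these scalar intrinsics differ only in what lane 0 holds when bit 0 of "k" is clear; lanes 1 to 7 always come from "a". The following sketch models that merge step on plain 16-bit lanes. The type ``vec8h_model`` and both function names are hypothetical stand-ins for illustration, and ``result`` represents whatever the unmasked operation would have produced for lane 0.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for the eight 16-bit lanes of a __m128h. */
    typedef struct { uint16_t h[8]; } vec8h_model;

    /* Writemask form: lane 0 is `result` when k[0] is set, else src lane 0. */
    static vec8h_model mask_merge_sh_model(vec8h_model src, uint8_t k,
                                           vec8h_model a, uint16_t result)
    {
        vec8h_model dst = a;                        /* lanes 1..7 come from a */
        dst.h[0] = (k & 1) ? result : src.h[0];
        return dst;
    }

    /* Zeromask form: lane 0 is `result` when k[0] is set, else zero. */
    static vec8h_model maskz_merge_sh_model(uint8_t k, vec8h_model a,
                                            uint16_t result)
    {
        vec8h_model dst = a;                        /* lanes 1..7 come from a */
        dst.h[0] = (k & 1) ? result : 0;
        return dst;
    }

Only bit 0 of the mask matters here: passing ``0xFE`` behaves the same as passing ``0x00``.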
        	

_mm_maskz_getexp_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM sae

.. code-block:: C

    __m128h _mm_maskz_getexp_round_sh(__mmask8 k, __m128h a,
                                      __m128h b, const int sae);

.. admonition:: Intel Description

    Convert the exponent of the lower half-precision (16-bit) floating-point element in "b" to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. [sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst.fp16[0] := ConvertExpFP16(b.fp16[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_getmant_sh
^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM norm, 
    IMM sign

.. code-block:: C

    __m128h _mm_getmant_sh(__m128h a, __m128h b,
                           _MM_MANTISSA_NORM_ENUM norm,
                           _MM_MANTISSA_SIGN_ENUM sign);

.. admonition:: Intel Description

    Normalize the mantissas of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
    [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst.fp16[0] := GetNormalizedMantissaFP16(b.fp16[0], norm, sign)
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
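
As a rough illustration of mantissa normalization, the sketch below handles one configuration only: the interval [1, 2) with the sign taken from the source, which is assumed here to correspond to ``_MM_MANT_NORM_1_2`` combined with ``_MM_MANT_SIGN_src``. It works on raw binary16 bit patterns, and the helper name ``fp16_getmant_1_2_model`` is hypothetical. Zero, infinity, and NaN inputs are not modeled, and denormals are assumed not to be flushed (MXCSR.DAZ = 0).

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: normalize a finite, nonzero binary16 value
       into [1, 2), keeping the sign of the source. */
    static uint16_t fp16_getmant_1_2_model(uint16_t bits)
    {
        uint16_t sign      = bits & 0x8000;
        unsigned exp_field = (bits >> 10) & 0x1F;
        unsigned frac      = bits & 0x3FF;

        if (exp_field == 0) {
            /* Subnormal: shift until the leading 1 reaches bit 10, then
               drop it, since it becomes the implicit bit. */
            while (frac != 0 && (frac & 0x400) == 0)
                frac <<= 1;
            frac &= 0x3FF;
        }
        /* Re-encode with biased exponent 15 (0x3C00), i.e. a factor of 2^0. */
        return (uint16_t)(sign | 0x3C00 | frac);
    }

For example, 6.0 (``0x4600``) is 1.5 * 2^2, so the model returns 1.5 (``0x3E00``). For this interval choice, the input equals the normalized mantissa multiplied by 2 raised to the exponent that the getexp family reports.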
        	

_mm_getmant_round_sh
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign, 
    const int sae
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM norm, 
    IMM sign, 
    IMM sae

.. code-block:: C

    __m128h _mm_getmant_round_sh(__m128h a, __m128h b,
                                 _MM_MANTISSA_NORM_ENUM norm,
                                 _MM_MANTISSA_SIGN_ENUM sign,
                                 const int sae);

.. admonition:: Intel Description

    Normalize the mantissas of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
    [getmant_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        dst.fp16[0] := GetNormalizedMantissaFP16(b.fp16[0], norm, sign)
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_getmant_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM norm, 
    IMM sign

.. code-block:: C

    __m128h _mm_mask_getmant_sh(__m128h src, __mmask8 k,
                                __m128h a, __m128h b,
                                _MM_MANTISSA_NORM_ENUM norm,
                                _MM_MANTISSA_SIGN_ENUM sign);

.. admonition:: Intel Description

    Normalize the mantissas of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
    [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst.fp16[0] := GetNormalizedMantissaFP16(b.fp16[0], norm, sign)
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_getmant_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign, 
    const int sae
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM norm, 
    IMM sign, 
    IMM sae

.. code-block:: C

    __m128h _mm_mask_getmant_round_sh(
        __m128h src, __mmask8 k, __m128h a, __m128h b,
        _MM_MANTISSA_NORM_ENUM norm,
        _MM_MANTISSA_SIGN_ENUM sign, const int sae);

.. admonition:: Intel Description

    Normalize the mantissas of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
    [getmant_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst.fp16[0] := GetNormalizedMantissaFP16(b.fp16[0], norm, sign)
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_getmant_sh
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM norm, 
    IMM sign

.. code-block:: C

    __m128h _mm_maskz_getmant_sh(__mmask8 k, __m128h a,
                                 __m128h b,
                                 _MM_MANTISSA_NORM_ENUM norm,
                                 _MM_MANTISSA_SIGN_ENUM sign);

.. admonition:: Intel Description

    Normalize the mantissas of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
    [getmant_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst.fp16[0] := GetNormalizedMantissaFP16(b.fp16[0], norm, sign)
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_getmant_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    _MM_MANTISSA_NORM_ENUM norm, 
    _MM_MANTISSA_SIGN_ENUM sign, 
    const int sae
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM norm, 
    IMM sign, 
    IMM sae

.. code-block:: C

    __m128h _mm_maskz_getmant_round_sh(
        __mmask8 k, __m128h a, __m128h b,
        _MM_MANTISSA_NORM_ENUM norm,
        _MM_MANTISSA_SIGN_ENUM sign, const int sae);

.. admonition:: Intel Description

    Normalize the mantissas of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
    [getmant_note][sae_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k[0]
        	dst.fp16[0] := GetNormalizedMantissaFP16(b.fp16[0], norm, sign)
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_scalef_sh
^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_scalef_sh(__m128h a, __m128h b);

.. admonition:: Intel Description

    Scale the lower half-precision (16-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        dst.fp16[0] := ScaleFP16(a.fp16[0], b.fp16[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
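
The core of the pseudo-code is the expression ``tmp1 * POW(2.0, FLOOR(tmp2))``. A minimal scalar sketch of that expression follows, written with ``float`` instead of FP16 so that it compiles without any FP16 support. The name ``scalef_model`` is hypothetical. The sketch ignores the MXCSR.DAZ handling, FP16 rounding and overflow, and special values, and it assumes the scale operand is small enough to convert to ``int``.

.. code-block:: C

    /* Hypothetical scalar model of a * 2^floor(b), using float, not FP16. */
    static float scalef_model(float a, float b)
    {
        /* floor(b) without libm: truncate, then step down if that rounded up. */
        int n = (int)b;
        if ((float)n > b)
            n--;

        /* Multiply or divide by two |n| times. */
        float r = a;
        while (n > 0) { r *= 2.0f; n--; }
        while (n < 0) { r *= 0.5f; n++; }
        return r;
    }

Note that the scale operand is floored rather than rounded, so a scale of 2.9 multiplies by 4, and a scale of -0.5 multiplies by 0.5.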
        	

_mm_scalef_round_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_scalef_round_sh(__m128h a, __m128h b,
                                const int rounding);

.. admonition:: Intel Description

    Scale the lower half-precision (16-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        dst.fp16[0] := ScaleFP16(a.fp16[0], b.fp16[0])
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_scalef_sh
^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_mask_scalef_sh(__m128h src, __mmask8 k,
                               __m128h a, __m128h b);

.. admonition:: Intel Description

    Scale the lower half-precision (16-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        IF k[0]
        	dst.fp16[0] := ScaleFP16(a.fp16[0], b.fp16[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_mask_scalef_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __m128h src, 
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    FP16 src, 
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_mask_scalef_round_sh(__m128h src, __mmask8 k,
                                     __m128h a, __m128h b,
                                     const int rounding);

.. admonition:: Intel Description

    Scale the lower half-precision (16-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        IF k[0]
        	dst.fp16[0] := ScaleFP16(a.fp16[0], b.fp16[0])
        ELSE
        	dst.fp16[0] := src.fp16[0]
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_scalef_sh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b

.. code-block:: C

    __m128h _mm_maskz_scalef_sh(__mmask8 k, __m128h a,
                                __m128h b);

.. admonition:: Intel Description

    Scale the lower half-precision (16-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        IF k[0]
        	dst.fp16[0] := ScaleFP16(a.fp16[0], b.fp16[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_maskz_scalef_round_sh
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __m128h
:Param Types:
    __mmask8 k, 
    __m128h a, 
    __m128h b, 
    const int rounding
:Param ETypes:
    MASK k, 
    FP16 a, 
    FP16 b, 
    IMM rounding

.. code-block:: C

    __m128h _mm_maskz_scalef_round_sh(__mmask8 k, __m128h a,
                                      __m128h b,
                                      const int rounding);

.. admonition:: Intel Description

    Scale the lower half-precision (16-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
    [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        DEFINE ScaleFP16(src1, src2) {
        	denormal1 := (src1.exp == 0) and (src1.fraction != 0)
        	denormal2 := (src2.exp == 0) and (src2.fraction != 0)
        	tmp1 := src1
        	tmp2 := src2
        	IF MXCSR.DAZ
        		IF denormal1
        			tmp1 := 0
        		FI
        		IF denormal2
        			tmp2 := 0
        		FI
        	FI
        	RETURN tmp1 * POW(2.0, FLOOR(tmp2))
        }
        IF k[0]
        	dst.fp16[0] := ScaleFP16(a.fp16[0], b.fp16[0])
        ELSE
        	dst.fp16[0] := 0
        FI
        dst[127:16] := a[127:16]
        dst[MAX:128] := 0
        	

_mm_fpclass_sh_mask
^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __m128h a, 
    int imm8
:Param ETypes:
    FP16 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_fpclass_sh_mask(__m128h a, int imm8);

.. admonition:: Intel Description

    Test the lower half-precision (16-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k".
    [fpclass_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        k[0] := CheckFPClass_FP16(a.fp16[0], imm8[7:0])
        k[MAX:1] := 0
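
The pseudo-code leaves ``CheckFPClass_FP16`` undefined, so the sketch below shows one plausible scalar model of it on a raw binary16 bit pattern. The mapping of "imm8" bits to categories is an assumption based on the usual layout for the VFPCLASS family (bit 0 QNaN, bit 1 +0, bit 2 -0, bit 3 +Inf, bit 4 -Inf, bit 5 denormal, bit 6 finite negative, bit 7 SNaN). The enum constants and the function name are hypothetical, and MXCSR.DAZ is assumed to be 0.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical names for the assumed imm8 category bits. */
    enum {
        FPCLASS_QNAN     = 0x01,
        FPCLASS_POS_ZERO = 0x02,
        FPCLASS_NEG_ZERO = 0x04,
        FPCLASS_POS_INF  = 0x08,
        FPCLASS_NEG_INF  = 0x10,
        FPCLASS_DENORMAL = 0x20,
        FPCLASS_NEGATIVE = 0x40,   /* finite, nonzero, negative */
        FPCLASS_SNAN     = 0x80
    };

    /* Returns 1 if the value falls in any category selected by imm8. */
    static int fp16_fpclass_model(uint16_t bits, unsigned imm8)
    {
        int neg       = (bits >> 15) & 1;
        int exp_ones  = ((bits >> 10) & 0x1F) == 0x1F;
        int exp_zero  = ((bits >> 10) & 0x1F) == 0;
        int frac_zero = (bits & 0x3FF) == 0;
        int quiet     = (bits >> 9) & 1;   /* top fraction bit marks a QNaN */

        unsigned cls = 0;
        if (exp_ones && !frac_zero) cls |= quiet ? FPCLASS_QNAN : FPCLASS_SNAN;
        if (exp_zero && frac_zero)  cls |= neg ? FPCLASS_NEG_ZERO : FPCLASS_POS_ZERO;
        if (exp_ones && frac_zero)  cls |= neg ? FPCLASS_NEG_INF : FPCLASS_POS_INF;
        if (exp_zero && !frac_zero) cls |= FPCLASS_DENORMAL;
        if (neg && !exp_ones && !(exp_zero && frac_zero))
            cls |= FPCLASS_NEGATIVE;

        return (cls & imm8 & 0xFF) != 0;
    }

Categories combine with a bitwise OR, so an "imm8" of ``0x18`` tests for either infinity, and ``0x81`` tests for any NaN.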
        	

_mm_mask_fpclass_sh_mask
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX-512
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX-512-Miscellaneous-XMM
:Register: XMM 128 bit
:Return Type: __mmask8
:Param Types:
    __mmask8 k1, 
    __m128h a, 
    int imm8
:Param ETypes:
    MASK k1, 
    FP16 a, 
    IMM imm8

.. code-block:: C

    __mmask8 _mm_mask_fpclass_sh_mask(__mmask8 k1, __m128h a,
                                      int imm8);

.. admonition:: Intel Description

    Test the lower half-precision (16-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set).
    [fpclass_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        IF k1[0]
        	k[0] := CheckFPClass_FP16(a.fp16[0], imm8[7:0])
        ELSE
        	k[0] := 0
        FI
        k[MAX:1] := 0
        	

AVX_ALL
=======
Shift
-----
YMM
~~~
_mm256_slli_si256
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    M128 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_slli_si256(__m256i a, const int imm8);

.. admonition:: Intel Description

    Shift 128-bit lanes in "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := imm8[7:0]
        IF tmp > 15
        	tmp := 16
        FI
        dst[127:0] := a[127:0] << (tmp*8)
        dst[255:128] := a[255:128] << (tmp*8)
        dst[MAX:256] := 0
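
The key point of the pseudo-code is that the two 128-bit lanes shift independently, so bytes never cross from the low lane into the high lane. The sketch below models this on a plain 32-byte array, with index 0 as the least significant byte. The function name ``slli_si256_model`` is hypothetical, and ``dst`` is assumed not to alias ``a``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical byte-array model of a per-lane left shift by imm8 bytes. */
    static void slli_si256_model(uint8_t dst[32], const uint8_t a[32],
                                 unsigned imm8)
    {
        unsigned n = imm8 & 0xFF;
        if (n > 15)
            n = 16;                        /* a full-lane shift clears the lane */

        for (unsigned lane = 0; lane < 2; lane++) {
            for (unsigned i = 0; i < 16; i++) {
                unsigned idx = lane * 16 + i;
                /* Byte i of a lane comes from byte i - n of the same lane. */
                dst[idx] = (i >= n) ? a[idx - n] : 0;
            }
        }
    }

With a shift of 3, bytes 0 to 2 and bytes 16 to 18 of the result are zero, which shows that the high lane is filled with zeros rather than with bytes from the low lane.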
        	

_mm256_bslli_epi128
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    M128 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_bslli_epi128(__m256i a, const int imm8);

.. admonition:: Intel Description

    Shift 128-bit lanes in "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := imm8[7:0]
        IF tmp > 15
        	tmp := 16
        FI
        dst[127:0] := a[127:0] << (tmp*8)
        dst[255:128] := a[255:128] << (tmp*8)
        dst[MAX:256] := 0
        	

_mm256_sll_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_sll_epi16(__m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
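
One detail worth spelling out is that every lane uses the same shift amount, taken from the low 64 bits of "count", and that an amount above 15 produces zero. The C ``<<`` operator leaves oversized shifts undefined, so a scalar model has to test for them explicitly. The sketch below models one 16-bit lane; the name ``sll_epi16_lane_model`` is hypothetical.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical model of one 16-bit lane; `count` stands for count[63:0]. */
    static uint16_t sll_epi16_lane_model(uint16_t a, uint64_t count)
    {
        if (count > 15)
            return 0;                          /* oversized shift yields zero */
        /* Widen first so the shift happens in an unsigned type, then
           truncate back to 16 bits. */
        return (uint16_t)((uint32_t)a << count);
    }

The comparison uses all 64 bits of the count, so a value such as 2^32 also yields zero even though its low bits are all clear.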
        	

_mm256_slli_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_slli_epi16(__m256i a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sll_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_sll_epi32(__m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_slli_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_slli_epi32(__m256i a, int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sll_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_sll_epi64(__m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF count[63:0] > 63
        		dst[i+63:i] := 0
        	ELSE
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_slli_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_slli_epi64(__m256i a, int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF imm8[7:0] > 63
        		dst[i+63:i] := 0
        	ELSE
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sllv_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_sllv_epi32(__m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF count[i+31:i] < 32
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
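
Unlike the uniform-count forms, each lane here is shifted by its own count. A minimal scalar sketch of the pseudo-code above follows; ``vec256_u32`` and ``model_sllv_epi32`` are hypothetical names chosen for illustration, with the vector modelled as eight ``uint32_t`` lanes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m256i viewed as eight 32-bit lanes. */
    typedef struct { uint32_t lane[8]; } vec256_u32;

    /* Scalar model: lane j of a is shifted left by lane j of count.
       A per-lane count of 32 or more produces zero for that lane only. */
    static vec256_u32 model_sllv_epi32(vec256_u32 a, vec256_u32 count)
    {
        vec256_u32 dst;
        for (int j = 0; j < 8; j++) {
            uint32_t c = count.lane[j];
            dst.lane[j] = (c < 32) ? (a.lane[j] << c) : 0;
        }
        return dst;
    }

Because the count is compared as a full unsigned 32-bit value, a lane count such as 0xFFFFFFFF clears the lane rather than wrapping to a small shift.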
        	

_mm256_sllv_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_sllv_epi64(__m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF count[i+63:i] < 64
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sra_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i count
:Param ETypes:
    SI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_sra_epi16(__m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        	ELSE
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
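
The sign-filling behaviour can be checked with a portable scalar model. This is a sketch of the pseudo-code above under stated assumptions, not Intel's implementation: ``vec256_i16``, ``sra16`` and ``model_sra_epi16`` are hypothetical names, and ``count_lo`` stands for ``count[63:0]``, the low 64 bits of the ``__m128i`` operand.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m256i viewed as sixteen signed 16-bit lanes. */
    typedef struct { int16_t lane[16]; } vec256_i16;

    /* Arithmetic right shift of one lane by n (0..15). Right-shifting a
       negative value is implementation-defined in C, so negative inputs are
       complemented first, which keeps the shift on a non-negative value. */
    static int16_t sra16(int16_t x, unsigned n)
    {
        int v = x;
        return (int16_t)(v < 0 ? ~(~v >> n) : v >> n);
    }

    /* count_lo models count[63:0]; all lanes share the same count. */
    static vec256_i16 model_sra_epi16(vec256_i16 a, uint64_t count_lo)
    {
        vec256_i16 dst;
        for (int j = 0; j < 16; j++) {
            if (count_lo > 15)
                dst.lane[j] = (a.lane[j] < 0) ? (int16_t)-1 : (int16_t)0;
            else
                dst.lane[j] = sra16(a.lane[j], (unsigned)count_lo);
        }
        return dst;
    }

With an oversized count every lane collapses to its sign: negative lanes become -1 (0xFFFF) and non-negative lanes become 0.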
        	

_mm256_srai_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    int imm8
:Param ETypes:
    SI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_srai_epi16(__m256i a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
        	ELSE
        		dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sra_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i count
:Param ETypes:
    SI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_sra_epi32(__m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        	ELSE
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srai_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    int imm8
:Param ETypes:
    SI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_srai_epi32(__m256i a, int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
        	ELSE
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srav_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i count
:Param ETypes:
    SI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_srav_epi32(__m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF count[i+31:i] < 32
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
        	ELSE
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srli_si256
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    M128 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_srli_si256(__m256i a, const int imm8);

.. admonition:: Intel Description

    Shift 128-bit lanes in "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := imm8[7:0]
        IF tmp > 15
        	tmp := 16
        FI
        dst[127:0] := a[127:0] >> (tmp*8)
        dst[255:128] := a[255:128] >> (tmp*8)
        dst[MAX:256] := 0
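
The key point of the pseudo-code above is that the byte shift is applied to each 128-bit lane separately, so bytes never cross from the upper lane into the lower one. A minimal scalar sketch, assuming a hypothetical ``vec256_u8`` type whose ``byte[0]`` is the least significant byte and a hypothetical function name ``model_srli_si256``:

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical stand-in for __m256i viewed as 32 bytes,
       byte[0] being the least significant. */
    typedef struct { uint8_t byte[32]; } vec256_u8;

    /* Scalar model: shift each 16-byte lane right by imm8 bytes,
       filling the vacated high bytes of that lane with zeros. */
    static vec256_u8 model_srli_si256(vec256_u8 a, int imm8)
    {
        unsigned shift = (unsigned)imm8 & 0xFFu;   /* imm8[7:0] */
        if (shift > 15)
            shift = 16;                            /* whole lane shifted out */

        vec256_u8 dst;
        memset(&dst, 0, sizeof dst);
        for (int lane = 0; lane < 2; lane++) {
            const uint8_t *src = a.byte + 16 * lane;
            uint8_t *out = dst.byte + 16 * lane;
            /* Result byte k takes source byte k + shift of the same lane. */
            for (unsigned k = 0; k + shift < 16; k++)
                out[k] = src[k + shift];
        }
        return dst;
    }

For an input whose bytes are numbered 0 to 31, a shift of 3 leaves bytes 3 to 15 at the bottom of the low lane and bytes 19 to 31 at the bottom of the high lane, with three zero bytes at the top of each lane.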
        	

_mm256_bsrli_epi128
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    M128 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_bsrli_epi128(__m256i a, const int imm8);

.. admonition:: Intel Description

    Shift 128-bit lanes in "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp := imm8[7:0]
        IF tmp > 15
        	tmp := 16
        FI
        dst[127:0] := a[127:0] >> (tmp*8)
        dst[255:128] := a[255:128] >> (tmp*8)
        dst[MAX:256] := 0
        	

_mm256_srl_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i count
:Param ETypes:
    UI16 a, 
    UI16 count

.. code-block:: C

    __m256i _mm256_srl_epi16(__m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF count[63:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srli_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_srli_epi16(__m256i a, int imm8);

.. admonition:: Intel Description

    Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF imm8[7:0] > 15
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srl_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_srl_epi32(__m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF count[63:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
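
One detail of the pseudo-code above is easy to miss: the count is the full low 64 bits of the ``__m128i`` operand, not a 32-bit element. The scalar sketch below illustrates this under stated assumptions; ``vec256_u32`` and ``model_srl_epi32`` are hypothetical names, and ``count_lo`` stands for ``count[63:0]``.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical stand-in for __m256i viewed as eight 32-bit lanes. */
    typedef struct { uint32_t lane[8]; } vec256_u32;

    /* Scalar model: every lane is shifted right by the same 64-bit count,
       with zeros shifted in; a count above 31 clears all lanes. */
    static vec256_u32 model_srl_epi32(vec256_u32 a, uint64_t count_lo)
    {
        vec256_u32 dst;
        for (int j = 0; j < 8; j++)
            dst.lane[j] = (count_lo > 31) ? 0 : (a.lane[j] >> count_lo);
        return dst;
    }

In this model a count of 0x100000004 clears every lane even though its low 32 bits equal 4, because the comparison uses all 64 bits.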
        	

_mm256_srli_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_srli_epi32(__m256i a, int imm8);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF imm8[7:0] > 31
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srl_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_srl_epi64(__m256i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF count[63:0] > 63
        		dst[i+63:i] := 0
        	ELSE
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srli_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_srli_epi64(__m256i a, int imm8);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF imm8[7:0] > 63
        		dst[i+63:i] := 0
        	ELSE
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srlv_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m256i _mm256_srlv_epi32(__m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF count[i+31:i] < 32
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_srlv_epi64
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m256i _mm256_srlv_epi64(__m256i a, __m256i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF count[i+63:i] < 64
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_sllv_epi32
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_sllv_epi32(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF count[i+31:i] < 32
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_sllv_epi64
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_sllv_epi64(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF count[i+63:i] < 64
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_srav_epi32
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    SI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_srav_epi32(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF count[i+31:i] < 32
        		dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
        	ELSE
        		dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_srlv_epi32
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI32 a, 
    UI32 count

.. code-block:: C

    __m128i _mm_srlv_epi32(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF count[i+31:i] < 32
        		dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_srlv_epi64
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Shift
:Header: immintrin.h
:Searchable: AVX_ALL-Shift-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i count
:Param ETypes:
    UI64 a, 
    UI64 count

.. code-block:: C

    __m128i _mm_srlv_epi64(__m128i a, __m128i count);

.. admonition:: Intel Description

    Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF count[i+63:i] < 64
        		dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

Cryptography
------------
YMM
~~~
_mm256_sha512msg1_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cryptography
:Header: immintrin.h
:Searchable: AVX_ALL-Cryptography-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __A, 
    __m128i __B
:Param ETypes:
    UI64 __A, 
    UI64 __B

.. code-block:: C

    __m256i _mm256_sha512msg1_epi64(__m256i __A, __m128i __B);

.. admonition:: Intel Description

    This intrinsic is one of the two SHA512 message scheduling instructions. The intrinsic performs an intermediate calculation for the next four SHA512 message qwords. The calculated results are stored in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ROR64(qword, n) {
        	count := n % 64
        	dest := (qword >> count) | (qword << (64 - count))
        	RETURN dest
        }
        DEFINE SHR64(qword, n) {
        	RETURN qword >> n
        }
        DEFINE s0(qword) {
        	RETURN ROR64(qword,1) ^ ROR64(qword, 8) ^ SHR64(qword, 7)
        }
        W.qword[4] := __B.qword[0]
        W.qword[3] := __A.qword[3]
        W.qword[2] := __A.qword[2]
        W.qword[1] := __A.qword[1]
        W.qword[0] := __A.qword[0]
        dst.qword[3] := W.qword[3] + s0(W.qword[4])
        dst.qword[2] := W.qword[2] + s0(W.qword[3])
        dst.qword[1] := W.qword[1] + s0(W.qword[2])
        dst.qword[0] := W.qword[0] + s0(W.qword[1])
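
The pseudo-code above translates almost line for line into portable C. The following is a minimal scalar sketch, not a replacement for the hardware instruction: ``ror64``, ``sha512_s0`` and ``model_sha512msg1`` are hypothetical names, ``a`` holds the four qwords of "__A" with ``a[0]`` least significant, and ``b0`` is ``__B.qword[0]``.

.. code-block:: C

    #include <stdint.h>

    /* Rotate a 64-bit value right by n bits (n taken modulo 64). */
    static uint64_t ror64(uint64_t x, unsigned n)
    {
        n %= 64;
        return n ? (x >> n) | (x << (64 - n)) : x;
    }

    /* s0 from the pseudo-code (the SHA-512 small sigma-0 function). */
    static uint64_t sha512_s0(uint64_t x)
    {
        return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7);
    }

    /* Scalar model: dst[i] = W[i] + s0(W[i+1]) for i = 0..3,
       where W[0..3] come from a and W[4] is b0. */
    static void model_sha512msg1(uint64_t dst[4], const uint64_t a[4], uint64_t b0)
    {
        uint64_t w[5] = { a[0], a[1], a[2], a[3], b0 };
        for (int i = 0; i < 4; i++)
            dst[i] = w[i] + sha512_s0(w[i + 1]);
    }

Additions wrap modulo 2^64, which matches unsigned 64-bit arithmetic in C. Only the lowest qword of "__B" contributes to the result.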
        

_mm256_sha512msg2_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cryptography
:Header: immintrin.h
:Searchable: AVX_ALL-Cryptography-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __A, 
    __m256i __B
:Param ETypes:
    UI64 __A, 
    UI64 __B

.. code-block:: C

    __m256i _mm256_sha512msg2_epi64(__m256i __A, __m256i __B);

.. admonition:: Intel Description

    This intrinsic is one of the two SHA512 message scheduling instructions. The intrinsic performs the final calculation for the next four SHA512 message qwords. The calculated results are stored in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ROR64(qword, n) {
        	count := n % 64
        	dest := (qword >> count) | (qword << (64 - count))
        	RETURN dest
        }
        DEFINE SHR64(qword, n) {
        	RETURN qword >> n
        }
        DEFINE s1(qword) {
        	RETURN ROR64(qword,19) ^ ROR64(qword, 61) ^ SHR64(qword, 6)
        }
        W.qword[14] := __B.qword[2]
        W.qword[15] := __B.qword[3]
        W.qword[16] := __A.qword[0] + s1(W.qword[14])
        W.qword[17] := __A.qword[1] + s1(W.qword[15])
        W.qword[18] := __A.qword[2] + s1(W.qword[16])
        W.qword[19] := __A.qword[3] + s1(W.qword[17])
        dst.qword[3] := W.qword[19]
        dst.qword[2] := W.qword[18]
        dst.qword[1] := W.qword[17]
        dst.qword[0] := W.qword[16]
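
The dependency chain in the pseudo-code above (W[18] and W[19] depend on the freshly computed W[16] and W[17]) is easier to see in a scalar model. This is a minimal sketch with hypothetical names ``ror64``, ``sha512_s1`` and ``model_sha512msg2``; ``a`` and ``b`` hold the qwords of "__A" and "__B" with index 0 least significant.

.. code-block:: C

    #include <stdint.h>

    /* Rotate a 64-bit value right by n bits (n taken modulo 64). */
    static uint64_t ror64(uint64_t x, unsigned n)
    {
        n %= 64;
        return n ? (x >> n) | (x << (64 - n)) : x;
    }

    /* s1 from the pseudo-code (the SHA-512 small sigma-1 function). */
    static uint64_t sha512_s1(uint64_t x)
    {
        return ror64(x, 19) ^ ror64(x, 61) ^ (x >> 6);
    }

    /* Scalar model: only the two upper qwords of b (W[14], W[15]) are read. */
    static void model_sha512msg2(uint64_t dst[4], const uint64_t a[4],
                                 const uint64_t b[4])
    {
        uint64_t w14 = b[2];
        uint64_t w15 = b[3];
        uint64_t w16 = a[0] + sha512_s1(w14);
        uint64_t w17 = a[1] + sha512_s1(w15);
        uint64_t w18 = a[2] + sha512_s1(w16);   /* uses the new W[16] */
        uint64_t w19 = a[3] + sha512_s1(w17);   /* uses the new W[17] */
        dst[0] = w16;
        dst[1] = w17;
        dst[2] = w18;
        dst[3] = w19;
    }
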
        

_mm256_sha512rnds2_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cryptography
:Header: immintrin.h
:Searchable: AVX_ALL-Cryptography-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __A, 
    __m256i __B, 
    __m128i __C
:Param ETypes:
    UI64 __A, 
    UI64 __B, 
    UI64 __C

.. code-block:: C

    __m256i _mm256_sha512rnds2_epi64(__m256i __A, __m256i __B,
                                     __m128i __C);

.. admonition:: Intel Description

    This intrinsic performs two rounds of the SHA512 operation using the initial SHA512 state (C,D,G,H) from "__A", the initial SHA512 state (A,B,E,F) from "__B", and a pre-computed sum of the next two round message qwords and the corresponding round constants from "__C" (only the two lower qwords of the third operand are used). The updated SHA512 state (A,B,E,F) is written to "dst", and "dst" can be used as the updated state (C,D,G,H) in later rounds.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ROR64(qword, n) {
        	count := n % 64
        	dest := (qword >> count) | (qword << (64 - count))
        	RETURN dest
        }
        DEFINE SHR64(qword, n) {
        	RETURN qword >> n
        }
        DEFINE cap_sigma0(qword) {
        	RETURN ROR64(qword, 28) ^ ROR64(qword, 34) ^ ROR64(qword, 39)
        }
        DEFINE cap_sigma1(qword) {
        	RETURN ROR64(qword, 14) ^ ROR64(qword, 18) ^ ROR64(qword, 41)
        }
        DEFINE MAJ(a,b,c) {
        	RETURN (a & b) ^ (a & c) ^ (b & c)
        }
        DEFINE CH(a,b,c) {
        	RETURN (a & b) ^ (c & ~a)
        }
        A.qword[0] := __B.qword[3]
        B.qword[0] := __B.qword[2]
        C.qword[0] := __A.qword[3]
        D.qword[0] := __A.qword[2]
        E.qword[0] := __B.qword[1]
        F.qword[0] := __B.qword[0]
        G.qword[0] := __A.qword[1]
        H.qword[0] := __A.qword[0]
        WK.qword[0]:= __C.qword[0]
        WK.qword[1]:= __C.qword[1]
        FOR i := 0 to 1
        	A.qword[i+1] := CH(E.qword[i], F.qword[i], G.qword[i]) + cap_sigma1(E.qword[i]) + WK.qword[i] + H.qword[i] + MAJ(A.qword[i], B.qword[i], C.qword[i]) + cap_sigma0(A.qword[i])
        	B.qword[i+1] := A.qword[i]
        	C.qword[i+1] := B.qword[i]
        	D.qword[i+1] := C.qword[i]
        	E.qword[i+1] := CH(E.qword[i], F.qword[i], G.qword[i]) + cap_sigma1(E.qword[i]) + WK.qword[i] + H.qword[i] + D.qword[i]
        	F.qword[i+1] := E.qword[i]
        	G.qword[i+1] := F.qword[i]
        	H.qword[i+1] := G.qword[i]
        ENDFOR
        dst.qword[3] := A.qword[2]
        dst.qword[2] := B.qword[2]
        dst.qword[1] := E.qword[2]
        dst.qword[0] := F.qword[2]
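
The two rounds in the pseudo-code above can be written as a short scalar loop. The sketch below is an illustration under stated assumptions, not Intel's implementation: all function names are hypothetical, ``a_in`` and ``b_in`` hold the qwords of "__A" and "__B" with index 0 least significant, and ``wk`` holds the two lower qwords of "__C".

.. code-block:: C

    #include <stdint.h>

    static uint64_t ror64(uint64_t x, unsigned n)
    {
        n %= 64;
        return n ? (x >> n) | (x << (64 - n)) : x;
    }

    static uint64_t cap_sigma0(uint64_t x)
    {
        return ror64(x, 28) ^ ror64(x, 34) ^ ror64(x, 39);
    }

    static uint64_t cap_sigma1(uint64_t x)
    {
        return ror64(x, 14) ^ ror64(x, 18) ^ ror64(x, 41);
    }

    static uint64_t maj(uint64_t a, uint64_t b, uint64_t c)
    {
        return (a & b) ^ (a & c) ^ (b & c);
    }

    static uint64_t ch(uint64_t a, uint64_t b, uint64_t c)
    {
        return (a & b) ^ (c & ~a);
    }

    /* Scalar model of two SHA-512 rounds; writes the new (A,B,E,F). */
    static void model_sha512rnds2(uint64_t dst[4], const uint64_t a_in[4],
                                  const uint64_t b_in[4], const uint64_t wk[2])
    {
        uint64_t A = b_in[3], B = b_in[2], C = a_in[3], D = a_in[2];
        uint64_t E = b_in[1], F = b_in[0], G = a_in[1], H = a_in[0];

        for (int i = 0; i < 2; i++) {
            uint64_t t1 = ch(E, F, G) + cap_sigma1(E) + wk[i] + H;
            uint64_t t2 = maj(A, B, C) + cap_sigma0(A);
            /* Rotate the working variables, oldest first, so that each
               assignment still reads the previous round's value. */
            H = G; G = F; F = E; E = t1 + D;
            D = C; C = B; B = A; A = t1 + t2;
        }
        dst[3] = A;
        dst[2] = B;
        dst[1] = E;
        dst[0] = F;
    }

The temporaries ``t1`` and ``t2`` are a common way of factoring the two long sums in the pseudo-code; they are not named in the original.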
        

_mm256_sm4key4_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cryptography
:Header: immintrin.h
:Searchable: AVX_ALL-Cryptography-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __A, 
    __m256i __B
:Param ETypes:
    UI32 __A, 
    UI32 __B

.. code-block:: C

    __m256i _mm256_sm4key4_epi32(__m256i __A, __m256i __B);

.. admonition:: Intel Description

    This intrinsic performs four rounds of SM4 key expansion. The intrinsic operates on independent 128-bit lanes. The calculated results are stored in "dst". 

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        BYTE sbox[256] = {
        0xD6, 0x90, 0xE9, 0xFE, 0xCC, 0xE1, 0x3D, 0xB7, 0x16, 0xB6, 0x14, 0xC2, 0x28, 0xFB, 0x2C, 0x05,
        0x2B, 0x67, 0x9A, 0x76, 0x2A, 0xBE, 0x04, 0xC3, 0xAA, 0x44, 0x13, 0x26, 0x49, 0x86, 0x06, 0x99,
        0x9C, 0x42, 0x50, 0xF4, 0x91, 0xEF, 0x98, 0x7A, 0x33, 0x54, 0x0B, 0x43, 0xED, 0xCF, 0xAC, 0x62,
        0xE4, 0xB3, 0x1C, 0xA9, 0xC9, 0x08, 0xE8, 0x95, 0x80, 0xDF, 0x94, 0xFA, 0x75, 0x8F, 0x3F, 0xA6,
        0x47, 0x07, 0xA7, 0xFC, 0xF3, 0x73, 0x17, 0xBA, 0x83, 0x59, 0x3C, 0x19, 0xE6, 0x85, 0x4F, 0xA8,
        0x68, 0x6B, 0x81, 0xB2, 0x71, 0x64, 0xDA, 0x8B, 0xF8, 0xEB, 0x0F, 0x4B, 0x70, 0x56, 0x9D, 0x35,
        0x1E, 0x24, 0x0E, 0x5E, 0x63, 0x58, 0xD1, 0xA2, 0x25, 0x22, 0x7C, 0x3B, 0x01, 0x21, 0x78, 0x87,
        0xD4, 0x00, 0x46, 0x57, 0x9F, 0xD3, 0x27, 0x52, 0x4C, 0x36, 0x02, 0xE7, 0xA0, 0xC4, 0xC8, 0x9E,
        0xEA, 0xBF, 0x8A, 0xD2, 0x40, 0xC7, 0x38, 0xB5, 0xA3, 0xF7, 0xF2, 0xCE, 0xF9, 0x61, 0x15, 0xA1,
        0xE0, 0xAE, 0x5D, 0xA4, 0x9B, 0x34, 0x1A, 0x55, 0xAD, 0x93, 0x32, 0x30, 0xF5, 0x8C, 0xB1, 0xE3,
        0x1D, 0xF6, 0xE2, 0x2E, 0x82, 0x66, 0xCA, 0x60, 0xC0, 0x29, 0x23, 0xAB, 0x0D, 0x53, 0x4E, 0x6F,
        0xD5, 0xDB, 0x37, 0x45, 0xDE, 0xFD, 0x8E, 0x2F, 0x03, 0xFF, 0x6A, 0x72, 0x6D, 0x6C, 0x5B, 0x51,
        0x8D, 0x1B, 0xAF, 0x92, 0xBB, 0xDD, 0xBC, 0x7F, 0x11, 0xD9, 0x5C, 0x41, 0x1F, 0x10, 0x5A, 0xD8,
        0x0A, 0xC1, 0x31, 0x88, 0xA5, 0xCD, 0x7B, 0xBD, 0x2D, 0x74, 0xD0, 0x12, 0xB8, 0xE5, 0xB4, 0xB0,
        0x89, 0x69, 0x97, 0x4A, 0x0C, 0x96, 0x77, 0x7E, 0x65, 0xB9, 0xF1, 0x09, 0xC5, 0x6E, 0xC6, 0x84,
        0x18, 0xF0, 0x7D, 0xEC, 0x3A, 0xDC, 0x4D, 0x20, 0x79, 0xEE, 0x5F, 0x3E, 0xD7, 0xCB, 0x39, 0x48
        }
        DEFINE ROL32(dword, n) {
        	count := n % 32
        	dest := (dword << count) | (dword >> (32-count))
        	RETURN dest
        }
        DEFINE SBOX_BYTE(dword, i) {
        	RETURN sbox[dword.byte[i]]
        }
        DEFINE lower_t(dword) {
        	tmp.byte[0] := SBOX_BYTE(dword, 0)
        	tmp.byte[1] := SBOX_BYTE(dword, 1)
        	tmp.byte[2] := SBOX_BYTE(dword, 2)
        	tmp.byte[3] := SBOX_BYTE(dword, 3)
        	RETURN tmp
        }
        DEFINE L_KEY(dword) {
        	RETURN dword ^ ROL32(dword, 13) ^ ROL32(dword, 23)
        }
        DEFINE T_KEY(dword) {
        	RETURN L_KEY(lower_t(dword))
        }
        DEFINE F_KEY(X0, X1, X2, X3, round_key) {
        	RETURN X0 ^ T_KEY(X1 ^ X2 ^ X3 ^ round_key)
        }
        FOR i:= 0 to 1
        	P.dword[0] := __A.dword[4*i]
        	P.dword[1] := __A.dword[4*i+1]
        	P.dword[2] := __A.dword[4*i+2]
        	P.dword[3] := __A.dword[4*i+3]
        	C.dword[0] := F_KEY(P.dword[0], P.dword[1], P.dword[2], P.dword[3], __B.dword[4*i])
        	C.dword[1] := F_KEY(P.dword[1], P.dword[2], P.dword[3], C.dword[0], __B.dword[4*i+1])
        	C.dword[2] := F_KEY(P.dword[2], P.dword[3], C.dword[0], C.dword[1], __B.dword[4*i+2])
        	C.dword[3] := F_KEY(P.dword[3], C.dword[0], C.dword[1], C.dword[2], __B.dword[4*i+3])
        	dst.dword[4*i] := C.dword[0]
        	dst.dword[4*i+1] := C.dword[1]
        	dst.dword[4*i+2] := C.dword[2]
        	dst.dword[4*i+3] := C.dword[3]
        ENDFOR
        dst[MAX:256] := 0
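
For readers who want to trace the key-expansion rounds without SM4 hardware support, the pseudo-code above can be modelled in portable C. This is a minimal sketch, not Intel's implementation: the function names are hypothetical, the S-box is copied from the pseudo-code, and the 256-bit operands are modelled as arrays of eight ``uint32_t`` values with index 0 least significant.

.. code-block:: C

    #include <stdint.h>

    /* SM4 S-box, copied from the pseudo-code above. */
    static const uint8_t sbox[256] = {
        0xD6, 0x90, 0xE9, 0xFE, 0xCC, 0xE1, 0x3D, 0xB7, 0x16, 0xB6, 0x14, 0xC2, 0x28, 0xFB, 0x2C, 0x05,
        0x2B, 0x67, 0x9A, 0x76, 0x2A, 0xBE, 0x04, 0xC3, 0xAA, 0x44, 0x13, 0x26, 0x49, 0x86, 0x06, 0x99,
        0x9C, 0x42, 0x50, 0xF4, 0x91, 0xEF, 0x98, 0x7A, 0x33, 0x54, 0x0B, 0x43, 0xED, 0xCF, 0xAC, 0x62,
        0xE4, 0xB3, 0x1C, 0xA9, 0xC9, 0x08, 0xE8, 0x95, 0x80, 0xDF, 0x94, 0xFA, 0x75, 0x8F, 0x3F, 0xA6,
        0x47, 0x07, 0xA7, 0xFC, 0xF3, 0x73, 0x17, 0xBA, 0x83, 0x59, 0x3C, 0x19, 0xE6, 0x85, 0x4F, 0xA8,
        0x68, 0x6B, 0x81, 0xB2, 0x71, 0x64, 0xDA, 0x8B, 0xF8, 0xEB, 0x0F, 0x4B, 0x70, 0x56, 0x9D, 0x35,
        0x1E, 0x24, 0x0E, 0x5E, 0x63, 0x58, 0xD1, 0xA2, 0x25, 0x22, 0x7C, 0x3B, 0x01, 0x21, 0x78, 0x87,
        0xD4, 0x00, 0x46, 0x57, 0x9F, 0xD3, 0x27, 0x52, 0x4C, 0x36, 0x02, 0xE7, 0xA0, 0xC4, 0xC8, 0x9E,
        0xEA, 0xBF, 0x8A, 0xD2, 0x40, 0xC7, 0x38, 0xB5, 0xA3, 0xF7, 0xF2, 0xCE, 0xF9, 0x61, 0x15, 0xA1,
        0xE0, 0xAE, 0x5D, 0xA4, 0x9B, 0x34, 0x1A, 0x55, 0xAD, 0x93, 0x32, 0x30, 0xF5, 0x8C, 0xB1, 0xE3,
        0x1D, 0xF6, 0xE2, 0x2E, 0x82, 0x66, 0xCA, 0x60, 0xC0, 0x29, 0x23, 0xAB, 0x0D, 0x53, 0x4E, 0x6F,
        0xD5, 0xDB, 0x37, 0x45, 0xDE, 0xFD, 0x8E, 0x2F, 0x03, 0xFF, 0x6A, 0x72, 0x6D, 0x6C, 0x5B, 0x51,
        0x8D, 0x1B, 0xAF, 0x92, 0xBB, 0xDD, 0xBC, 0x7F, 0x11, 0xD9, 0x5C, 0x41, 0x1F, 0x10, 0x5A, 0xD8,
        0x0A, 0xC1, 0x31, 0x88, 0xA5, 0xCD, 0x7B, 0xBD, 0x2D, 0x74, 0xD0, 0x12, 0xB8, 0xE5, 0xB4, 0xB0,
        0x89, 0x69, 0x97, 0x4A, 0x0C, 0x96, 0x77, 0x7E, 0x65, 0xB9, 0xF1, 0x09, 0xC5, 0x6E, 0xC6, 0x84,
        0x18, 0xF0, 0x7D, 0xEC, 0x3A, 0xDC, 0x4D, 0x20, 0x79, 0xEE, 0x5F, 0x3E, 0xD7, 0xCB, 0x39, 0x48
    };

    static uint32_t rol32(uint32_t x, unsigned n)
    {
        n %= 32;
        return n ? (x << n) | (x >> (32 - n)) : x;
    }

    /* lower_t: substitute each of the four bytes through the S-box. */
    static uint32_t lower_t(uint32_t x)
    {
        return (uint32_t)sbox[x & 0xFF]
             | (uint32_t)sbox[(x >> 8) & 0xFF] << 8
             | (uint32_t)sbox[(x >> 16) & 0xFF] << 16
             | (uint32_t)sbox[(x >> 24) & 0xFF] << 24;
    }

    /* T_KEY: S-box substitution followed by the key-schedule linear map. */
    static uint32_t t_key(uint32_t x)
    {
        uint32_t t = lower_t(x);
        return t ^ rol32(t, 13) ^ rol32(t, 23);
    }

    /* Four key-expansion rounds on one 128-bit lane (four dwords). */
    static void sm4key4_lane(uint32_t dst[4], const uint32_t p[4],
                             const uint32_t rk[4])
    {
        uint32_t c0 = p[0] ^ t_key(p[1] ^ p[2] ^ p[3] ^ rk[0]);
        uint32_t c1 = p[1] ^ t_key(p[2] ^ p[3] ^ c0 ^ rk[1]);
        uint32_t c2 = p[2] ^ t_key(p[3] ^ c0 ^ c1 ^ rk[2]);
        uint32_t c3 = p[3] ^ t_key(c0 ^ c1 ^ c2 ^ rk[3]);
        dst[0] = c0; dst[1] = c1; dst[2] = c2; dst[3] = c3;
    }

    /* The 256-bit form applies the same rounds to both lanes independently. */
    static void model_sm4key4_epi32(uint32_t dst[8], const uint32_t a[8],
                                    const uint32_t b[8])
    {
        for (int i = 0; i < 2; i++)
            sm4key4_lane(dst + 4 * i, a + 4 * i, b + 4 * i);
    }

Each round feeds its result into the next, so the four outputs of a lane must be computed in order, while the two lanes have no data dependency on each other.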
        

_mm256_sm4rnds4_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cryptography
:Header: immintrin.h
:Searchable: AVX_ALL-Cryptography-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __A, 
    __m256i __B
:Param ETypes:
    UI32 __A, 
    UI32 __B

.. code-block:: C

    __m256i _mm256_sm4rnds4_epi32(__m256i __A, __m256i __B);

.. admonition:: Intel Description

    This intrinsic performs four rounds of SM4 encryption. The intrinsic operates on independent 128-bit lanes. The calculated results are stored in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        BYTE sbox[256] = {
        0xD6, 0x90, 0xE9, 0xFE, 0xCC, 0xE1, 0x3D, 0xB7, 0x16, 0xB6, 0x14, 0xC2, 0x28, 0xFB, 0x2C, 0x05,
        0x2B, 0x67, 0x9A, 0x76, 0x2A, 0xBE, 0x04, 0xC3, 0xAA, 0x44, 0x13, 0x26, 0x49, 0x86, 0x06, 0x99,
        0x9C, 0x42, 0x50, 0xF4, 0x91, 0xEF, 0x98, 0x7A, 0x33, 0x54, 0x0B, 0x43, 0xED, 0xCF, 0xAC, 0x62,
        0xE4, 0xB3, 0x1C, 0xA9, 0xC9, 0x08, 0xE8, 0x95, 0x80, 0xDF, 0x94, 0xFA, 0x75, 0x8F, 0x3F, 0xA6,
        0x47, 0x07, 0xA7, 0xFC, 0xF3, 0x73, 0x17, 0xBA, 0x83, 0x59, 0x3C, 0x19, 0xE6, 0x85, 0x4F, 0xA8,
        0x68, 0x6B, 0x81, 0xB2, 0x71, 0x64, 0xDA, 0x8B, 0xF8, 0xEB, 0x0F, 0x4B, 0x70, 0x56, 0x9D, 0x35,
        0x1E, 0x24, 0x0E, 0x5E, 0x63, 0x58, 0xD1, 0xA2, 0x25, 0x22, 0x7C, 0x3B, 0x01, 0x21, 0x78, 0x87,
        0xD4, 0x00, 0x46, 0x57, 0x9F, 0xD3, 0x27, 0x52, 0x4C, 0x36, 0x02, 0xE7, 0xA0, 0xC4, 0xC8, 0x9E,
        0xEA, 0xBF, 0x8A, 0xD2, 0x40, 0xC7, 0x38, 0xB5, 0xA3, 0xF7, 0xF2, 0xCE, 0xF9, 0x61, 0x15, 0xA1,
        0xE0, 0xAE, 0x5D, 0xA4, 0x9B, 0x34, 0x1A, 0x55, 0xAD, 0x93, 0x32, 0x30, 0xF5, 0x8C, 0xB1, 0xE3,
        0x1D, 0xF6, 0xE2, 0x2E, 0x82, 0x66, 0xCA, 0x60, 0xC0, 0x29, 0x23, 0xAB, 0x0D, 0x53, 0x4E, 0x6F,
        0xD5, 0xDB, 0x37, 0x45, 0xDE, 0xFD, 0x8E, 0x2F, 0x03, 0xFF, 0x6A, 0x72, 0x6D, 0x6C, 0x5B, 0x51,
        0x8D, 0x1B, 0xAF, 0x92, 0xBB, 0xDD, 0xBC, 0x7F, 0x11, 0xD9, 0x5C, 0x41, 0x1F, 0x10, 0x5A, 0xD8,
        0x0A, 0xC1, 0x31, 0x88, 0xA5, 0xCD, 0x7B, 0xBD, 0x2D, 0x74, 0xD0, 0x12, 0xB8, 0xE5, 0xB4, 0xB0,
        0x89, 0x69, 0x97, 0x4A, 0x0C, 0x96, 0x77, 0x7E, 0x65, 0xB9, 0xF1, 0x09, 0xC5, 0x6E, 0xC6, 0x84,
        0x18, 0xF0, 0x7D, 0xEC, 0x3A, 0xDC, 0x4D, 0x20, 0x79, 0xEE, 0x5F, 0x3E, 0xD7, 0xCB, 0x39, 0x48
        }
        DEFINE ROL32(dword, n) {
        	count := n % 32
        	dest := (dword << count) | (dword >> (32-count))
        	RETURN dest
        }
        DEFINE SBOX_BYTE(dword, i) {
        	RETURN sbox[dword.byte[i]]
        }
        DEFINE lower_t(dword) {
        	tmp.byte[0] := SBOX_BYTE(dword, 0)
        	tmp.byte[1] := SBOX_BYTE(dword, 1)
        	tmp.byte[2] := SBOX_BYTE(dword, 2)
        	tmp.byte[3] := SBOX_BYTE(dword, 3)
        	RETURN tmp
        }
        DEFINE L_RND(dword) {
        	tmp := dword
        	tmp := tmp ^ ROL32(dword, 2)
        	tmp := tmp ^ ROL32(dword, 10)
        	tmp := tmp ^ ROL32(dword, 18)
        	tmp := tmp ^ ROL32(dword, 24)
        	RETURN tmp
        }
        DEFINE T_RND(dword) {
        	RETURN L_RND(lower_t(dword))
        }
        DEFINE F_RND(X0, X1, X2, X3, round_key) {
        	RETURN X0 ^ T_RND(X1 ^ X2 ^ X3 ^ round_key)
        }
        FOR i:= 0 to 1
        	P.dword[0] := __A.dword[4*i]
        	P.dword[1] := __A.dword[4*i+1]
        	P.dword[2] := __A.dword[4*i+2]
        	P.dword[3] := __A.dword[4*i+3]
        	C.dword[0] := F_RND(P.dword[0], P.dword[1], P.dword[2], P.dword[3], __B.dword[4*i])
        	C.dword[1] := F_RND(P.dword[1], P.dword[2], P.dword[3], C.dword[0], __B.dword[4*i+1])
        	C.dword[2] := F_RND(P.dword[2], P.dword[3], C.dword[0], C.dword[1], __B.dword[4*i+2])
        	C.dword[3] := F_RND(P.dword[3], C.dword[0], C.dword[1], C.dword[2], __B.dword[4*i+3])
        	dst.dword[4*i] := C.dword[0]
        	dst.dword[4*i+1] := C.dword[1]
        	dst.dword[4*i+2] := C.dword[2]
        	dst.dword[4*i+3] := C.dword[3]
        ENDFOR
        dst[MAX:256] := 0
        

XMM
~~~
_mm_sm3msg1_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cryptography
:Header: immintrin.h
:Searchable: AVX_ALL-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __A, 
    __m128i __B, 
    __m128i __C
:Param ETypes:
    UI32 __A, 
    UI32 __B, 
    UI32 __C

.. code-block:: C

    __m128i _mm_sm3msg1_epi32(__m128i __A, __m128i __B,
                              __m128i __C);

.. admonition:: Intel Description

    The VSM3MSG1 intrinsic is one of the two SM3 message scheduling intrinsics. The intrinsic performs an initial calculation for the next four SM3 message words. The calculated results are stored in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ROL32(dword, n) {
        	count := n % 32
        	dest := (dword << count) | (dword >> (32 - count))
        	RETURN dest
        }
        DEFINE P1(x) {
        	RETURN x ^ ROL32(x, 15) ^ ROL32(x, 23)
        }
        W.dword[0] := __C.dword[0]
        W.dword[1] := __C.dword[1]
        W.dword[2] := __C.dword[2]
        W.dword[3] := __C.dword[3]
        W.dword[7] := __A.dword[0]
        W.dword[8] := __A.dword[1]
        W.dword[9] := __A.dword[2]
        W.dword[10] := __A.dword[3]
        W.dword[13] := __B.dword[0]
        W.dword[14] := __B.dword[1]
        W.dword[15] := __B.dword[2]
        TMP0 := W.dword[7] ^ W.dword[0] ^ ROL32(W.dword[13], 15)
        TMP1 := W.dword[8] ^ W.dword[1] ^ ROL32(W.dword[14], 15)
        TMP2 := W.dword[9] ^ W.dword[2] ^ ROL32(W.dword[15], 15)
        TMP3 := W.dword[10] ^ W.dword[3]
        dst.dword[0] := P1(TMP0)
        dst.dword[1] := P1(TMP1)
        dst.dword[2] := P1(TMP2)
        dst.dword[3] := P1(TMP3)
        
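The pseudo-code above can be mirrored by a scalar C sketch. The names ``rol32``, ``sm3_p1`` and ``sm3msg1_ref`` are hypothetical helpers, not part of any header; the arrays stand in for the four 32-bit lanes of each ``__m128i`` argument.

.. code-block:: C

    #include <stdint.h>

    /* Rotate left. Masking the right-shift count avoids the undefined
       "x >> 32" that a literal translation would hit when n % 32 == 0. */
    static uint32_t rol32(uint32_t x, unsigned n)
    {
        n &= 31u;
        return (x << n) | (x >> ((32u - n) & 31u));
    }

    /* SM3 permutation P1, as defined in the pseudo-code. */
    static uint32_t sm3_p1(uint32_t x)
    {
        return x ^ rol32(x, 15) ^ rol32(x, 23);
    }

    /* Hypothetical scalar stand-in for _mm_sm3msg1_epi32.
       a holds W[7..10], b holds W[13..15] (b[3] is not read),
       c holds W[0..3]. */
    static void sm3msg1_ref(uint32_t dst[4], const uint32_t a[4],
                            const uint32_t b[4], const uint32_t c[4])
    {
        for (int k = 0; k < 4; k++) {
            uint32_t tmp = a[k] ^ c[k];
            if (k < 3)                      /* lane 3 has no W[16] yet */
                tmp ^= rol32(b[k], 15);
            dst[k] = sm3_p1(tmp);
        }
    }

Lane 3 omits the rotated term because it would need W[16], which is only produced by the second scheduling step.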

_mm_sm3msg2_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cryptography
:Header: immintrin.h
:Searchable: AVX_ALL-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __A, 
    __m128i __B, 
    __m128i __C
:Param ETypes:
    UI32 __A, 
    UI32 __B, 
    UI32 __C

.. code-block:: C

    __m128i _mm_sm3msg2_epi32(__m128i __A, __m128i __B,
                              __m128i __C);

.. admonition:: Intel Description

    The VSM3MSG2 intrinsic is one of the two SM3 message scheduling intrinsics. The intrinsic performs the final calculation for the next four SM3 message words. The calculated results are stored in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ROL32(dword, n) {
        	count := n % 32
        	dest := (dword << count) | (dword >> (32-count))
        	RETURN dest
        }
        WTMP.dword[0] := __A.dword[0]
        WTMP.dword[1] := __A.dword[1]
        WTMP.dword[2] := __A.dword[2]
        WTMP.dword[3] := __A.dword[3]
        W.dword[3] := __B.dword[0]
        W.dword[4] := __B.dword[1]
        W.dword[5] := __B.dword[2]
        W.dword[6] := __B.dword[3]
        W.dword[10] := __C.dword[0]
        W.dword[11] := __C.dword[1]
        W.dword[12] := __C.dword[2]
        W.dword[13] := __C.dword[3]
        W.dword[16] := ROL32(W.dword[3], 7) ^ W.dword[10] ^ WTMP.dword[0]
        W.dword[17] := ROL32(W.dword[4], 7) ^ W.dword[11] ^ WTMP.dword[1]
        W.dword[18] := ROL32(W.dword[5], 7) ^ W.dword[12] ^ WTMP.dword[2]
        W.dword[19] := ROL32(W.dword[6], 7) ^ W.dword[13] ^ WTMP.dword[3]
        W.dword[19] := W.dword[19] ^ ROL32(W.dword[16], 6) ^ ROL32(W.dword[16], 15) ^ ROL32(W.dword[16], 30)
        dst.dword[0] := W.dword[16]
        dst.dword[1] := W.dword[17]
        dst.dword[2] := W.dword[18]
        dst.dword[3] := W.dword[19]
        
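Taken together, the two scheduling steps appear to reproduce the SM3 message-expansion recurrence for four consecutive words. The sketch below, with hypothetical helper names, computes W[16..19] both directly from the recurrence and through the two-step path of the pseudo-code so the two can be compared. It relies on P1 being linear over XOR, which is why the last line can fold in the W[16] term after the fact.

.. code-block:: C

    #include <stdint.h>

    static uint32_t rol32(uint32_t x, unsigned n)
    {
        n &= 31u;
        return (x << n) | (x >> ((32u - n) & 31u));
    }

    static uint32_t sm3_p1(uint32_t x)
    {
        return x ^ rol32(x, 15) ^ rol32(x, 23);
    }

    /* W[j] straight from the SM3 recurrence; valid for j >= 16. */
    static uint32_t sm3_expand_direct(const uint32_t *w, int j)
    {
        return sm3_p1(w[j - 16] ^ w[j - 9] ^ rol32(w[j - 3], 15))
               ^ rol32(w[j - 13], 7) ^ w[j - 6];
    }

    /* Two-step path following the VSM3MSG1 / VSM3MSG2 pseudo-code.
       Reads w[0..15] and fills w[16..19]. */
    static void sm3_expand_two_step(uint32_t *w)
    {
        uint32_t t[4];

        /* Step 1 (VSM3MSG1): partial terms. */
        t[0] = sm3_p1(w[7]  ^ w[0] ^ rol32(w[13], 15));
        t[1] = sm3_p1(w[8]  ^ w[1] ^ rol32(w[14], 15));
        t[2] = sm3_p1(w[9]  ^ w[2] ^ rol32(w[15], 15));
        t[3] = sm3_p1(w[10] ^ w[3]);        /* W[16] not known yet */

        /* Step 2 (VSM3MSG2): finish each word. */
        for (int k = 0; k < 4; k++)
            w[16 + k] = rol32(w[3 + k], 7) ^ w[10 + k] ^ t[k];

        /* Fold in P1(ROL32(W[16], 15)), expanded term by term. */
        w[19] ^= rol32(w[16], 6) ^ rol32(w[16], 15) ^ rol32(w[16], 30);
    }

Calling ``sm3_expand_two_step`` on a 20-word buffer and comparing entries 16 to 19 against ``sm3_expand_direct`` is a quick way to check the mapping of ``__A``, ``__B`` and ``__C`` to message words.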

_mm_sm3rnds2_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cryptography
:Header: immintrin.h
:Searchable: AVX_ALL-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __A, 
    __m128i __B, 
    __m128i __C, 
    const int imm8
:Param ETypes:
    UI32 __A, 
    UI32 __B, 
    UI32 __C, 
    IMM imm8

.. code-block:: C

    __m128i _mm_sm3rnds2_epi32(__m128i __A, __m128i __B,
                               __m128i __C, const int imm8);

.. admonition:: Intel Description

    The intrinsic performs two rounds of SM3 operation using the initial SM3 state (C, D, G, H) from "__A", the initial SM3 state (A, B, E, F) from "__B", and pre-computed words from "__C". The (C, D, G, H) state in "__A" is assumed to hold the non-rotated-left variables from the previous state. The updated SM3 state (A, B, E, F) is written to "__A". The "imm8" should contain the even round number for the first of the two rounds computed by this instruction. The computation masks the "imm8" value by ANDing it with 0x3E so that only even round numbers from 0 through 62 are used for this operation. The calculated results are stored in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE ROL32(dword, n) {
        	count := n % 32
        	dest := (dword << count) | (dword >> (32-count))
        	RETURN dest
        }
        DEFINE P0(x) {
        	RETURN x ^ ROL32(x, 9) ^ ROL32(x, 17)
        }
        DEFINE FF(x, y, z, round) {
        	IF round < 16
        		RETURN (x ^ y ^ z)
        	ELSE
        		RETURN (x & y) | (x & z) | (y & z)
        	FI
        }
        DEFINE GG(x, y, z, round){
        	IF round < 16
        		RETURN (x ^ y ^ z)
        	ELSE
        		RETURN (x & y) | (~x & z)
        	FI
        }
        A.dword[0] := __B.dword[3]
        B.dword[0] := __B.dword[2]
        C.dword[0] := __A.dword[3]
        D.dword[0] := __A.dword[2]
        E.dword[0] := __B.dword[1]
        F.dword[0] := __B.dword[0]
        G.dword[0] := __A.dword[1]
        H.dword[0] := __A.dword[0]
        W.dword[0] := __C.dword[0]
        W.dword[1] := __C.dword[1]
        W.dword[4] := __C.dword[2]
        W.dword[5] := __C.dword[3]
        C.dword[0] := ROL32(C.dword[0], 9)
        D.dword[0] := ROL32(D.dword[0], 9)
        G.dword[0] := ROL32(G.dword[0], 19)
        H.dword[0] := ROL32(H.dword[0], 19)
        ROUND := imm8 & 0x3E
        IF ROUND < 16
        	CONST.dword[0] := 0x79CC4519
        ELSE
        	CONST.dword[0] := 0x7A879D8A
        FI
        CONST.dword[0] := ROL32(CONST.dword[0], ROUND)
        FOR i:= 0 to 1
        	temp.dword[0] := ROL32(A.dword[i], 12) + E.dword[i] + CONST.dword[0]
        	S1.dword[0] := ROL32(temp.dword[0], 7)
        	S2.dword[0] := S1.dword[0] ^ ROL32(A.dword[i], 12)
        	T1.dword[0] := FF(A.dword[i], B.dword[i], C.dword[i], ROUND) + D.dword[i] + S2.dword[0] + (W.dword[i] ^ W.dword[i+4])
        	T2.dword[0] := GG(E.dword[i], F.dword[i], G.dword[i], ROUND) + H.dword[i] + S1.dword[0] + W.dword[i]
        	D.dword[i+1] := C.dword[i]
        	C.dword[i+1] := ROL32(B.dword[i], 9)
        	B.dword[i+1] := A.dword[i]
        	A.dword[i+1] := T1.dword[0]
        	H.dword[i+1] := G.dword[i]
        	G.dword[i+1] := ROL32(F.dword[i], 19)
        	F.dword[i+1] := E.dword[i]
        	E.dword[i+1] := P0(T2.dword[0])
        	CONST.dword[0] := ROL32(CONST.dword[0], 1)
        ENDFOR
        dst.dword[3] := A.dword[2]
        dst.dword[2] := B.dword[2]
        dst.dword[1] := E.dword[2]
        dst.dword[0] := F.dword[2]
        
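The handling of "imm8" and the round constant can be isolated in a few lines of C. This is only a sketch of that part of the pseudo-code; ``sm3_round_const`` and ``sm3_next_round_const`` are hypothetical names.

.. code-block:: C

    #include <stdint.h>

    static uint32_t rol32(uint32_t x, unsigned n)
    {
        n &= 31u;
        return (x << n) | (x >> ((32u - n) & 31u));
    }

    /* Constant used by the first of the two rounds.
       imm8 is masked to an even round number in 0..62. */
    static uint32_t sm3_round_const(int imm8)
    {
        unsigned round = (unsigned)imm8 & 0x3Eu;
        uint32_t t = (round < 16) ? 0x79CC4519u : 0x7A879D8Au;
        return rol32(t, round);
    }

    /* The second round rotates the same constant by one more bit. */
    static uint32_t sm3_next_round_const(uint32_t c)
    {
        return rol32(c, 1);
    }

Because of the mask, an odd or out-of-range "imm8" silently selects the even round below it (for example 17 behaves like 16), so callers are expected to pass 0, 2, 4, ... 62 themselves.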

_mm_sm4key4_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cryptography
:Header: immintrin.h
:Searchable: AVX_ALL-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __A, 
    __m128i __B
:Param ETypes:
    UI32 __A, 
    UI32 __B

.. code-block:: C

    __m128i _mm_sm4key4_epi32(__m128i __A, __m128i __B);

.. admonition:: Intel Description

    This intrinsic performs four rounds of SM4 key expansion. The intrinsic operates on independent 128-bit lanes. The calculated results are stored in "dst". 

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        BYTE sbox[256] = {
        0xD6, 0x90, 0xE9, 0xFE, 0xCC, 0xE1, 0x3D, 0xB7, 0x16, 0xB6, 0x14, 0xC2, 0x28, 0xFB, 0x2C, 0x05,
        0x2B, 0x67, 0x9A, 0x76, 0x2A, 0xBE, 0x04, 0xC3, 0xAA, 0x44, 0x13, 0x26, 0x49, 0x86, 0x06, 0x99,
        0x9C, 0x42, 0x50, 0xF4, 0x91, 0xEF, 0x98, 0x7A, 0x33, 0x54, 0x0B, 0x43, 0xED, 0xCF, 0xAC, 0x62,
        0xE4, 0xB3, 0x1C, 0xA9, 0xC9, 0x08, 0xE8, 0x95, 0x80, 0xDF, 0x94, 0xFA, 0x75, 0x8F, 0x3F, 0xA6,
        0x47, 0x07, 0xA7, 0xFC, 0xF3, 0x73, 0x17, 0xBA, 0x83, 0x59, 0x3C, 0x19, 0xE6, 0x85, 0x4F, 0xA8,
        0x68, 0x6B, 0x81, 0xB2, 0x71, 0x64, 0xDA, 0x8B, 0xF8, 0xEB, 0x0F, 0x4B, 0x70, 0x56, 0x9D, 0x35,
        0x1E, 0x24, 0x0E, 0x5E, 0x63, 0x58, 0xD1, 0xA2, 0x25, 0x22, 0x7C, 0x3B, 0x01, 0x21, 0x78, 0x87,
        0xD4, 0x00, 0x46, 0x57, 0x9F, 0xD3, 0x27, 0x52, 0x4C, 0x36, 0x02, 0xE7, 0xA0, 0xC4, 0xC8, 0x9E,
        0xEA, 0xBF, 0x8A, 0xD2, 0x40, 0xC7, 0x38, 0xB5, 0xA3, 0xF7, 0xF2, 0xCE, 0xF9, 0x61, 0x15, 0xA1,
        0xE0, 0xAE, 0x5D, 0xA4, 0x9B, 0x34, 0x1A, 0x55, 0xAD, 0x93, 0x32, 0x30, 0xF5, 0x8C, 0xB1, 0xE3,
        0x1D, 0xF6, 0xE2, 0x2E, 0x82, 0x66, 0xCA, 0x60, 0xC0, 0x29, 0x23, 0xAB, 0x0D, 0x53, 0x4E, 0x6F,
        0xD5, 0xDB, 0x37, 0x45, 0xDE, 0xFD, 0x8E, 0x2F, 0x03, 0xFF, 0x6A, 0x72, 0x6D, 0x6C, 0x5B, 0x51,
        0x8D, 0x1B, 0xAF, 0x92, 0xBB, 0xDD, 0xBC, 0x7F, 0x11, 0xD9, 0x5C, 0x41, 0x1F, 0x10, 0x5A, 0xD8,
        0x0A, 0xC1, 0x31, 0x88, 0xA5, 0xCD, 0x7B, 0xBD, 0x2D, 0x74, 0xD0, 0x12, 0xB8, 0xE5, 0xB4, 0xB0,
        0x89, 0x69, 0x97, 0x4A, 0x0C, 0x96, 0x77, 0x7E, 0x65, 0xB9, 0xF1, 0x09, 0xC5, 0x6E, 0xC6, 0x84,
        0x18, 0xF0, 0x7D, 0xEC, 0x3A, 0xDC, 0x4D, 0x20, 0x79, 0xEE, 0x5F, 0x3E, 0xD7, 0xCB, 0x39, 0x48
        }
        DEFINE ROL32(dword, n) {
        	count := n % 32
        	dest := (dword << count) | (dword >> (32-count))
        	RETURN dest
        }
        DEFINE SBOX_BYTE(dword, i) {
        	RETURN sbox[dword.byte[i]]
        }
        DEFINE lower_t(dword) {
        	tmp.byte[0] := SBOX_BYTE(dword, 0)
        	tmp.byte[1] := SBOX_BYTE(dword, 1)
        	tmp.byte[2] := SBOX_BYTE(dword, 2)
        	tmp.byte[3] := SBOX_BYTE(dword, 3)
        	RETURN tmp
        }
        DEFINE L_KEY(dword) {
        	RETURN dword ^ ROL32(dword, 13) ^ ROL32(dword, 23)
        }
        DEFINE T_KEY(dword) {
        	RETURN L_KEY(lower_t(dword))
        }
        DEFINE F_KEY(X0, X1, X2, X3, round_key) {
        	RETURN X0 ^ T_KEY(X1 ^ X2 ^ X3 ^ round_key)
        }
        P.dword[0] := __A.dword[0]
        P.dword[1] := __A.dword[1]
        P.dword[2] := __A.dword[2]
        P.dword[3] := __A.dword[3]
        C.dword[0] := F_KEY(P.dword[0], P.dword[1], P.dword[2], P.dword[3], __B.dword[0])
        C.dword[1] := F_KEY(P.dword[1], P.dword[2], P.dword[3], C.dword[0], __B.dword[1])
        C.dword[2] := F_KEY(P.dword[2], P.dword[3], C.dword[0], C.dword[1], __B.dword[2])
        C.dword[3] := F_KEY(P.dword[3], C.dword[0], C.dword[1], C.dword[2], __B.dword[3])
        dst.dword[0] := C.dword[0]
        dst.dword[1] := C.dword[1]
        dst.dword[2] := C.dword[2]
        dst.dword[3] := C.dword[3]
        dst[MAX:128] := 0
        

_mm_sm4rnds4_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cryptography
:Header: immintrin.h
:Searchable: AVX_ALL-Cryptography-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __A, 
    __m128i __B
:Param ETypes:
    UI32 __A, 
    UI32 __B

.. code-block:: C

    __m128i _mm_sm4rnds4_epi32(__m128i __A, __m128i __B);

.. admonition:: Intel Description

    This intrinsic performs four rounds of SM4 encryption. The intrinsic operates on independent 128-bit lanes. The calculated results are stored in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        BYTE sbox[256] = {
        0xD6, 0x90, 0xE9, 0xFE, 0xCC, 0xE1, 0x3D, 0xB7, 0x16, 0xB6, 0x14, 0xC2, 0x28, 0xFB, 0x2C, 0x05,
        0x2B, 0x67, 0x9A, 0x76, 0x2A, 0xBE, 0x04, 0xC3, 0xAA, 0x44, 0x13, 0x26, 0x49, 0x86, 0x06, 0x99,
        0x9C, 0x42, 0x50, 0xF4, 0x91, 0xEF, 0x98, 0x7A, 0x33, 0x54, 0x0B, 0x43, 0xED, 0xCF, 0xAC, 0x62,
        0xE4, 0xB3, 0x1C, 0xA9, 0xC9, 0x08, 0xE8, 0x95, 0x80, 0xDF, 0x94, 0xFA, 0x75, 0x8F, 0x3F, 0xA6,
        0x47, 0x07, 0xA7, 0xFC, 0xF3, 0x73, 0x17, 0xBA, 0x83, 0x59, 0x3C, 0x19, 0xE6, 0x85, 0x4F, 0xA8,
        0x68, 0x6B, 0x81, 0xB2, 0x71, 0x64, 0xDA, 0x8B, 0xF8, 0xEB, 0x0F, 0x4B, 0x70, 0x56, 0x9D, 0x35,
        0x1E, 0x24, 0x0E, 0x5E, 0x63, 0x58, 0xD1, 0xA2, 0x25, 0x22, 0x7C, 0x3B, 0x01, 0x21, 0x78, 0x87,
        0xD4, 0x00, 0x46, 0x57, 0x9F, 0xD3, 0x27, 0x52, 0x4C, 0x36, 0x02, 0xE7, 0xA0, 0xC4, 0xC8, 0x9E,
        0xEA, 0xBF, 0x8A, 0xD2, 0x40, 0xC7, 0x38, 0xB5, 0xA3, 0xF7, 0xF2, 0xCE, 0xF9, 0x61, 0x15, 0xA1,
        0xE0, 0xAE, 0x5D, 0xA4, 0x9B, 0x34, 0x1A, 0x55, 0xAD, 0x93, 0x32, 0x30, 0xF5, 0x8C, 0xB1, 0xE3,
        0x1D, 0xF6, 0xE2, 0x2E, 0x82, 0x66, 0xCA, 0x60, 0xC0, 0x29, 0x23, 0xAB, 0x0D, 0x53, 0x4E, 0x6F,
        0xD5, 0xDB, 0x37, 0x45, 0xDE, 0xFD, 0x8E, 0x2F, 0x03, 0xFF, 0x6A, 0x72, 0x6D, 0x6C, 0x5B, 0x51,
        0x8D, 0x1B, 0xAF, 0x92, 0xBB, 0xDD, 0xBC, 0x7F, 0x11, 0xD9, 0x5C, 0x41, 0x1F, 0x10, 0x5A, 0xD8,
        0x0A, 0xC1, 0x31, 0x88, 0xA5, 0xCD, 0x7B, 0xBD, 0x2D, 0x74, 0xD0, 0x12, 0xB8, 0xE5, 0xB4, 0xB0,
        0x89, 0x69, 0x97, 0x4A, 0x0C, 0x96, 0x77, 0x7E, 0x65, 0xB9, 0xF1, 0x09, 0xC5, 0x6E, 0xC6, 0x84,
        0x18, 0xF0, 0x7D, 0xEC, 0x3A, 0xDC, 0x4D, 0x20, 0x79, 0xEE, 0x5F, 0x3E, 0xD7, 0xCB, 0x39, 0x48
        }
        DEFINE ROL32(dword, n) {
        	count := n % 32
        	dest := (dword << count) | (dword >> (32-count))
        	RETURN dest
        }
        DEFINE SBOX_BYTE(dword, i) {
        	RETURN sbox[dword.byte[i]]
        }
        DEFINE lower_t(dword) {
        	tmp.byte[0] := SBOX_BYTE(dword, 0)
        	tmp.byte[1] := SBOX_BYTE(dword, 1)
        	tmp.byte[2] := SBOX_BYTE(dword, 2)
        	tmp.byte[3] := SBOX_BYTE(dword, 3)
        	RETURN tmp
        }
        DEFINE L_RND(dword) {
        	tmp := dword
        	tmp := tmp ^ ROL32(dword, 2)
        	tmp := tmp ^ ROL32(dword, 10)
        	tmp := tmp ^ ROL32(dword, 18)
        	tmp := tmp ^ ROL32(dword, 24)
        	RETURN tmp
        }
        DEFINE T_RND(dword) {
        	RETURN L_RND(lower_t(dword))
        }
        DEFINE F_RND(X0, X1, X2, X3, round_key) {
        	RETURN X0 ^ T_RND(X1 ^ X2 ^ X3 ^ round_key)
        }
        P.dword[0] := __A.dword[0]
        P.dword[1] := __A.dword[1]
        P.dword[2] := __A.dword[2]
        P.dword[3] := __A.dword[3]
        C.dword[0] := F_RND(P.dword[0], P.dword[1], P.dword[2], P.dword[3], __B.dword[0])
        C.dword[1] := F_RND(P.dword[1], P.dword[2], P.dword[3], C.dword[0], __B.dword[1])
        C.dword[2] := F_RND(P.dword[2], P.dword[3], C.dword[0], C.dword[1], __B.dword[2])
        C.dword[3] := F_RND(P.dword[3], C.dword[0], C.dword[1], C.dword[2], __B.dword[3])
        dst.dword[0] := C.dword[0]
        dst.dword[1] := C.dword[1]
        dst.dword[2] := C.dword[2]
        dst.dword[3] := C.dword[3]
        dst[MAX:128] := 0
        
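The key-expansion and encryption pseudo-code share the same four-step recurrence and differ only in the linear transform. A scalar C sketch of both, assuming the 32-bit words have already been assembled from the byte stream, could look like this; ``sm4key4_ref``, ``sm4rnds4_ref`` and the other helper names are hypothetical.

.. code-block:: C

    #include <stdint.h>

    static const uint8_t SM4_SBOX[256] = {
        0xD6, 0x90, 0xE9, 0xFE, 0xCC, 0xE1, 0x3D, 0xB7, 0x16, 0xB6, 0x14, 0xC2, 0x28, 0xFB, 0x2C, 0x05,
        0x2B, 0x67, 0x9A, 0x76, 0x2A, 0xBE, 0x04, 0xC3, 0xAA, 0x44, 0x13, 0x26, 0x49, 0x86, 0x06, 0x99,
        0x9C, 0x42, 0x50, 0xF4, 0x91, 0xEF, 0x98, 0x7A, 0x33, 0x54, 0x0B, 0x43, 0xED, 0xCF, 0xAC, 0x62,
        0xE4, 0xB3, 0x1C, 0xA9, 0xC9, 0x08, 0xE8, 0x95, 0x80, 0xDF, 0x94, 0xFA, 0x75, 0x8F, 0x3F, 0xA6,
        0x47, 0x07, 0xA7, 0xFC, 0xF3, 0x73, 0x17, 0xBA, 0x83, 0x59, 0x3C, 0x19, 0xE6, 0x85, 0x4F, 0xA8,
        0x68, 0x6B, 0x81, 0xB2, 0x71, 0x64, 0xDA, 0x8B, 0xF8, 0xEB, 0x0F, 0x4B, 0x70, 0x56, 0x9D, 0x35,
        0x1E, 0x24, 0x0E, 0x5E, 0x63, 0x58, 0xD1, 0xA2, 0x25, 0x22, 0x7C, 0x3B, 0x01, 0x21, 0x78, 0x87,
        0xD4, 0x00, 0x46, 0x57, 0x9F, 0xD3, 0x27, 0x52, 0x4C, 0x36, 0x02, 0xE7, 0xA0, 0xC4, 0xC8, 0x9E,
        0xEA, 0xBF, 0x8A, 0xD2, 0x40, 0xC7, 0x38, 0xB5, 0xA3, 0xF7, 0xF2, 0xCE, 0xF9, 0x61, 0x15, 0xA1,
        0xE0, 0xAE, 0x5D, 0xA4, 0x9B, 0x34, 0x1A, 0x55, 0xAD, 0x93, 0x32, 0x30, 0xF5, 0x8C, 0xB1, 0xE3,
        0x1D, 0xF6, 0xE2, 0x2E, 0x82, 0x66, 0xCA, 0x60, 0xC0, 0x29, 0x23, 0xAB, 0x0D, 0x53, 0x4E, 0x6F,
        0xD5, 0xDB, 0x37, 0x45, 0xDE, 0xFD, 0x8E, 0x2F, 0x03, 0xFF, 0x6A, 0x72, 0x6D, 0x6C, 0x5B, 0x51,
        0x8D, 0x1B, 0xAF, 0x92, 0xBB, 0xDD, 0xBC, 0x7F, 0x11, 0xD9, 0x5C, 0x41, 0x1F, 0x10, 0x5A, 0xD8,
        0x0A, 0xC1, 0x31, 0x88, 0xA5, 0xCD, 0x7B, 0xBD, 0x2D, 0x74, 0xD0, 0x12, 0xB8, 0xE5, 0xB4, 0xB0,
        0x89, 0x69, 0x97, 0x4A, 0x0C, 0x96, 0x77, 0x7E, 0x65, 0xB9, 0xF1, 0x09, 0xC5, 0x6E, 0xC6, 0x84,
        0x18, 0xF0, 0x7D, 0xEC, 0x3A, 0xDC, 0x4D, 0x20, 0x79, 0xEE, 0x5F, 0x3E, 0xD7, 0xCB, 0x39, 0x48
    };

    static uint32_t rol32(uint32_t x, unsigned n)
    {
        n &= 31u;
        return (x << n) | (x >> ((32u - n) & 31u));
    }

    /* lower_t: apply the S-box to each of the four bytes. */
    static uint32_t sm4_tau(uint32_t x)
    {
        return ((uint32_t)SM4_SBOX[x >> 24] << 24) |
               ((uint32_t)SM4_SBOX[(x >> 16) & 0xFF] << 16) |
               ((uint32_t)SM4_SBOX[(x >> 8) & 0xFF] << 8) |
                (uint32_t)SM4_SBOX[x & 0xFF];
    }

    /* L_KEY and L_RND from the pseudo-code. */
    static uint32_t sm4_l_key(uint32_t x)
    {
        return x ^ rol32(x, 13) ^ rol32(x, 23);
    }

    static uint32_t sm4_l_rnd(uint32_t x)
    {
        return x ^ rol32(x, 2) ^ rol32(x, 10) ^ rol32(x, 18) ^ rol32(x, 24);
    }

    /* Shared recurrence: every new word feeds into the next three steps. */
    static void sm4_four_steps(uint32_t dst[4], const uint32_t a[4],
                               const uint32_t b[4], uint32_t (*lin)(uint32_t))
    {
        uint32_t x[8];
        for (int i = 0; i < 4; i++)
            x[i] = a[i];
        for (int i = 0; i < 4; i++) {
            x[i + 4] = x[i] ^ lin(sm4_tau(x[i + 1] ^ x[i + 2] ^ x[i + 3] ^ b[i]));
            dst[i] = x[i + 4];
        }
    }

    /* Hypothetical scalar stand-ins for the two intrinsics. */
    static void sm4key4_ref(uint32_t dst[4], const uint32_t a[4], const uint32_t b[4])
    {
        sm4_four_steps(dst, a, b, sm4_l_key);
    }

    static void sm4rnds4_ref(uint32_t dst[4], const uint32_t a[4], const uint32_t b[4])
    {
        sm4_four_steps(dst, a, b, sm4_l_rnd);
    }

For key expansion, ``a`` holds the previous four key words and ``b`` the four round constants; for encryption, ``a`` holds the current state and ``b`` four round keys. The S-box acts on every byte alike, so this word-level model does not depend on host byte order.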

Move
----
YMM
~~~
_mm256_movehdup_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Move
:Header: immintrin.h
:Searchable: AVX_ALL-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_movehdup_ps(__m256 a);

.. admonition:: Intel Description

    Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[63:32] 
        dst[63:32] := a[63:32] 
        dst[95:64] := a[127:96] 
        dst[127:96] := a[127:96]
        dst[159:128] := a[191:160] 
        dst[191:160] := a[191:160] 
        dst[223:192] := a[255:224] 
        dst[255:224] := a[255:224]
        dst[MAX:256] := 0
        	

_mm256_moveldup_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Move
:Header: immintrin.h
:Searchable: AVX_ALL-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_moveldup_ps(__m256 a);

.. admonition:: Intel Description

    Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] 
        dst[63:32] := a[31:0] 
        dst[95:64] := a[95:64] 
        dst[127:96] := a[95:64]
        dst[159:128] := a[159:128] 
        dst[191:160] := a[159:128] 
        dst[223:192] := a[223:192] 
        dst[255:224] := a[223:192]
        dst[MAX:256] := 0
        	

_mm256_movedup_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Move
:Header: immintrin.h
:Searchable: AVX_ALL-Move-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_movedup_pd(__m256d a);

.. admonition:: Intel Description

    Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        dst[127:64] := a[63:0]
        dst[191:128] := a[191:128]
        dst[255:192] := a[191:128]
        dst[MAX:256] := 0
        	
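The three duplicate operations in this section reduce to simple index patterns. A scalar sketch with plain arrays standing in for the vector lanes (the ``*_ref`` names are hypothetical):

.. code-block:: C

    #include <stddef.h>

    /* Odd-indexed elements are copied into the even slot below them. */
    static void movehdup_ref(float dst[8], const float a[8])
    {
        for (size_t j = 0; j < 8; j += 2) {
            dst[j]     = a[j + 1];
            dst[j + 1] = a[j + 1];
        }
    }

    /* Even-indexed elements are copied into the odd slot above them. */
    static void moveldup_ref(float dst[8], const float a[8])
    {
        for (size_t j = 0; j < 8; j += 2) {
            dst[j]     = a[j];
            dst[j + 1] = a[j];
        }
    }

    /* Same pattern as moveldup, on four 64-bit elements. */
    static void movedup_pd_ref(double dst[4], const double a[4])
    {
        for (size_t j = 0; j < 4; j += 2) {
            dst[j]     = a[j];
            dst[j + 1] = a[j];
        }
    }

Each pair of elements is handled independently, so no value crosses a 128-bit boundary in any of the three.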

Cast
----
YMM
~~~
_mm256_castpd_ps
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256 _mm256_castpd_ps(__m256d a);

.. admonition:: Intel Description

    Cast vector of type __m256d to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castps_pd
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256d _mm256_castps_pd(__m256 a);

.. admonition:: Intel Description

    Cast vector of type __m256 to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castps_si256
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256i _mm256_castps_si256(__m256 a);

.. admonition:: Intel Description

    Cast vector of type __m256 to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castpd_si256
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256i _mm256_castpd_si256(__m256d a);

.. admonition:: Intel Description

    Cast vector of type __m256d to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castsi256_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256 _mm256_castsi256_ps(__m256i a);

.. admonition:: Intel Description

    Cast vector of type __m256i to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castsi256_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m256d _mm256_castsi256_pd(__m256i a);

.. admonition:: Intel Description

    Cast vector of type __m256i to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
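
The same-width casts above reinterpret bits and never convert values. A scalar analogy in plain C, not the intrinsics themselves, assuming IEEE 754 ``float``; ``float_bits`` and ``float_value`` are hypothetical names:

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Reinterpretation: the 32 bits are copied unchanged.
       This is the scalar counterpart of a cast intrinsic. */
    static uint32_t float_bits(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }

    /* Value conversion, shown only for contrast.
       The cast intrinsics do not do this. */
    static uint32_t float_value(float f)
    {
        return (uint32_t)f;
    }

For ``1.0f`` the first function yields ``0x3F800000`` while the second yields ``1``. Code that needs the numeric value converted should use a conversion intrinsic instead of a cast.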

_mm256_castps256_ps128
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm256_castps256_ps128(__m256 a);

.. admonition:: Intel Description

    Cast vector of type __m256 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castpd256_pd128
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m128d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm256_castpd256_pd128(__m256d a);

.. admonition:: Intel Description

    Cast vector of type __m256d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castsi256_si128
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a
:Param ETypes:
    M128 a

.. code-block:: C

    __m128i _mm256_castsi256_si128(__m256i a);

.. admonition:: Intel Description

    Cast vector of type __m256i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castps128_ps256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_castps128_ps256(__m128 a);

.. admonition:: Intel Description

    Cast vector of type __m128 to type __m256; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castpd128_pd256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_castpd128_pd256(__m128d a);

.. admonition:: Intel Description

    Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_castsi128_si256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    M256 a

.. code-block:: C

    __m256i _mm256_castsi128_si256(__m128i a);

.. admonition:: Intel Description

    Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_zextps128_ps256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_zextps128_ps256(__m128 a);

.. admonition:: Intel Description

    Cast vector of type __m128 to type __m256; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_zextpd128_pd256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_zextpd128_pd256(__m128d a);

.. admonition:: Intel Description

    Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

_mm256_zextsi128_si256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Cast
:Header: immintrin.h
:Searchable: AVX_ALL-Cast-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    M256 a

.. code-block:: C

    __m256i _mm256_zextsi128_si256(__m128i a);

.. admonition:: Intel Description

    Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
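
The widening variants differ only in what they promise about the upper half. A small model with hypothetical struct types makes the ``zext`` guarantee explicit; the ``cast`` forms give no such guarantee, so code should not read the upper lanes after them.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical lane models; not the real vector types. */
    typedef struct { uint32_t lane[4]; } vec128_model;
    typedef struct { uint32_t lane[8]; } vec256_model;

    /* Model of the _mm256_zext*128_*256 family:
       low half copied, upper half guaranteed zero. */
    static vec256_model zext128_to_256(vec128_model a)
    {
        vec256_model r;
        memset(&r, 0, sizeof r);
        memcpy(r.lane, a.lane, sizeof a.lane);
        return r;
    }

    /* Model of the narrowing casts: keep the low 128 bits. */
    static vec128_model cast256_to_128(vec256_model a)
    {
        vec128_model r;
        memcpy(r.lane, a.lane, sizeof r.lane);
        return r;
    }

Narrowing after widening returns the original value in both families; only a read of lanes 4 to 7 can tell them apart.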

General Support
---------------
YMM
~~~
_mm256_zeroall
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: AVX_ALL-General Support-YMM
:Register: YMM 256 bit
:Return Type: void

.. code-block:: C

    void _mm256_zeroall(void);

.. admonition:: Intel Description

    Zero the contents of all XMM or YMM registers.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        YMM0[MAX:0] := 0
        YMM1[MAX:0] := 0
        YMM2[MAX:0] := 0
        YMM3[MAX:0] := 0
        YMM4[MAX:0] := 0
        YMM5[MAX:0] := 0
        YMM6[MAX:0] := 0
        YMM7[MAX:0] := 0
        IF _64_BIT_MODE
        	YMM8[MAX:0] := 0
        	YMM9[MAX:0] := 0
        	YMM10[MAX:0] := 0
        	YMM11[MAX:0] := 0
        	YMM12[MAX:0] := 0
        	YMM13[MAX:0] := 0
        	YMM14[MAX:0] := 0
        	YMM15[MAX:0] := 0
        FI
        	

_mm256_zeroupper
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: AVX_ALL-General Support-YMM
:Register: YMM 256 bit
:Return Type: void

.. code-block:: C

    void _mm256_zeroupper(void);

.. admonition:: Intel Description

    Zero the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        YMM0[MAX:128] := 0
        YMM1[MAX:128] := 0
        YMM2[MAX:128] := 0
        YMM3[MAX:128] := 0
        YMM4[MAX:128] := 0
        YMM5[MAX:128] := 0
        YMM6[MAX:128] := 0
        YMM7[MAX:128] := 0
        IF _64_BIT_MODE
        	YMM8[MAX:128] := 0
        	YMM9[MAX:128] := 0
        	YMM10[MAX:128] := 0
        	YMM11[MAX:128] := 0
        	YMM12[MAX:128] := 0
        	YMM13[MAX:128] := 0
        	YMM14[MAX:128] := 0
        	YMM15[MAX:128] := 0
        FI
        	

_mm256_undefined_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: AVX_ALL-General Support-YMM
:Register: YMM 256 bit
:Return Type: __m256

.. code-block:: C

    __m256 _mm256_undefined_ps(void);

.. admonition:: Intel Description

    Return vector of type __m256 with undefined elements.

_mm256_undefined_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: AVX_ALL-General Support-YMM
:Register: YMM 256 bit
:Return Type: __m256d

.. code-block:: C

    __m256d _mm256_undefined_pd(void);

.. admonition:: Intel Description

    Return vector of type __m256d with undefined elements.

_mm256_undefined_si256
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: General Support
:Header: immintrin.h
:Searchable: AVX_ALL-General Support-YMM
:Register: YMM 256 bit
:Return Type: __m256i

.. code-block:: C

    __m256i _mm256_undefined_si256(void);

.. admonition:: Intel Description

    Return vector of type __m256i with undefined elements.

Probability/Statistics
----------------------
YMM
~~~
_mm256_avg_epu8
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: AVX_ALL-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_avg_epu8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_avg_epu16
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Probability/Statistics
:Header: immintrin.h
:Searchable: AVX_ALL-Probability/Statistics-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_avg_epu16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
        ENDFOR
        dst[MAX:256] := 0
        	
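Both averages round up on ties and need one extra bit for the intermediate sum. A scalar sketch that widens before adding (function names are hypothetical):

.. code-block:: C

    #include <stdint.h>

    /* One 8-bit lane: widen first so a + b + 1 cannot wrap at 8 bits. */
    static uint8_t avg_u8(uint8_t a, uint8_t b)
    {
        return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
    }

    /* One 16-bit lane, same idea with a 32-bit intermediate. */
    static uint16_t avg_u16(uint16_t a, uint16_t b)
    {
        return (uint16_t)(((uint32_t)a + (uint32_t)b + 1u) >> 1);
    }

    /* All 32 byte lanes of the 256-bit operation. */
    static void avg_epu8_ref(uint8_t dst[32], const uint8_t a[32],
                             const uint8_t b[32])
    {
        for (int j = 0; j < 32; j++)
            dst[j] = avg_u8(a[j], b[j]);
    }

Computing ``(a + b + 1) >> 1`` directly in ``uint8_t`` arithmetic would lose the carry for inputs such as 255 and 255, which is why the widening matters in a scalar port.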

Special Math Functions
----------------------
YMM
~~~
_mm256_max_pd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_max_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_max_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_max_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_min_pd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_min_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_min_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_min_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	
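The ``MAX`` and ``MIN`` in the pseudo-code are not the IEEE 754 maximum and minimum. As the underlying instructions are documented in the Intel SDM, the second operand is returned whenever the comparison is false, which covers NaN inputs and the +0.0 / -0.0 pair. A one-lane scalar sketch of that rule, with hypothetical names:

.. code-block:: C

    /* One lane of the packed maximum: b wins unless a is strictly greater. */
    static double max_lane(double a, double b)
    {
        return (a > b) ? a : b;
    }

    /* One lane of the packed minimum: b wins unless a is strictly smaller. */
    static double min_lane(double a, double b)
    {
        return (a < b) ? a : b;
    }

The result therefore depends on operand order: with a NaN in "a" the value of "b" is returned, while a NaN in "b" propagates to the result. Code that needs symmetric NaN handling has to add its own check.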

_mm256_round_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    int rounding
:Param ETypes:
    FP64 a, 
    IMM rounding

.. code-block:: C

    __m256d _mm256_round_pd(__m256d a, int rounding);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" using the "rounding" parameter, and store the results as packed double-precision floating-point elements in "dst". [round_note]

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ROUND(a[i+63:i], rounding)
        ENDFOR
        dst[MAX:256] := 0
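
The effect of the rounding immediate is easiest to see by running the same input through two modes. This is a minimal sketch, not a reference implementation; ``round4_nearest`` and ``round4_toward_zero`` are hypothetical helper names, and the ``target`` attribute is a GCC/Clang extension assumed so the code builds without ``-mavx``. The ``_MM_FROUND_*`` macros come from ``immintrin.h``.

.. code-block:: C

    #include <immintrin.h>

    /* Round 4 doubles to nearest, ties to even, with exceptions suppressed. */
    __attribute__((target("avx")))
    void round4_nearest(const double *in, double *out)
    {
        __m256d v = _mm256_loadu_pd(in);
        v = _mm256_round_pd(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
        _mm256_storeu_pd(out, v);
    }

    /* Truncate 4 doubles toward zero, with exceptions suppressed. */
    __attribute__((target("avx")))
    void round4_toward_zero(const double *in, double *out)
    {
        __m256d v = _mm256_loadu_pd(in);
        v = _mm256_round_pd(v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
        _mm256_storeu_pd(out, v);
    }

The rounding argument must be a compile-time constant, which is why each mode gets its own function here.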
        	

_mm256_round_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    int rounding
:Param ETypes:
    FP32 a, 
    IMM rounding

.. code-block:: C

    __m256 _mm256_round_ps(__m256 a, int rounding);

.. admonition:: Intel Description

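    Round the packed single-precision (32-bit) floating-point elements in "a" using the "rounding" parameter, and store the results as packed single-precision floating-point elements in "dst". "rounding" is an immediate that selects the mode: _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, or _MM_FROUND_TO_ZERO (each optionally OR-ed with _MM_FROUND_NO_EXC to suppress exceptions), or _MM_FROUND_CUR_DIRECTION to use the mode in MXCSR.RC.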

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ROUND(a[i+31:i], rounding)
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_floor_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_floor_ps(__m256 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := FLOOR(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_ceil_ps
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_ceil_ps(__m256 a);

.. admonition:: Intel Description

    Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := CEIL(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
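
Floor and ceiling differ only for non-integral inputs, and the difference is clearest on negative values. The following is a minimal sketch; ``floor8`` and ``ceil8`` are hypothetical helper names, and the ``target`` attribute is a GCC/Clang extension assumed so the code builds without ``-mavx``.

.. code-block:: C

    #include <immintrin.h>

    /* Round 8 floats down (toward negative infinity). */
    __attribute__((target("avx")))
    void floor8(const float *in, float *out)
    {
        _mm256_storeu_ps(out, _mm256_floor_ps(_mm256_loadu_ps(in)));
    }

    /* Round 8 floats up (toward positive infinity). */
    __attribute__((target("avx")))
    void ceil8(const float *in, float *out)
    {
        _mm256_storeu_ps(out, _mm256_ceil_ps(_mm256_loadu_ps(in)));
    }

For an input lane of ``-1.5f``, ``floor8`` yields ``-2.0f`` and ``ceil8`` yields ``-1.0f``; the results stay in floating-point format rather than being converted to integers.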
        	

_mm256_floor_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_floor_pd(__m256d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := FLOOR(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_ceil_pd
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_ceil_pd(__m256d a);

.. admonition:: Intel Description

    Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := CEIL(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_abs_epi8
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m256i _mm256_abs_epi8(__m256i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := ABS(a[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
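
The phrase "unsigned results" matters for the most negative input: the absolute value of -128 does not fit in a signed 8-bit lane, so the bit pattern 0x80 must be read as unsigned 128. A minimal sketch follows; ``abs32_i8`` is a hypothetical helper name, and the ``target`` attribute is a GCC/Clang extension assumed so the code builds without ``-mavx2``.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Absolute value of 32 signed bytes, stored as unsigned bytes. */
    __attribute__((target("avx2")))
    void abs32_i8(const int8_t *in, uint8_t *out)
    {
        __m256i v = _mm256_loadu_si256((const __m256i *)in);
        /* -128 maps to bit pattern 0x80, which reads as 128 when unsigned. */
        _mm256_storeu_si256((__m256i *)out, _mm256_abs_epi8(v));
    }

Reading the output through ``int8_t`` instead would show -128 unchanged in that lane.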
        	

_mm256_abs_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m256i _mm256_abs_epi16(__m256i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ABS(a[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_abs_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m256i _mm256_abs_epi32(__m256i a);

.. admonition:: Intel Description

    Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ABS(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_max_epi8
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_max_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_max_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_max_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_max_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_max_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_max_epu8
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_max_epu8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
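
The signed and unsigned byte maxima share identical pseudo-code; only the interpretation of each lane differs. The sketch below runs the same bit patterns through both to show where they diverge. ``max32_signed`` and ``max32_unsigned`` are hypothetical helper names, and the ``target`` attribute is a GCC/Clang extension assumed so the code builds without ``-mavx2``.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Lane-wise maximum, treating each byte as signed (-128..127). */
    __attribute__((target("avx2")))
    void max32_signed(const uint8_t *a, const uint8_t *b, uint8_t *out)
    {
        __m256i va = _mm256_loadu_si256((const __m256i *)a);
        __m256i vb = _mm256_loadu_si256((const __m256i *)b);
        _mm256_storeu_si256((__m256i *)out, _mm256_max_epi8(va, vb));
    }

    /* Lane-wise maximum, treating each byte as unsigned (0..255). */
    __attribute__((target("avx2")))
    void max32_unsigned(const uint8_t *a, const uint8_t *b, uint8_t *out)
    {
        __m256i va = _mm256_loadu_si256((const __m256i *)a);
        __m256i vb = _mm256_loadu_si256((const __m256i *)b);
        _mm256_storeu_si256((__m256i *)out, _mm256_max_epu8(va, vb));
    }

For the pair 0x80 and 0x01, the signed variant picks 0x01 (since 0x80 is -128) while the unsigned variant picks 0x80.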
        	

_mm256_max_epu16
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_max_epu16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_max_epu32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_max_epu32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_min_epi8
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_min_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_min_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_min_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_min_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_min_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
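
A common use of the packed integer min and max is clamping every lane to a range. The following is a minimal sketch of that idiom, assuming ``lo <= hi``; ``clamp8_i32`` is a hypothetical helper name, and the ``target`` attribute is a GCC/Clang extension assumed so the code builds without ``-mavx2``.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Clamp 8 signed 32-bit integers to the range [lo, hi]. */
    __attribute__((target("avx2")))
    void clamp8_i32(const int32_t *in, int32_t lo, int32_t hi, int32_t *out)
    {
        __m256i v   = _mm256_loadu_si256((const __m256i *)in);
        __m256i vlo = _mm256_set1_epi32(lo);   /* broadcast bounds to all lanes */
        __m256i vhi = _mm256_set1_epi32(hi);
        v = _mm256_max_epi32(v, vlo);          /* raise lanes below lo */
        v = _mm256_min_epi32(v, vhi);          /* lower lanes above hi */
        _mm256_storeu_si256((__m256i *)out, v);
    }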
        	

_mm256_min_epu8
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_min_epu8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_min_epu16
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_min_epu16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_min_epu32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Special Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Special Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_min_epu32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

Logical
-------
YMM
~~~
_mm256_and_pd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_and_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_and_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_and_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_andnot_pd
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_andnot_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_andnot_ps
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_andnot_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
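
Because the NOT applies to the first argument, the operand order matters. One familiar use is clearing the sign bit to take a floating-point absolute value, sketched below under stated assumptions: ``fabs8`` is a hypothetical helper name, and the ``target`` attribute is a GCC/Clang extension assumed so the code builds without ``-mavx``.

.. code-block:: C

    #include <immintrin.h>

    /* Absolute value of 8 floats by clearing each lane's sign bit. */
    __attribute__((target("avx")))
    void fabs8(const float *in, float *out)
    {
        __m256 v        = _mm256_loadu_ps(in);
        __m256 signmask = _mm256_set1_ps(-0.0f);   /* only bit 31 set in each lane */
        /* (NOT signmask) AND v: keeps exponent and mantissa, drops the sign. */
        _mm256_storeu_ps(out, _mm256_andnot_ps(signmask, v));
    }

Passing the arguments in the other order would instead compute ``(NOT v) AND signmask``, which is not an absolute value.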
        	

_mm256_or_pd
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_or_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[i+63:i] OR b[i+63:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_or_ps
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_or_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[i+31:i] OR b[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_xor_pd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_xor_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_xor_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_xor_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
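
Bitwise XOR on floating-point lanes operates on the raw bits, so XOR-ing with a value that has only the sign bit set flips the sign of every lane. A minimal sketch of that idiom follows; ``negate8`` is a hypothetical helper name, and the ``target`` attribute is a GCC/Clang extension assumed so the code builds without ``-mavx``.

.. code-block:: C

    #include <immintrin.h>

    /* Negate 8 floats by toggling each lane's sign bit. */
    __attribute__((target("avx")))
    void negate8(const float *in, float *out)
    {
        __m256 v        = _mm256_loadu_ps(in);
        __m256 signmask = _mm256_set1_ps(-0.0f);   /* only bit 31 set in each lane */
        _mm256_storeu_ps(out, _mm256_xor_ps(v, signmask));
    }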
        	

_mm256_testz_si256
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    M256 a, 
    M256 b

.. code-block:: C

    int _mm256_testz_si256(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "ZF" value.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF ((a[255:0] AND b[255:0]) == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        IF (((NOT a[255:0]) AND b[255:0]) == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        RETURN ZF
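
Passing the same vector as both arguments turns the ZF test into an "is every bit zero" check, since ``v AND v`` equals ``v``. The sketch below is a minimal illustration; ``is_all_zero_256`` is a hypothetical helper name, and the ``target`` attribute is a GCC/Clang extension assumed so the code builds without ``-mavx``.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Returns 1 if all 32 bytes at p are zero, otherwise 0. */
    __attribute__((target("avx")))
    int is_all_zero_256(const uint8_t *p)
    {
        __m256i v = _mm256_loadu_si256((const __m256i *)p);
        return _mm256_testz_si256(v, v);   /* ZF = ((v AND v) == 0) */
    }

With two different arguments the same call answers "do ``a`` and ``b`` share no set bits".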
        	

_mm256_testc_si256
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    M256 a, 
    M256 b

.. code-block:: C

    int _mm256_testc_si256(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "CF" value.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF ((a[255:0] AND b[255:0]) == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        IF (((NOT a[255:0]) AND b[255:0]) == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        RETURN CF
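
The CF condition, ``(NOT a) AND b == 0``, holds exactly when every bit set in ``b`` is also set in ``a``, so the intrinsic can serve as a subset test on bit sets. A minimal sketch follows; ``has_all_bits_256`` is a hypothetical helper name, and the ``target`` attribute is a GCC/Clang extension assumed so the code builds without ``-mavx``.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Returns 1 if every bit set in `needed` is also set in `have`. */
    __attribute__((target("avx")))
    int has_all_bits_256(const uint8_t *have, const uint8_t *needed)
    {
        __m256i a = _mm256_loadu_si256((const __m256i *)have);
        __m256i b = _mm256_loadu_si256((const __m256i *)needed);
        return _mm256_testc_si256(a, b);   /* CF = (((NOT a) AND b) == 0) */
    }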
        	

_mm256_testnzc_si256
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    M256 a, 
    M256 b

.. code-block:: C

    int _mm256_testnzc_si256(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF ((a[255:0] AND b[255:0]) == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        IF (((NOT a[255:0]) AND b[255:0]) == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        IF (ZF == 0 && CF == 0)
        	dst := 1
        ELSE
        	dst := 0
        FI
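
The result is 1 only in the "mixed" case: ``b`` has at least one bit that is set in ``a`` and at least one bit that is not. The sketch below illustrates the three possible outcomes; ``partially_covered_256`` is a hypothetical helper name, and the ``target`` attribute is a GCC/Clang extension assumed so the code builds without ``-mavx``.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Returns 1 if `b` overlaps `a` but is not fully contained in it. */
    __attribute__((target("avx")))
    int partially_covered_256(const uint8_t *a, const uint8_t *b)
    {
        __m256i va = _mm256_loadu_si256((const __m256i *)a);
        __m256i vb = _mm256_loadu_si256((const __m256i *)b);
        /* 1 iff (va AND vb) != 0 and ((NOT va) AND vb) != 0 */
        return _mm256_testnzc_si256(va, vb);
    }

If ``b`` is disjoint from ``a`` (ZF = 1) or entirely contained in ``a`` (CF = 1), the function returns 0.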
        	

_mm256_testz_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm256_testz_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0] AND b[255:0]
        IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        tmp[255:0] := (NOT a[255:0]) AND b[255:0]
        IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        dst := ZF
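
Unlike the integer form, this variant inspects only the sign bit of each 64-bit lane. Passing the same vector twice therefore asks whether any lane has its sign bit set. A minimal sketch follows; ``none_negative4`` is a hypothetical helper name, and the ``target`` attribute is a GCC/Clang extension assumed so the code builds without ``-mavx``.

.. code-block:: C

    #include <immintrin.h>

    /* Returns 1 if no lane of the 4 doubles has its sign bit set. */
    __attribute__((target("avx")))
    int none_negative4(const double *p)
    {
        __m256d v = _mm256_loadu_pd(p);
        /* ZF = 1 when the sign bits of (v AND v) are all zero. */
        return _mm256_testz_pd(v, v);
    }

Because only the sign bit is examined, ``-0.0`` counts as negative here even though it compares equal to zero.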
        	

_mm256_testc_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm256_testc_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0] AND b[255:0]
        IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        tmp[255:0] := (NOT a[255:0]) AND b[255:0]
        IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        dst := CF
        	

_mm256_testnzc_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm256_testnzc_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0] AND b[255:0]
        IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        tmp[255:0] := (NOT a[255:0]) AND b[255:0]
        IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        IF (ZF == 0 && CF == 0)
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_mm256_testz_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm256_testz_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0] AND b[255:0]
        IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && \
            tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        tmp[255:0] := (NOT a[255:0]) AND b[255:0]
        IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && \
            tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        dst := ZF
        	

_mm256_testc_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm256_testc_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0] AND b[255:0]
        IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && \
            tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        tmp[255:0] := (NOT a[255:0]) AND b[255:0]
        IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && \
            tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        dst := CF
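
With a sign-bit mask as the second argument, the CF result reports whether every 32-bit lane of the first argument has its sign bit set. The following is a minimal sketch of that use; ``all_sign_bits_set8`` is a hypothetical helper name, and the ``target`` attribute is a GCC/Clang extension assumed so the code builds without ``-mavx``.

.. code-block:: C

    #include <immintrin.h>

    /* Returns 1 if all 8 floats have their sign bit set. */
    __attribute__((target("avx")))
    int all_sign_bits_set8(const float *p)
    {
        __m256 v        = _mm256_loadu_ps(p);
        __m256 signmask = _mm256_set1_ps(-0.0f);   /* only bit 31 set in each lane */
        /* CF = 1 when the sign bits of ((NOT v) AND signmask) are all zero. */
        return _mm256_testc_ps(v, signmask);
    }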
        	

_mm256_testnzc_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm256_testnzc_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[255:0] := a[255:0] AND b[255:0]
        IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && \
            tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        tmp[255:0] := (NOT a[255:0]) AND b[255:0]
        IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && \
            tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        IF (ZF == 0 && CF == 0)
        	dst := 1
        ELSE
        	dst := 0
        FI
        	
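One way to read the flags: when every sign bit is set in "b", "ZF" reports that no element of "a" has its sign bit set and "CF" reports that all of them do, so the intrinsic returns 1 only for mixed signs. The following is a minimal sketch, assuming GCC or Clang on an AVX-capable x86 CPU; the helper ``has_mixed_signs`` and the ``AVX_FN`` macro are illustrative names, not part of any Intel header.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Assumption: GCC/Clang. The attribute enables AVX for one function so the
       file builds without -mavx; other compilers need their own AVX switch. */
    #if defined(__GNUC__)
    #define AVX_FN __attribute__((target("avx")))
    #else
    #define AVX_FN
    #endif

    /* Returns 1 when some, but not all, of the 8 floats have the sign bit set. */
    static AVX_FN int has_mixed_signs(const float *v)
    {
        assert(v != NULL);
        __m256 a = _mm256_loadu_ps(v);
        __m256 sign_bits = _mm256_set1_ps(-0.0f); /* only bit 31 set per element */
        /* ZF == 0: at least one sign bit set; CF == 0: at least one clear. */
        return _mm256_testnzc_ps(a, sign_bits);
    }

Note that the test looks only at sign bits, so ``-0.0f`` counts as "negative" here.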

_mm256_and_si256
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    M256 a, 
    M256 b

.. code-block:: C

    __m256i _mm256_and_si256(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := (a[255:0] AND b[255:0])
        dst[MAX:256] := 0
        	

_mm256_andnot_si256
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    M256 a, 
    M256 b

.. code-block:: C

    __m256i _mm256_andnot_si256(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise NOT of 256 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := ((NOT a[255:0]) AND b[255:0])
        dst[MAX:256] := 0
        	
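The operand order is easy to get wrong: the NOT applies to the first argument, not the second. The sketch below clears the bits selected by a mask; it assumes GCC or Clang, where the 256-bit integer logical intrinsics are gated on AVX2, and a CPU with AVX2. The names ``clear_masked_bits`` and ``AVX2_FN`` are illustrative only.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <immintrin.h>

    /* Assumption: GCC/Clang. Enables AVX2 for one function so the file builds
       without -mavx2; other compilers need their own switch. */
    #if defined(__GNUC__)
    #define AVX2_FN __attribute__((target("avx2")))
    #else
    #define AVX2_FN
    #endif

    /* out[i] = values[i] with every bit that is set in mask[i] cleared (8 lanes). */
    static AVX2_FN void clear_masked_bits(const uint32_t *values,
                                          const uint32_t *mask, uint32_t *out)
    {
        assert(values != NULL && mask != NULL && out != NULL);
        __m256i v = _mm256_loadu_si256((const __m256i *)values);
        __m256i m = _mm256_loadu_si256((const __m256i *)mask);
        /* The NOT applies to the first argument: result = (~m) & v. */
        _mm256_storeu_si256((__m256i *)out, _mm256_andnot_si256(m, v));
    }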

_mm256_or_si256
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    M256 a, 
    M256 b

.. code-block:: C

    __m256i _mm256_or_si256(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise OR of 256 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := (a[255:0] OR b[255:0])
        dst[MAX:256] := 0
        	

_mm256_xor_si256
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    M256 a, 
    M256 b

.. code-block:: C

    __m256i _mm256_xor_si256(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the bitwise XOR of 256 bits (representing integer data) in "a" and "b", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := (a[255:0] XOR b[255:0])
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_testz_pd
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_testz_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[127:0] := a[127:0] AND b[127:0]
        IF (tmp[63] == 0 && tmp[127] == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        tmp[127:0] := (NOT a[127:0]) AND b[127:0]
        IF (tmp[63] == 0 && tmp[127] == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        dst := ZF
        	

_mm_testc_pd
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_testc_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[127:0] := a[127:0] AND b[127:0]
        IF (tmp[63] == 0 && tmp[127] == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        tmp[127:0] := (NOT a[127:0]) AND b[127:0]
        IF (tmp[63] == 0 && tmp[127] == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        dst := CF
        	
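The "ZF" and "CF" results can be turned into simple sign predicates. The sketch below uses ``_mm_testz_pd`` to ask whether no element is negative and ``_mm_testc_pd`` to ask whether every element is negative. It assumes GCC or Clang on an AVX-capable CPU; ``none_negative``, ``all_negative`` and ``AVX_FN`` are illustrative names.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Assumption: GCC/Clang. Enables AVX per function so -mavx is not needed. */
    #if defined(__GNUC__)
    #define AVX_FN __attribute__((target("avx")))
    #else
    #define AVX_FN
    #endif

    /* 1 if neither double has its sign bit set (-0.0 counts as set). */
    static AVX_FN int none_negative(const double *v)
    {
        assert(v != NULL);
        __m128d x = _mm_loadu_pd(v);
        return _mm_testz_pd(x, x); /* ZF: sign bits of (x AND x) are all zero */
    }

    /* 1 if both doubles have their sign bit set. */
    static AVX_FN int all_negative(const double *v)
    {
        assert(v != NULL);
        __m128d x = _mm_loadu_pd(v);
        __m128d sign_bits = _mm_set1_pd(-0.0);
        /* CF: sign bits of ((NOT x) AND sign_bits) are all zero. */
        return _mm_testc_pd(x, sign_bits);
    }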

_mm_testnzc_pd
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128d a, 
    __m128d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    int _mm_testnzc_pd(__m128d a, __m128d b);

.. admonition:: Intel Description

    Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[127:0] := a[127:0] AND b[127:0]
        IF (tmp[63] == 0 && tmp[127] == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        tmp[127:0] := (NOT a[127:0]) AND b[127:0]
        IF (tmp[63] == 0 && tmp[127] == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        IF (ZF == 0 && CF == 0)
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

_mm_testz_ps
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_testz_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[127:0] := a[127:0] AND b[127:0]
        IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        tmp[127:0] := (NOT a[127:0]) AND b[127:0]
        IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        dst := ZF
        	

_mm_testc_ps
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_testc_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[127:0] := a[127:0] AND b[127:0]
        IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        tmp[127:0] := (NOT a[127:0]) AND b[127:0]
        IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        dst := CF
        	

_mm_testnzc_ps
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Logical
:Header: immintrin.h
:Searchable: AVX_ALL-Logical-XMM
:Register: XMM 128 bit
:Return Type: int
:Param Types:
    __m128 a, 
    __m128 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    int _mm_testnzc_ps(__m128 a, __m128 b);

.. admonition:: Intel Description

    Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[127:0] := a[127:0] AND b[127:0]
        IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0)
        	ZF := 1
        ELSE
        	ZF := 0
        FI
        tmp[127:0] := (NOT a[127:0]) AND b[127:0]
        IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0)
        	CF := 1
        ELSE
        	CF := 0
        FI
        IF (ZF == 0 && CF == 0)
        	dst := 1
        ELSE
        	dst := 0
        FI
        	

Swizzle
-------
YMM
~~~
_mm256_blend_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_blend_pd(__m256d a, __m256d b,
                            const int imm8);

.. admonition:: Intel Description

    Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF imm8[j]
        		dst[i+63:i] := b[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
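Each of the low four bits of "imm8" picks the source for one element: a set bit takes the element from "b", a clear bit from "a". A minimal sketch, assuming GCC or Clang on an AVX-capable CPU; ``take_odd_from_b`` and ``AVX_FN`` are illustrative names.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Assumption: GCC/Clang. Enables AVX per function so -mavx is not needed. */
    #if defined(__GNUC__)
    #define AVX_FN __attribute__((target("avx")))
    #else
    #define AVX_FN
    #endif

    /* out = {a[0], b[1], a[2], b[3]} */
    static AVX_FN void take_odd_from_b(const double *a, const double *b,
                                       double *out)
    {
        assert(a != NULL && b != NULL && out != NULL);
        __m256d va = _mm256_loadu_pd(a);
        __m256d vb = _mm256_loadu_pd(b);
        /* imm8 = 0xA = binary 1010: bits 1 and 3 are set, so elements 1 and 3
           come from b. The immediate must be a compile-time constant. */
        _mm256_storeu_pd(out, _mm256_blend_pd(va, vb, 0xA));
    }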

_mm256_blend_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_blend_ps(__m256 a, __m256 b, const int imm8);

.. admonition:: Intel Description

    Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF imm8[j]
        		dst[i+31:i] := b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_blendv_pd
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d mask
:Param ETypes:
    FP64 a, 
    FP64 b, 
    MASK mask

.. code-block:: C

    __m256d _mm256_blendv_pd(__m256d a, __m256d b,
                             __m256d mask);

.. admonition:: Intel Description

    Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF mask[i+63]
        		dst[i+63:i] := b[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_blendv_ps
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 mask
:Param ETypes:
    FP32 a, 
    FP32 b, 
    MASK mask

.. code-block:: C

    __m256 _mm256_blendv_ps(__m256 a, __m256 b, __m256 mask);

.. admonition:: Intel Description

    Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF mask[i+31]
        		dst[i+31:i] := b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
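As the pseudo-code shows, only the top bit of each "mask" element is examined. That means a vector can serve as its own mask when the selection depends on its sign. The sketch below replaces every element whose sign bit is set with zero. It assumes GCC or Clang on an AVX-capable CPU; ``zero_negatives`` and ``AVX_FN`` are illustrative names.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Assumption: GCC/Clang. Enables AVX per function so -mavx is not needed. */
    #if defined(__GNUC__)
    #define AVX_FN __attribute__((target("avx")))
    #else
    #define AVX_FN
    #endif

    /* out[i] = 0 where v[i] has its sign bit set, otherwise v[i] (8 floats). */
    static AVX_FN void zero_negatives(const float *v, float *out)
    {
        assert(v != NULL && out != NULL);
        __m256 x = _mm256_loadu_ps(v);
        /* x is both the first source and the mask: elements with the sign bit
           set take the second source (zero), the rest keep their value. */
        _mm256_storeu_ps(out, _mm256_blendv_ps(x, _mm256_setzero_ps(), x));
    }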

_mm256_shuffle_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_shuffle_pd(__m256d a, __m256d b,
                              const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
        dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
        dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192]
        dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192]
        dst[MAX:256] := 0
        	

_mm256_shuffle_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_shuffle_ps(__m256 a, __m256 b,
                             const int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        dst[95:64] := SELECT4(b[127:0], imm8[5:4])
        dst[127:96] := SELECT4(b[127:0], imm8[7:6])
        dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        dst[223:192] := SELECT4(b[255:128], imm8[5:4])
        dst[255:224] := SELECT4(b[255:128], imm8[7:6])
        dst[MAX:256] := 0
        	
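In each 128-bit lane, the two low result elements are selected from "a" and the two high ones from "b", and the same "imm8" is applied to both lanes. The sketch below builds the control with the ``_MM_SHUFFLE`` macro from ``xmmintrin.h``. It assumes GCC or Clang on an AVX-capable CPU; ``low_pairs`` and ``AVX_FN`` are illustrative names.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Assumption: GCC/Clang. Enables AVX per function so -mavx is not needed. */
    #if defined(__GNUC__)
    #define AVX_FN __attribute__((target("avx")))
    #else
    #define AVX_FN
    #endif

    /* out = {a0, a1, b0, b1, a4, a5, b4, b5} */
    static AVX_FN void low_pairs(const float *a, const float *b, float *out)
    {
        assert(a != NULL && b != NULL && out != NULL);
        __m256 va = _mm256_loadu_ps(a);
        __m256 vb = _mm256_loadu_ps(b);
        /* _MM_SHUFFLE(1, 0, 1, 0) == 0x44. Reading right to left: result
           elements 0 and 1 take a[0], a[1]; elements 2 and 3 take b[0], b[1]
           (indices are relative to each 128-bit lane). */
        _mm256_storeu_ps(out, _mm256_shuffle_ps(va, vb, _MM_SHUFFLE(1, 0, 1, 0)));
    }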

_mm256_extractf128_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __m256 a, 
    const int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm256_extractf128_ps(__m256 a, const int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: dst[127:0] := a[127:0]
        1: dst[127:0] := a[255:128]
        ESAC
        dst[MAX:128] := 0
        	
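A common use of the 128-bit extract is a horizontal reduction: split the 256-bit vector into halves, add them, then finish with SSE operations. The sketch below is one possible reduction order, not the only one. It assumes GCC or Clang on an AVX-capable CPU; ``hsum8`` and ``AVX_FN`` are illustrative names.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Assumption: GCC/Clang. Enables AVX per function so -mavx is not needed. */
    #if defined(__GNUC__)
    #define AVX_FN __attribute__((target("avx")))
    #else
    #define AVX_FN
    #endif

    /* Sum of 8 floats. */
    static AVX_FN float hsum8(const float *v)
    {
        assert(v != NULL);
        __m256 x = _mm256_loadu_ps(v);
        __m128 lo = _mm256_extractf128_ps(x, 0);    /* v0..v3 */
        __m128 hi = _mm256_extractf128_ps(x, 1);    /* v4..v7 */
        __m128 s = _mm_add_ps(lo, hi);              /* {v0+v4, v1+v5, v2+v6, v3+v7} */
        s = _mm_add_ps(s, _mm_movehl_ps(s, s));     /* elements 0 and 1: s0+s2, s1+s3 */
        s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1)); /* element 0: total */
        return _mm_cvtss_f32(s);
    }

Because floating-point addition is not associative, the result can differ in the last bits from a left-to-right scalar loop.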

_mm256_extractf128_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m128d
:Param Types:
    __m256d a, 
    const int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm256_extractf128_pd(__m256d a, const int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: dst[127:0] := a[127:0]
        1: dst[127:0] := a[255:128]
        ESAC
        dst[MAX:128] := 0
        	

_mm256_extractf128_si256
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    M256 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_extractf128_si256(__m256i a, const int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of integer data) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: dst[127:0] := a[127:0]
        1: dst[127:0] := a[255:128]
        ESAC
        dst[MAX:128] := 0
        	

_mm256_extract_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __int32
:Param Types:
    __m256i a, 
    const int index
:Param ETypes:
    UI32 a, 
    IMM index

.. code-block:: C

    __int32 _mm256_extract_epi32(__m256i a, const int index);

.. admonition:: Intel Description

    Extract a 32-bit integer from "a", selected with "index", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (a[255:0] >> (index[2:0] * 32))[31:0]
        	
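The "index" counts 32-bit elements from the low end of the vector and must be a compile-time constant. A minimal sketch, assuming GCC or Clang (where the return type is plain ``int``) on an AVX-capable CPU; ``element5`` and ``AVX_FN`` are illustrative names.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Assumption: GCC/Clang. Enables AVX per function so -mavx is not needed. */
    #if defined(__GNUC__)
    #define AVX_FN __attribute__((target("avx")))
    #else
    #define AVX_FN
    #endif

    /* Returns v[5] after a round trip through a 256-bit register. */
    static AVX_FN int element5(const int *v)
    {
        assert(v != NULL);
        __m256i x = _mm256_loadu_si256((const __m256i *)v);
        return _mm256_extract_epi32(x, 5); /* element 5 lives in the upper 128 bits */
    }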

_mm256_extract_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __int64
:Param Types:
    __m256i a, 
    const int index
:Param ETypes:
    UI64 a, 
    IMM index

.. code-block:: C

    __int64 _mm256_extract_epi64(__m256i a, const int index);

.. admonition:: Intel Description

    Extract a 64-bit integer from "a", selected with "index", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[255:0] >> (index[1:0] * 64))[63:0]
        	

_mm256_permutevar_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256i b
:Param ETypes:
    FP32 a, 
    UI32 b

.. code-block:: C

    __m256 _mm256_permutevar_ps(__m256 a, __m256i b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        dst[31:0] := SELECT4(a[127:0], b[1:0])
        dst[63:32] := SELECT4(a[127:0], b[33:32])
        dst[95:64] := SELECT4(a[127:0], b[65:64])
        dst[127:96] := SELECT4(a[127:0], b[97:96])
        dst[159:128] := SELECT4(a[255:128], b[129:128])
        dst[191:160] := SELECT4(a[255:128], b[161:160])
        dst[223:192] := SELECT4(a[255:128], b[193:192])
        dst[255:224] := SELECT4(a[255:128], b[225:224])
        dst[MAX:256] := 0
        	

_mm256_permute_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_permute_ps(__m256 a, int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        dst[223:192] := SELECT4(a[255:128], imm8[5:4])
        dst[255:224] := SELECT4(a[255:128], imm8[7:6])
        dst[MAX:256] := 0
        	
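Because the same "imm8" is applied to both 128-bit lanes, the permutation never moves data across the lane boundary. The sketch below reverses the four elements inside each lane. It assumes GCC or Clang on an AVX-capable CPU; ``reverse_in_lanes`` and ``AVX_FN`` are illustrative names.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Assumption: GCC/Clang. Enables AVX per function so -mavx is not needed. */
    #if defined(__GNUC__)
    #define AVX_FN __attribute__((target("avx")))
    #else
    #define AVX_FN
    #endif

    /* out = {v3, v2, v1, v0, v7, v6, v5, v4} */
    static AVX_FN void reverse_in_lanes(const float *v, float *out)
    {
        assert(v != NULL && out != NULL);
        __m256 x = _mm256_loadu_ps(v);
        /* _MM_SHUFFLE(0, 1, 2, 3) == 0x1B: result element 0 takes source
           element 3, element 1 takes 2, element 2 takes 1, element 3 takes 0. */
        _mm256_storeu_ps(out, _mm256_permute_ps(x, _MM_SHUFFLE(0, 1, 2, 3)));
    }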

_mm256_permutevar_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256i b
:Param ETypes:
    FP64 a, 
    UI64 b

.. code-block:: C

    __m256d _mm256_permutevar_pd(__m256d a, __m256i b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (b[1] == 0) dst[63:0] := a[63:0]; FI
        IF (b[1] == 1) dst[63:0] := a[127:64]; FI
        IF (b[65] == 0) dst[127:64] := a[63:0]; FI
        IF (b[65] == 1) dst[127:64] := a[127:64]; FI
        IF (b[129] == 0) dst[191:128] := a[191:128]; FI
        IF (b[129] == 1) dst[191:128] := a[255:192]; FI
        IF (b[193] == 0) dst[255:192] := a[191:128]; FI
        IF (b[193] == 1) dst[255:192] := a[255:192]; FI
        dst[MAX:256] := 0
        	
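Note that the pseudo-code reads bit 1 of each 64-bit control element, not bit 0, so a control value of 1 selects the low element and a value of 2 selects the high one. The sketch below swaps the two doubles in each 128-bit lane. It assumes GCC or Clang on an AVX-capable CPU; ``swap_pairs`` and ``AVX_FN`` are illustrative names.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Assumption: GCC/Clang. Enables AVX per function so -mavx is not needed. */
    #if defined(__GNUC__)
    #define AVX_FN __attribute__((target("avx")))
    #else
    #define AVX_FN
    #endif

    /* out = {v1, v0, v3, v2} */
    static AVX_FN void swap_pairs(const double *v, double *out)
    {
        assert(v != NULL && out != NULL);
        __m256d x = _mm256_loadu_pd(v);
        /* _mm256_set_epi64x lists elements from highest to lowest, so control
           elements 0 and 2 are 2 (bit 1 set: take the high double of the lane)
           and elements 1 and 3 are 0 (take the low double). */
        __m256i ctrl = _mm256_set_epi64x(0, 2, 0, 2);
        _mm256_storeu_pd(out, _mm256_permutevar_pd(x, ctrl));
    }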

_mm256_permute_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_permute_pd(__m256d a, int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (imm8[0] == 0) dst[63:0] := a[63:0]; FI
        IF (imm8[0] == 1) dst[63:0] := a[127:64]; FI
        IF (imm8[1] == 0) dst[127:64] := a[63:0]; FI
        IF (imm8[1] == 1) dst[127:64] := a[127:64]; FI
        IF (imm8[2] == 0) dst[191:128] := a[191:128]; FI
        IF (imm8[2] == 1) dst[191:128] := a[255:192]; FI
        IF (imm8[3] == 0) dst[255:192] := a[191:128]; FI
        IF (imm8[3] == 1) dst[255:192] := a[255:192]; FI
        dst[MAX:256] := 0
        	

_mm256_permute2f128_ps
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_permute2f128_ps(__m256 a, __m256 b, int imm8);

.. admonition:: Intel Description

    Shuffle 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src1, src2, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src1[127:0]
        	1:	tmp[127:0] := src1[255:128]
        	2:	tmp[127:0] := src2[127:0]
        	3:	tmp[127:0] := src2[255:128]
        	ESAC
        	IF control[3]
        		tmp[127:0] := 0
        	FI
        	RETURN tmp[127:0]
        }
        dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
        dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
        dst[MAX:256] := 0
        	
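Each nibble of "imm8" controls one 128-bit half of the result: the low two bits choose among the four source halves, and bit 3 zeroes the half instead. Two typical controls are shown in the sketch below, which assumes GCC or Clang on an AVX-capable CPU; ``swap_halves``, ``join_low_halves`` and ``AVX_FN`` are illustrative names.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Assumption: GCC/Clang. Enables AVX per function so -mavx is not needed. */
    #if defined(__GNUC__)
    #define AVX_FN __attribute__((target("avx")))
    #else
    #define AVX_FN
    #endif

    /* out = {v4..v7, v0..v3} */
    static AVX_FN void swap_halves(const float *v, float *out)
    {
        assert(v != NULL && out != NULL);
        __m256 x = _mm256_loadu_ps(v);
        /* 0x01: low half <- selector 1 (upper half of first source),
                 high half <- selector 0 (lower half of first source). */
        _mm256_storeu_ps(out, _mm256_permute2f128_ps(x, x, 0x01));
    }

    /* out = {a0..a3, b0..b3} */
    static AVX_FN void join_low_halves(const float *a, const float *b, float *out)
    {
        assert(a != NULL && b != NULL && out != NULL);
        __m256 va = _mm256_loadu_ps(a);
        __m256 vb = _mm256_loadu_ps(b);
        /* 0x20: low half <- selector 0 (a low), high half <- selector 2 (b low). */
        _mm256_storeu_ps(out, _mm256_permute2f128_ps(va, vb, 0x20));
    }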

_mm256_permute2f128_pd
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_permute2f128_pd(__m256d a, __m256d b,
                                   int imm8);

.. admonition:: Intel Description

    Shuffle 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src1, src2, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src1[127:0]
        	1:	tmp[127:0] := src1[255:128]
        	2:	tmp[127:0] := src2[127:0]
        	3:	tmp[127:0] := src2[255:128]
        	ESAC
        	IF control[3]
        		tmp[127:0] := 0
        	FI
        	RETURN tmp[127:0]
        }
        dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
        dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
        dst[MAX:256] := 0
        	

_mm256_permute2f128_si256
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    int imm8
:Param ETypes:
    M256 a, 
    M256 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_permute2f128_si256(__m256i a, __m256i b,
                                      int imm8);

.. admonition:: Intel Description

    Shuffle 128 bits (composed of integer data) selected by "imm8" from "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src1, src2, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src1[127:0]
        	1:	tmp[127:0] := src1[255:128]
        	2:	tmp[127:0] := src2[127:0]
        	3:	tmp[127:0] := src2[255:128]
        	ESAC
        	IF control[3]
        		tmp[127:0] := 0
        	FI
        	RETURN tmp[127:0]
        }
        dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
        dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
        dst[MAX:256] := 0
        	

_mm256_insertf128_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m128 b, 
    int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_insertf128_ps(__m256 a, __m128 b, int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: dst[127:0] := b[127:0]
        1: dst[255:128] := b[127:0]
        ESAC
        dst[MAX:256] := 0
        	
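A typical use is assembling a 256-bit vector from two 128-bit pieces. The sketch below widens the low piece with ``_mm256_castps128_ps256`` (which leaves the upper half undefined) and then overwrites that upper half with the insert. It assumes GCC or Clang on an AVX-capable CPU; ``join4`` and ``AVX_FN`` are illustrative names.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <immintrin.h>

    /* Assumption: GCC/Clang. Enables AVX per function so -mavx is not needed. */
    #if defined(__GNUC__)
    #define AVX_FN __attribute__((target("avx")))
    #else
    #define AVX_FN
    #endif

    /* out = {lo[0..3], hi[0..3]} */
    static AVX_FN void join4(const float *lo, const float *hi, float *out)
    {
        assert(lo != NULL && hi != NULL && out != NULL);
        __m128 vlo = _mm_loadu_ps(lo);
        __m128 vhi = _mm_loadu_ps(hi);
        /* The cast fills bits 127:0; imm8 = 1 places vhi in bits 255:128. */
        __m256 joined = _mm256_insertf128_ps(_mm256_castps128_ps256(vlo), vhi, 1);
        _mm256_storeu_ps(out, joined);
    }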

_mm256_insertf128_pd
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m128d b, 
    int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_insertf128_pd(__m256d a, __m128d b,
                                 int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := a[255:0]
        CASE imm8[0] OF
        0: dst[127:0] := b[127:0]
        1: dst[255:128] := b[127:0]
        ESAC
        dst[MAX:256] := 0
        	

_mm256_insertf128_si256
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i b, 
    int imm8
:Param ETypes:
    M256 a, 
    M128 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_insertf128_si256(__m256i a, __m128i b,
                                    int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 128 bits from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: dst[127:0] := b[127:0]
        1: dst[255:128] := b[127:0]
        ESAC
        dst[MAX:256] := 0
        	

_mm256_insert_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __int8 i, 
    const int index
:Param ETypes:
    UI8 a, 
    UI8 i, 
    IMM index

.. code-block:: C

    __m256i _mm256_insert_epi8(__m256i a, __int8 i,
                               const int index);

.. admonition:: Intel Description

    Copy "a" to "dst", and insert the 8-bit integer "i" into "dst" at the location specified by "index".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := a[255:0]
        sel := index[4:0]*8
        dst[sel+7:sel] := i[7:0]
        	

_mm256_insert_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __int16 i, 
    const int index
:Param ETypes:
    UI16 a, 
    UI16 i, 
    IMM index

.. code-block:: C

    __m256i _mm256_insert_epi16(__m256i a, __int16 i,
                                const int index);

.. admonition:: Intel Description

    Copy "a" to "dst", and insert the 16-bit integer "i" into "dst" at the location specified by "index".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := a[255:0]
        sel := index[3:0]*16
        dst[sel+15:sel] := i[15:0]
        	

_mm256_insert_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __int32 i, 
    const int index
:Param ETypes:
    UI32 a, 
    UI32 i, 
    IMM index

.. code-block:: C

    __m256i _mm256_insert_epi32(__m256i a, __int32 i,
                                const int index);

.. admonition:: Intel Description

    Copy "a" to "dst", and insert the 32-bit integer "i" into "dst" at the location specified by "index".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := a[255:0]
        sel := index[2:0]*32
        dst[sel+31:sel] := i[31:0]
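
A minimal sketch of the element insert follows. The "index" argument must be a compile-time constant, while the inserted value may be a run-time integer. The helper name ``insert_lane5`` is an assumption made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: copies eight 32-bit values and replaces element 5. */
    __attribute__((target("avx")))
    void insert_lane5(const int32_t in[8], int32_t value, int32_t out[8])
    {
        __m256i a = _mm256_loadu_si256((const __m256i *)in);
        /* index = 5, so sel = 5*32 and bits 191:160 are replaced. */
        __m256i r = _mm256_insert_epi32(a, value, 5);
        _mm256_storeu_si256((__m256i *)out, r);
    }

All other elements are copied from "a" unchanged.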
        	

_mm256_insert_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __int64 i, 
    const int index
:Param ETypes:
    UI64 a, 
    UI64 i, 
    IMM index

.. code-block:: C

    __m256i _mm256_insert_epi64(__m256i a, __int64 i,
                                const int index);

.. admonition:: Intel Description

    Copy "a" to "dst", and insert the 64-bit integer "i" into "dst" at the location specified by "index".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := a[255:0]
        sel := index[1:0]*64
        dst[sel+63:sel] := i[63:0]
        	

_mm256_unpackhi_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_unpackhi_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
        dst[MAX:256] := 0
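
The per-lane behaviour is easy to miss, so here is a minimal sketch with concrete values. The helper name ``unpackhi_demo`` and the inputs are assumptions made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: interleaves the high element of each 128-bit lane. */
    __attribute__((target("avx")))
    void unpackhi_demo(double out[4])
    {
        __m256d a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
        __m256d b = _mm256_setr_pd(10.0, 11.0, 12.0, 13.0);
        /* Lane 0 yields {a[1], b[1]}; lane 1 yields {a[3], b[3]}. */
        _mm256_storeu_pd(out, _mm256_unpackhi_pd(a, b));
    }

With these inputs, ``out`` holds 1.0, 11.0, 3.0, 13.0. Elements never cross the 128-bit lane boundary.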
        	

_mm256_unpackhi_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_unpackhi_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
        dst[MAX:256] := 0
        	

_mm256_unpacklo_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_unpacklo_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
        dst[MAX:256] := 0
        	

_mm256_unpacklo_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_unpacklo_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
        dst[MAX:256] := 0
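
A minimal sketch of the low-half interleave follows. The helper name ``unpacklo_demo`` is an assumption made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: interleaves the two low elements of each lane. */
    __attribute__((target("avx")))
    void unpacklo_demo(const float a[8], const float b[8], float out[8])
    {
        __m256 va = _mm256_loadu_ps(a);
        __m256 vb = _mm256_loadu_ps(b);
        /* Lane 0 yields {a[0], b[0], a[1], b[1]};
           lane 1 yields {a[4], b[4], a[5], b[5]}. */
        _mm256_storeu_ps(out, _mm256_unpacklo_ps(va, vb));
    }

For a = 0..7 and b = 10..17 the result is 0, 10, 1, 11, 4, 14, 5, 15.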
        	

_mm256_broadcast_sd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    double const * mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m256d _mm256_broadcast_sd(double const * mem_addr);

.. admonition:: Intel Description

    Broadcast a double-precision (64-bit) floating-point element from memory to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[63:0] := MEM[mem_addr+63:mem_addr]
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := tmp[63:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_broadcast_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m128 const * mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m256 _mm256_broadcast_ps(__m128 const * mem_addr);

.. admonition:: Intel Description

    Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[127:0] := MEM[mem_addr+127:mem_addr]
        dst[127:0] := tmp[127:0]
        dst[255:128] := tmp[127:0]
        dst[MAX:256] := 0
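
A minimal sketch of the 128-bit broadcast follows. The helper name ``broadcast4`` is an assumption made for this example. The four floats are first loaded into a local ``__m128`` so that the pointer passed to the intrinsic refers to a properly typed object. The ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: repeats four floats in both 128-bit halves. */
    __attribute__((target("avx")))
    void broadcast4(const float src[4], float out[8])
    {
        __m128 v = _mm_loadu_ps(src);       /* unaligned load of 4 floats */
        __m256 r = _mm256_broadcast_ps(&v); /* both halves receive v */
        _mm256_storeu_ps(out, r);
    }

For ``src`` = 1, 2, 3, 4 the result is 1, 2, 3, 4, 1, 2, 3, 4.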
        	

_mm256_broadcast_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m128d const * mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m256d _mm256_broadcast_pd(__m128d const * mem_addr);

.. admonition:: Intel Description

    Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[127:0] := MEM[mem_addr+127:mem_addr]
        dst[127:0] := tmp[127:0]
        dst[255:128] := tmp[127:0]
        dst[MAX:256] := 0
        	

_mm256_extract_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256i a, 
    const int index
:Param ETypes:
    UI8 a, 
    IMM index

.. code-block:: C

    int _mm256_extract_epi8(__m256i a, const int index);

.. admonition:: Intel Description

    Extract an 8-bit integer from "a", selected with "index", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := (a[255:0] >> (index[4:0] * 8))[7:0]
        	

_mm256_extract_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256i a, 
    const int index
:Param ETypes:
    UI16 a, 
    IMM index

.. code-block:: C

    int _mm256_extract_epi16(__m256i a, const int index);

.. admonition:: Intel Description

    Extract a 16-bit integer from "a", selected with "index", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := (a[255:0] >> (index[3:0] * 16))[15:0]
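
A minimal sketch of the element extract follows. The helper name ``extract_lane9`` and the sample values are assumptions made for this example. The values are kept non-negative so that the sketch does not depend on how the 16-bit result is widened to ``int``. The ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: returns element 9 of a vector holding 0, 10, ..., 150. */
    __attribute__((target("avx")))
    int extract_lane9(void)
    {
        __m256i a = _mm256_setr_epi16(0, 10, 20, 30, 40, 50, 60, 70,
                                      80, 90, 100, 110, 120, 130, 140, 150);
        /* index = 9 selects bits 159:144. */
        return _mm256_extract_epi16(a, 9);
    }

With these values the function returns 90.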
        	

_mm256_blend_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI16 a, 
    UI16 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_blend_epi16(__m256i a, __m256i b,
                               const int imm8);

.. admonition:: Intel Description

    Blend packed 16-bit integers from "a" and "b" within 128-bit lanes using control mask "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF imm8[j%8]
        		dst[i+15:i] := b[i+15:i]
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
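
Because the pseudo-code tests ``imm8[j%8]``, the same 8-bit mask is applied to both 128-bit lanes. The following minimal sketch makes that visible. The helper name ``blend_words`` is an assumption made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: blends a vector of zeros with a vector of ones. */
    __attribute__((target("avx2")))
    void blend_words(int16_t out[16])
    {
        __m256i a = _mm256_set1_epi16(0);
        __m256i b = _mm256_set1_epi16(1);
        /* 0x0F: elements 0..3 of each lane come from b, elements 4..7 from a. */
        __m256i r = _mm256_blend_epi16(a, b, 0x0F);
        _mm256_storeu_si256((__m256i *)out, r);
    }

The result is 1, 1, 1, 1, 0, 0, 0, 0 repeated twice, once per lane.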
        	

_mm256_blend_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_blend_epi32(__m256i a, __m256i b,
                               const int imm8);

.. admonition:: Intel Description

    Blend packed 32-bit integers from "a" and "b" using control mask "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF imm8[j]
        		dst[i+31:i] := b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
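
Unlike the 16-bit blend, each of the eight mask bits here controls exactly one element. A minimal sketch follows. The helper name ``blend_dwords`` is an assumption made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: picks elements from b where the mask bit is set. */
    __attribute__((target("avx2")))
    void blend_dwords(const int32_t a[8], const int32_t b[8], int32_t out[8])
    {
        __m256i va = _mm256_loadu_si256((const __m256i *)a);
        __m256i vb = _mm256_loadu_si256((const __m256i *)b);
        /* 0xA5 = 1010 0101b: elements 0, 2, 5 and 7 come from b. */
        __m256i r = _mm256_blend_epi32(va, vb, 0xA5);
        _mm256_storeu_si256((__m256i *)out, r);
    }

For a = 0..7 and b = 100..107 the result is 100, 1, 102, 3, 4, 105, 6, 107.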
        	

_mm256_blendv_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    __m256i mask
:Param ETypes:
    UI8 a, 
    UI8 b, 
    MASK mask

.. code-block:: C

    __m256i _mm256_blendv_epi8(__m256i a, __m256i b,
                               __m256i mask);

.. admonition:: Intel Description

    Blend packed 8-bit integers from "a" and "b" using "mask", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF mask[i+7]
        		dst[i+7:i] := b[i+7:i]
        	ELSE
        		dst[i+7:i] := a[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
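
Only the top bit of each mask byte is tested, so a comparison result (all ones or all zeros per byte) is a natural mask. The following minimal sketch builds a signed byte-wise maximum this way, purely to illustrate the blend; ``_mm256_max_epi8`` would be the direct choice in real code. The helper name ``bytewise_max`` is an assumption, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: out[i] = max(a[i], b[i]) for signed bytes. */
    __attribute__((target("avx2")))
    void bytewise_max(const int8_t a[32], const int8_t b[32], int8_t out[32])
    {
        __m256i va = _mm256_loadu_si256((const __m256i *)a);
        __m256i vb = _mm256_loadu_si256((const __m256i *)b);
        __m256i mask = _mm256_cmpgt_epi8(vb, va);     /* 0xFF where b > a */
        __m256i r = _mm256_blendv_epi8(va, vb, mask); /* take b where mask is set */
        _mm256_storeu_si256((__m256i *)out, r);
    }
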
        	

_mm256_broadcastb_epi8
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m256i _mm256_broadcastb_epi8(__m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 8-bit integer from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := a[7:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_broadcastd_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256i _mm256_broadcastd_epi32(__m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 32-bit integer from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        dst[MAX:256] := 0
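
A minimal sketch of the register broadcast follows. Only the lowest 32-bit element of "a" is used; the other three are ignored. The helper name ``broadcast_low_dword`` and the sample values are assumptions made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: replicates the low element of {7, 8, 9, 10}. */
    __attribute__((target("avx2")))
    void broadcast_low_dword(int32_t out[8])
    {
        __m128i a = _mm_setr_epi32(7, 8, 9, 10);
        __m256i r = _mm256_broadcastd_epi32(a); /* every element becomes 7 */
        _mm256_storeu_si256((__m256i *)out, r);
    }
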
        	

_mm256_broadcastq_epi64
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m256i _mm256_broadcastq_epi64(__m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 64-bit integer from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_broadcastsd_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_broadcastsd_pd(__m128d a);

.. admonition:: Intel Description

    Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_broadcastsi128_si256
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    M128 a

.. code-block:: C

    __m256i _mm256_broadcastsi128_si256(__m128i a);

.. admonition:: Intel Description

    Broadcast 128 bits of integer data from "a" to all 128-bit lanes in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := a[127:0]
        dst[255:128] := a[127:0]
        dst[MAX:256] := 0
        	

_mm256_broadcastss_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_broadcastss_ps(__m128 a);

.. admonition:: Intel Description

    Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_broadcastw_epi16
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m256i _mm256_broadcastw_epi16(__m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 16-bit integer from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := a[15:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_extracti128_si256
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    M128 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_extracti128_si256(__m256i a, const int imm8);

.. admonition:: Intel Description

    Extract 128 bits (composed of integer data) from "a", selected with "imm8", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        CASE imm8[0] OF
        0: dst[127:0] := a[127:0]
        1: dst[127:0] := a[255:128]
        ESAC
        dst[MAX:128] := 0
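
A minimal sketch of extracting the upper half follows. The helper name ``extract_high_half`` is an assumption made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: copies elements 4..7 of in to out. */
    __attribute__((target("avx2")))
    void extract_high_half(const int32_t in[8], int32_t out[4])
    {
        __m256i a = _mm256_loadu_si256((const __m256i *)in);
        /* imm8 = 1 selects bits 255:128; imm8 = 0 would select bits 127:0. */
        __m128i hi = _mm256_extracti128_si256(a, 1);
        _mm_storeu_si128((__m128i *)out, hi);
    }

For ``in`` = 0..7 the result is 4, 5, 6, 7.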
        	

_mm256_inserti128_si256
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    M256 a, 
    M128 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_inserti128_si256(__m256i a, __m128i b,
                                    const int imm8);

.. admonition:: Intel Description

    Copy "a" to "dst", then insert 128 bits (composed of integer data) from "b" into "dst" at the location specified by "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := a[255:0]
        CASE (imm8[0]) OF
        0: dst[127:0] := b[127:0]
        1: dst[255:128] := b[127:0]
        ESAC
        dst[MAX:256] := 0
        	

_mm256_permute2x128_si256
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    M256 a, 
    M256 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_permute2x128_si256(__m256i a, __m256i b,
                                      const int imm8);

.. admonition:: Intel Description

    Shuffle 128 bits (composed of integer data) selected by "imm8" from "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src1, src2, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[127:0] := src1[127:0]
        	1:	tmp[127:0] := src1[255:128]
        	2:	tmp[127:0] := src2[127:0]
        	3:	tmp[127:0] := src2[255:128]
        	ESAC
        	IF control[3]
        		tmp[127:0] := 0
        	FI
        	RETURN tmp[127:0]
        }
        dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
        dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
        dst[MAX:256] := 0
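
The two nibbles of "imm8" are independent selectors, one per destination half. The following minimal sketch joins the upper half of "a" with the lower half of "b". The helper name ``join_halves`` is an assumption made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: out = { high half of a, low half of b }. */
    __attribute__((target("avx2")))
    void join_halves(const int32_t a[8], const int32_t b[8], int32_t out[8])
    {
        __m256i va = _mm256_loadu_si256((const __m256i *)a);
        __m256i vb = _mm256_loadu_si256((const __m256i *)b);
        /* 0x21: low nibble 1 selects a[255:128], high nibble 2 selects b[127:0]. */
        __m256i r = _mm256_permute2x128_si256(va, vb, 0x21);
        _mm256_storeu_si256((__m256i *)out, r);
    }

For a = 0..7 and b = 10..17 the result is 4, 5, 6, 7, 10, 11, 12, 13. Setting bit 3 of either nibble zeroes the corresponding half instead.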
        	

_mm256_permute4x64_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    UI64 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_permute4x64_epi64(__m256i a, const int imm8);

.. admonition:: Intel Description

    Shuffle 64-bit integers in "a" across lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        dst[MAX:256] := 0
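
Each pair of bits in "imm8" names the source element for one destination element. A minimal sketch that reverses the four 64-bit elements follows. The helper name ``reverse_qwords`` is an assumption made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: out[i] = in[3 - i]. */
    __attribute__((target("avx2")))
    void reverse_qwords(const int64_t in[4], int64_t out[4])
    {
        __m256i a = _mm256_loadu_si256((const __m256i *)in);
        /* 0x1B = 00 01 10 11b: destinations 0..3 read sources 3, 2, 1, 0. */
        __m256i r = _mm256_permute4x64_epi64(a, 0x1B);
        _mm256_storeu_si256((__m256i *)out, r);
    }

For ``in`` = 0, 1, 2, 3 the result is 3, 2, 1, 0, which shows elements moving across the 128-bit lane boundary.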
        	

_mm256_permute4x64_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    const int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_permute4x64_pd(__m256d a, const int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[63:0] := src[63:0]
        	1:	tmp[63:0] := src[127:64]
        	2:	tmp[63:0] := src[191:128]
        	3:	tmp[63:0] := src[255:192]
        	ESAC
        	RETURN tmp[63:0]
        }
        dst[63:0] := SELECT4(a[255:0], imm8[1:0])
        dst[127:64] := SELECT4(a[255:0], imm8[3:2])
        dst[191:128] := SELECT4(a[255:0], imm8[5:4])
        dst[255:192] := SELECT4(a[255:0], imm8[7:6])
        dst[MAX:256] := 0
        	

_mm256_permutevar8x32_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i idx
:Param ETypes:
    UI32 a, 
    UI32 idx

.. code-block:: C

    __m256i _mm256_permutevar8x32_epi32(__m256i a, __m256i idx);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	id := idx[i+2:i]*32
        	dst[i+31:i] := a[id+31:id]
        ENDFOR
        dst[MAX:256] := 0
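
A minimal sketch of a full-width reversal using a run-time index vector follows. Only the low three bits of each index element are used. The helper name ``reverse8`` is an assumption made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: out[i] = in[7 - i]. */
    __attribute__((target("avx2")))
    void reverse8(const int32_t in[8], int32_t out[8])
    {
        __m256i a = _mm256_loadu_si256((const __m256i *)in);
        /* Element j of idx names the source element for destination j. */
        __m256i idx = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
        __m256i r = _mm256_permutevar8x32_epi32(a, idx);
        _mm256_storeu_si256((__m256i *)out, r);
    }
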
        	

_mm256_permutevar8x32_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256i idx
:Param ETypes:
    FP32 a, 
    UI32 idx

.. code-block:: C

    __m256 _mm256_permutevar8x32_ps(__m256 a, __m256i idx);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	id := idx[i+2:i]*32
        	dst[i+31:i] := a[id+31:id]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_shuffle_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    UI32 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_shuffle_epi32(__m256i a, const int imm8);

.. admonition:: Intel Description

    Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        dst[159:128] := SELECT4(a[255:128], imm8[1:0])
        dst[191:160] := SELECT4(a[255:128], imm8[3:2])
        dst[223:192] := SELECT4(a[255:128], imm8[5:4])
        dst[255:224] := SELECT4(a[255:128], imm8[7:6])
        dst[MAX:256] := 0
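
The same "imm8" control is applied to each 128-bit lane separately, so a reversal pattern reverses within lanes rather than across the whole vector. A minimal sketch follows. The helper name ``reverse_in_lanes`` is an assumption made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: reverses the four elements of each 128-bit lane. */
    __attribute__((target("avx2")))
    void reverse_in_lanes(const int32_t in[8], int32_t out[8])
    {
        __m256i a = _mm256_loadu_si256((const __m256i *)in);
        /* 0x1B = 00 01 10 11b: destinations 0..3 read sources 3, 2, 1, 0. */
        __m256i r = _mm256_shuffle_epi32(a, 0x1B);
        _mm256_storeu_si256((__m256i *)out, r);
    }

For ``in`` = 0..7 the result is 3, 2, 1, 0, 7, 6, 5, 4.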
        	

_mm256_shuffle_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_shuffle_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Shuffle 8-bit integers in "a" within 128-bit lanes according to the shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	IF b[i+7] == 1
        		dst[i+7:i] := 0
        	ELSE
        		index[3:0] := b[i+3:i]
        		dst[i+7:i] := a[index*8+7:index*8]
        	FI
        	IF b[128+i+7] == 1
        		dst[128+i+7:128+i] := 0
        	ELSE
        		index[3:0] := b[128+i+3:128+i]
        		dst[128+i+7:128+i] := a[128+index*8+7:128+index*8]
        	FI
        ENDFOR
        dst[MAX:256] := 0
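
A minimal sketch of a byte reversal within each 128-bit lane follows. Each control byte holds a source index in its low four bits, relative to its own lane; a control byte with the top bit set would produce zero instead. The helper name ``reverse_bytes_per_lane`` is an assumption made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: reverses the 16 bytes of each 128-bit lane. */
    __attribute__((target("avx2")))
    void reverse_bytes_per_lane(const uint8_t in[32], uint8_t out[32])
    {
        __m256i a = _mm256_loadu_si256((const __m256i *)in);
        /* The same index pattern is given for both lanes. */
        __m256i ctrl = _mm256_setr_epi8(
            15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
            15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
        __m256i r = _mm256_shuffle_epi8(a, ctrl);
        _mm256_storeu_si256((__m256i *)out, r);
    }

For ``in`` = 0..31 the lower lane becomes 15 down to 0 and the upper lane becomes 31 down to 16.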
        	

_mm256_shufflehi_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_shufflehi_epi16(__m256i a, const int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
        dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
        dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
        dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
        dst[191:128] := a[191:128]
        dst[207:192] := (a >> (imm8[1:0] * 16))[207:192]
        dst[223:208] := (a >> (imm8[3:2] * 16))[207:192]
        dst[239:224] := (a >> (imm8[5:4] * 16))[207:192]
        dst[255:240] := (a >> (imm8[7:6] * 16))[207:192]
        dst[MAX:256] := 0
        	

_mm256_shufflelo_epi16
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    const int imm8
:Param ETypes:
    UI16 a, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_shufflelo_epi16(__m256i a, const int imm8);

.. admonition:: Intel Description

    Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
        dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
        dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
        dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
        dst[127:64] := a[127:64]
        dst[143:128] := (a >> (imm8[1:0] * 16))[143:128]
        dst[159:144] := (a >> (imm8[3:2] * 16))[143:128]
        dst[175:160] := (a >> (imm8[5:4] * 16))[143:128]
        dst[191:176] := (a >> (imm8[7:6] * 16))[143:128]
        dst[255:192] := a[255:192]
        dst[MAX:256] := 0
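
A minimal sketch follows that reverses the four low 16-bit elements of each lane and leaves the four high elements in place. The helper name ``reverse_low_words`` is an assumption made for this example, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical helper: shuffles only the low 64 bits of each lane. */
    __attribute__((target("avx2")))
    void reverse_low_words(const int16_t in[16], int16_t out[16])
    {
        __m256i a = _mm256_loadu_si256((const __m256i *)in);
        /* 0x1B = 00 01 10 11b: low words 0..3 read low words 3, 2, 1, 0. */
        __m256i r = _mm256_shufflelo_epi16(a, 0x1B);
        _mm256_storeu_si256((__m256i *)out, r);
    }

For ``in`` = 0..15 the result is 3, 2, 1, 0, 4, 5, 6, 7, 11, 10, 9, 8, 12, 13, 14, 15.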
        	

_mm256_unpackhi_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_unpackhi_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[71:64] 
        	dst[15:8] := src2[71:64] 
        	dst[23:16] := src1[79:72] 
        	dst[31:24] := src2[79:72] 
        	dst[39:32] := src1[87:80] 
        	dst[47:40] := src2[87:80] 
        	dst[55:48] := src1[95:88] 
        	dst[63:56] := src2[95:88] 
        	dst[71:64] := src1[103:96] 
        	dst[79:72] := src2[103:96] 
        	dst[87:80] := src1[111:104] 
        	dst[95:88] := src2[111:104] 
        	dst[103:96] := src1[119:112] 
        	dst[111:104] := src2[119:112] 
        	dst[119:112] := src1[127:120] 
        	dst[127:120] := src2[127:120] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128])
        dst[MAX:256] := 0
        	

_mm256_unpackhi_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_unpackhi_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[79:64]
        	dst[31:16] := src2[79:64] 
        	dst[47:32] := src1[95:80] 
        	dst[63:48] := src2[95:80] 
        	dst[79:64] := src1[111:96] 
        	dst[95:80] := src2[111:96] 
        	dst[111:96] := src1[127:112] 
        	dst[127:112] := src2[127:112] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128])
        dst[MAX:256] := 0
        	

_mm256_unpackhi_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_unpackhi_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[95:64] 
        	dst[63:32] := src2[95:64] 
        	dst[95:64] := src1[127:96] 
        	dst[127:96] := src2[127:96] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
        dst[MAX:256] := 0
        	

_mm256_unpackhi_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_unpackhi_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[127:64] 
        	dst[127:64] := src2[127:64] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
        dst[MAX:256] := 0
        	

_mm256_unpacklo_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_unpacklo_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
        	dst[7:0] := src1[7:0] 
        	dst[15:8] := src2[7:0] 
        	dst[23:16] := src1[15:8] 
        	dst[31:24] := src2[15:8] 
        	dst[39:32] := src1[23:16] 
        	dst[47:40] := src2[23:16] 
        	dst[55:48] := src1[31:24] 
        	dst[63:56] := src2[31:24] 
        	dst[71:64] := src1[39:32]
        	dst[79:72] := src2[39:32] 
        	dst[87:80] := src1[47:40] 
        	dst[95:88] := src2[47:40] 
        	dst[103:96] := src1[55:48] 
        	dst[111:104] := src2[55:48] 
        	dst[119:112] := src1[63:56] 
        	dst[127:120] := src2[63:56] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128])
        dst[MAX:256] := 0
        	
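One common use of this interleave is zero-extending bytes to 16-bit values by passing a zero vector as the second operand. The sketch below illustrates that idea under stated assumptions: ``demo_widen_lo_bytes`` and ``DEMO_AVX2`` are hypothetical names, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Assumption: GCC or Clang; otherwise build with AVX2 enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX2 __attribute__((target("avx2")))
    #else
    #define DEMO_AVX2
    #endif

    /* Hypothetical helper: src points to 32 bytes, out to 16 uint16_t. */
    DEMO_AVX2 void demo_widen_lo_bytes(const uint8_t *src, uint16_t *out)
    {
        __m256i v = _mm256_loadu_si256((const __m256i *)src);
        __m256i zero = _mm256_setzero_si256();
        /* Interleaving with zero places a 0 byte above every source byte.
           Only bytes 0..7 and 16..23 (the low half of each lane) are used. */
        _mm256_storeu_si256((__m256i *)out, _mm256_unpacklo_epi8(v, zero));
    }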

_mm256_unpacklo_epi16
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_unpacklo_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
        	dst[15:0] := src1[15:0] 
        	dst[31:16] := src2[15:0] 
        	dst[47:32] := src1[31:16] 
        	dst[63:48] := src2[31:16] 
        	dst[79:64] := src1[47:32] 
        	dst[95:80] := src2[47:32] 
        	dst[111:96] := src1[63:48] 
        	dst[127:112] := src2[63:48] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
        dst[MAX:256] := 0
        	

_mm256_unpacklo_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_unpacklo_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
        	dst[31:0] := src1[31:0] 
        	dst[63:32] := src2[31:0] 
        	dst[95:64] := src1[63:32] 
        	dst[127:96] := src2[63:32] 
        	RETURN dst[127:0]	
        }
        dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
        dst[MAX:256] := 0
        	

_mm256_unpacklo_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_unpacklo_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
        	dst[63:0] := src1[63:0] 
        	dst[127:64] := src2[63:0] 
        	RETURN dst[127:0]
        }
        dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
        dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_permutevar_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128i b
:Param ETypes:
    FP32 a, 
    UI32 b

.. code-block:: C

    __m128 _mm_permutevar_ps(__m128 a, __m128i b);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        dst[31:0] := SELECT4(a[127:0], b[1:0])
        dst[63:32] := SELECT4(a[127:0], b[33:32])
        dst[95:64] := SELECT4(a[127:0], b[65:64])
        dst[127:96] := SELECT4(a[127:0], b[97:96])
        dst[MAX:128] := 0
        	
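As an illustration of the control vector, the following sketch reverses four floats. ``demo_reverse_ps`` and ``DEMO_AVX`` are hypothetical names, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>

    /* Assumption: GCC or Clang; otherwise build with AVX enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX __attribute__((target("avx")))
    #else
    #define DEMO_AVX
    #endif

    /* Hypothetical helper: src and out each point to 4 floats. */
    DEMO_AVX void demo_reverse_ps(const float *src, float *out)
    {
        __m128 v = _mm_loadu_ps(src);
        /* Only bits [1:0] of each 32-bit control element are read.
           Element 0 of dst takes src[3], element 1 takes src[2], etc. */
        __m128i ctrl = _mm_setr_epi32(3, 2, 1, 0);
        _mm_storeu_ps(out, _mm_permutevar_ps(v, ctrl));
    }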

_mm_permute_ps
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128 _mm_permute_ps(__m128 a, int imm8);

.. admonition:: Intel Description

    Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE SELECT4(src, control) {
        	CASE(control[1:0]) OF
        	0:	tmp[31:0] := src[31:0]
        	1:	tmp[31:0] := src[63:32]
        	2:	tmp[31:0] := src[95:64]
        	3:	tmp[31:0] := src[127:96]
        	ESAC
        	RETURN tmp[31:0]
        }
        dst[31:0] := SELECT4(a[127:0], imm8[1:0])
        dst[63:32] := SELECT4(a[127:0], imm8[3:2])
        dst[95:64] := SELECT4(a[127:0], imm8[5:4])
        dst[127:96] := SELECT4(a[127:0], imm8[7:6])
        dst[MAX:128] := 0
        	
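The immediate packs four 2-bit selectors, lowest destination element first. As a sketch, ``0x39`` (binary ``00 11 10 01``) rotates the elements down by one position. ``demo_rotate_ps`` and ``DEMO_AVX`` are hypothetical names, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>

    /* Assumption: GCC or Clang; otherwise build with AVX enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX __attribute__((target("avx")))
    #else
    #define DEMO_AVX
    #endif

    /* Hypothetical helper: src and out each point to 4 floats. */
    DEMO_AVX void demo_rotate_ps(const float *src, float *out)
    {
        __m128 v = _mm_loadu_ps(src);
        /* imm8[1:0]=1, imm8[3:2]=2, imm8[5:4]=3, imm8[7:6]=0,
           so dst = { src[1], src[2], src[3], src[0] }. */
        _mm_storeu_ps(out, _mm_permute_ps(v, 0x39));
    }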

_mm_permutevar_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128i b
:Param ETypes:
    FP64 a, 
    UI64 b

.. code-block:: C

    __m128d _mm_permutevar_pd(__m128d a, __m128i b);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (b[1] == 0) dst[63:0] := a[63:0]; FI
        IF (b[1] == 1) dst[63:0] := a[127:64]; FI
        IF (b[65] == 0) dst[127:64] := a[63:0]; FI
        IF (b[65] == 1) dst[127:64] := a[127:64]; FI
        dst[MAX:128] := 0
        	
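Note that the pseudo-code tests bit 1 of each 64-bit control element, not bit 0, so a control value of 1 still selects the low element. The sketch below makes that visible. ``demo_permutevar_pd`` and ``DEMO_AVX`` are hypothetical names, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>

    /* Assumption: GCC or Clang; otherwise build with AVX enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX __attribute__((target("avx")))
    #else
    #define DEMO_AVX
    #endif

    /* Hypothetical helper: src and out each point to 2 doubles.
       ctrl_lo and ctrl_hi are the control values for dst[0] and dst[1]. */
    DEMO_AVX void demo_permutevar_pd(const double *src, long long ctrl_lo,
                                     long long ctrl_hi, double *out)
    {
        __m128d v = _mm_loadu_pd(src);
        /* _mm_set_epi64x takes the high element first. */
        __m128i ctrl = _mm_set_epi64x(ctrl_hi, ctrl_lo);
        _mm_storeu_pd(out, _mm_permutevar_pd(v, ctrl));
    }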

_mm_permute_pd
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    int imm8
:Param ETypes:
    FP64 a, 
    IMM imm8

.. code-block:: C

    __m128d _mm_permute_pd(__m128d a, int imm8);

.. admonition:: Intel Description

    Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        IF (imm8[0] == 0) dst[63:0] := a[63:0]; FI
        IF (imm8[0] == 1) dst[63:0] := a[127:64]; FI
        IF (imm8[1] == 0) dst[127:64] := a[63:0]; FI
        IF (imm8[1] == 1) dst[127:64] := a[127:64]; FI
        dst[MAX:128] := 0
        	

_mm_broadcast_ss
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float const * mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m128 _mm_broadcast_ss(float const * mem_addr);

.. admonition:: Intel Description

    Broadcast a single-precision (32-bit) floating-point element from memory to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := MEM[mem_addr+31:mem_addr]
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := tmp[31:0]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_blend_epi32
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a, 
    __m128i b, 
    const int imm8
:Param ETypes:
    UI32 a, 
    UI32 b, 
    IMM imm8

.. code-block:: C

    __m128i _mm_blend_epi32(__m128i a, __m128i b,
                            const int imm8);

.. admonition:: Intel Description

    Blend packed 32-bit integers from "a" and "b" using control mask "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF imm8[j]
        		dst[i+31:i] := b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
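A minimal sketch of the mask semantics: bit ``j`` of the immediate picks element ``j`` from "b" when set and from "a" when clear. ``demo_blend_epi32`` and ``DEMO_AVX2`` are hypothetical names, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Assumption: GCC or Clang; otherwise build with AVX2 enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX2 __attribute__((target("avx2")))
    #else
    #define DEMO_AVX2
    #endif

    /* Hypothetical helper: a, b and out each point to 4 int32_t values. */
    DEMO_AVX2 void demo_blend_epi32(const int32_t *a, const int32_t *b,
                                    int32_t *out)
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        /* 0x5 = 0101b: elements 0 and 2 come from b, 1 and 3 from a. */
        _mm_storeu_si128((__m128i *)out, _mm_blend_epi32(va, vb, 0x5));
    }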

_mm_broadcastb_epi8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m128i _mm_broadcastb_epi8(__m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 8-bit integer from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	dst[i+7:i] := a[7:0]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_broadcastd_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m128i _mm_broadcastd_epi32(__m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 32-bit integer from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        dst[MAX:128] := 0
        	
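Because the source operand is a vector, a scalar has to be placed in element 0 first. The sketch below does that with ``_mm_cvtsi32_si128``. ``demo_broadcastd`` and ``DEMO_AVX2`` are hypothetical names, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Assumption: GCC or Clang; otherwise build with AVX2 enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX2 __attribute__((target("avx2")))
    #else
    #define DEMO_AVX2
    #endif

    /* Hypothetical helper: out points to 4 int32_t values. */
    DEMO_AVX2 void demo_broadcastd(int32_t x, int32_t *out)
    {
        __m128i low = _mm_cvtsi32_si128(x);  /* x in element 0, zeros above */
        _mm_storeu_si128((__m128i *)out, _mm_broadcastd_epi32(low));
    }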

_mm_broadcastq_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m128i _mm_broadcastq_epi64(__m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 64-bit integer from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_broadcastsd_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128d _mm_broadcastsd_pd(__m128d a);

.. admonition:: Intel Description

    Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_broadcastsi128_si256
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    M128 a

.. code-block:: C

    __m256i _mm_broadcastsi128_si256(__m128i a);

.. admonition:: Intel Description

    Broadcast 128 bits of integer data from "a" to all 128-bit lanes in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := a[127:0]
        dst[255:128] := a[127:0]
        dst[MAX:256] := 0
        	

_mm_broadcastss_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m128 _mm_broadcastss_ps(__m128 a);

.. admonition:: Intel Description

    Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_broadcastw_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Swizzle
:Header: immintrin.h
:Searchable: AVX_ALL-Swizzle-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m128i _mm_broadcastw_epi16(__m128i a);

.. admonition:: Intel Description

    Broadcast the low packed 16-bit integer from "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*16
        	dst[i+15:i] := a[15:0]
        ENDFOR
        dst[MAX:128] := 0
        	

Store
-----
YMM
~~~
_mm256_store_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    double * mem_addr, 
    __m256d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm256_store_pd(double * mem_addr, __m256d a);

.. admonition:: Intel Description

    Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a" into memory.
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	
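The alignment requirement has to be met by the caller. A minimal sketch, assuming a C11 compiler so that ``alignas`` from ``<stdalign.h>`` is available: ``demo_iota_pd`` and ``DEMO_AVX`` are hypothetical names, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdalign.h>

    /* Assumption: GCC or Clang; otherwise build with AVX enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX __attribute__((target("avx")))
    #else
    #define DEMO_AVX
    #endif

    /* Hypothetical helper: writes start, start+1, start+2, start+3 to out. */
    DEMO_AVX void demo_iota_pd(double *out, double start)
    {
        alignas(32) double buf[4];  /* 32-byte aligned, as the store requires */
        _mm256_store_pd(buf, _mm256_setr_pd(start, start + 1.0,
                                            start + 2.0, start + 3.0));
        for (int i = 0; i < 4; ++i)
            out[i] = buf[i];        /* element 0 sits at the lowest address */
    }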

_mm256_store_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    float * mem_addr, 
    __m256 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm256_store_ps(float * mem_addr, __m256 a);

.. admonition:: Intel Description

    Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a" into memory.
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_storeu_pd
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    double * mem_addr, 
    __m256d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm256_storeu_pd(double * mem_addr, __m256d a);

.. admonition:: Intel Description

    Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a" into memory.
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_storeu_ps
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    float * mem_addr, 
    __m256 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm256_storeu_ps(float * mem_addr, __m256 a);

.. admonition:: Intel Description

    Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a" into memory.
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_store_si256
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    __m256i * mem_addr, 
    __m256i a
:Param ETypes:
    M256 mem_addr, 
    M256 a

.. code-block:: C

    void _mm256_store_si256(__m256i * mem_addr, __m256i a);

.. admonition:: Intel Description

    Store 256-bits of integer data from "a" into memory.
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_storeu_si256
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    __m256i * mem_addr, 
    __m256i a
:Param ETypes:
    M256 mem_addr, 
    M256 a

.. code-block:: C

    void _mm256_storeu_si256(__m256i * mem_addr, __m256i a);

.. admonition:: Intel Description

    Store 256-bits of integer data from "a" into memory.
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_maskstore_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    double * mem_addr, 
    __m256i mask, 
    __m256d a
:Param ETypes:
    FP64 mem_addr, 
    MASK mask, 
    FP64 a

.. code-block:: C

    void _mm256_maskstore_pd(double* mem_addr, __m256i mask,
                             __m256d a);

.. admonition:: Intel Description

    Store packed double-precision (64-bit) floating-point elements from "a" into memory using "mask".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF mask[i+63]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	
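Since only the highest bit of each mask element is tested, the result of a vector comparison can serve as the mask directly. The sketch below stores only the positive elements. ``demo_store_positive`` and ``DEMO_AVX`` are hypothetical names, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>

    /* Assumption: GCC or Clang; otherwise build with AVX enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX __attribute__((target("avx")))
    #else
    #define DEMO_AVX
    #endif

    /* Hypothetical helper: src and dst each point to 4 doubles.
       Elements of dst whose source value is not > 0 are left untouched. */
    DEMO_AVX void demo_store_positive(double *dst, const double *src)
    {
        __m256d v = _mm256_loadu_pd(src);
        /* Lanes where v > 0 become all-ones, so their highest bit is set. */
        __m256d gt = _mm256_cmp_pd(v, _mm256_setzero_pd(), _CMP_GT_OQ);
        _mm256_maskstore_pd(dst, _mm256_castpd_si256(gt), v);
    }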

_mm256_maskstore_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    float * mem_addr, 
    __m256i mask, 
    __m256 a
:Param ETypes:
    FP32 mem_addr, 
    MASK mask, 
    FP32 a

.. code-block:: C

    void _mm256_maskstore_ps(float* mem_addr, __m256i mask,
                             __m256 a);

.. admonition:: Intel Description

    Store packed single-precision (32-bit) floating-point elements from "a" into memory using "mask".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF mask[i+31]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm256_stream_si256
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m256i a
:Param ETypes:
    M256 mem_addr, 
    M256 a

.. code-block:: C

    void _mm256_stream_si256(void* mem_addr, __m256i a);

.. admonition:: Intel Description

    Store 256-bits of integer data from "a" into memory using a non-temporal memory hint.
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_stream_pd
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m256d a
:Param ETypes:
    FP64 mem_addr, 
    FP64 a

.. code-block:: C

    void _mm256_stream_pd(void* mem_addr, __m256d a);

.. admonition:: Intel Description

    Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a" into memory using a non-temporal memory hint.
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	

_mm256_stream_ps
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    void* mem_addr, 
    __m256 a
:Param ETypes:
    FP32 mem_addr, 
    FP32 a

.. code-block:: C

    void _mm256_stream_ps(void* mem_addr, __m256 a);

.. admonition:: Intel Description

    Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a" into memory using a non-temporal memory hint.
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[mem_addr+255:mem_addr] := a[255:0]
        	
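Non-temporal stores are typically used to fill large buffers that will not be read again soon. The following is a sketch under stated assumptions, not a tuned routine: it assumes the destination is 32-byte aligned and the element count is a multiple of 8, and it issues ``_mm_sfence`` afterwards so the streamed data is ordered before later stores. ``demo_stream_fill`` and ``DEMO_AVX`` are hypothetical names, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stddef.h>

    /* Assumption: GCC or Clang; otherwise build with AVX enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX __attribute__((target("avx")))
    #else
    #define DEMO_AVX
    #endif

    /* Hypothetical helper. Assumes dst is 32-byte aligned and n is a
       multiple of 8; neither condition is checked here. */
    DEMO_AVX void demo_stream_fill(float *dst, size_t n, float value)
    {
        __m256 v = _mm256_set1_ps(value);
        for (size_t i = 0; i < n; i += 8)
            _mm256_stream_ps(dst + i, v);
        _mm_sfence();  /* order the non-temporal stores before later stores */
    }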

_mm256_storeu2_m128
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    float* hiaddr, 
    float* loaddr, 
    __m256 a
:Param ETypes:
    FP32 hiaddr, 
    FP32 loaddr, 
    FP32 a

.. code-block:: C

    void _mm256_storeu2_m128(float* hiaddr, float* loaddr,
                             __m256 a);

.. admonition:: Intel Description

    Store the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory at two different 128-bit locations.
    "hiaddr" and "loaddr" do not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[loaddr+127:loaddr] := a[127:0]
        MEM[hiaddr+127:hiaddr] := a[255:128]
        	
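The pseudo-code amounts to two independent 128-bit unaligned stores. Because some compiler headers may not provide this composite intrinsic, the sketch below writes the same effect with ``_mm256_castps256_ps128`` and ``_mm256_extractf128_ps``. It is an equivalent formulation for illustration, not the intrinsic's actual implementation. ``demo_store_halves`` and ``DEMO_AVX`` are hypothetical names, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>

    /* Assumption: GCC or Clang; otherwise build with AVX enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX __attribute__((target("avx")))
    #else
    #define DEMO_AVX
    #endif

    /* Hypothetical helper: src points to 8 floats, hiaddr and loaddr
       each point to 4 floats. No alignment is required. */
    DEMO_AVX void demo_store_halves(float *hiaddr, float *loaddr,
                                    const float *src)
    {
        __m256 a = _mm256_loadu_ps(src);
        _mm_storeu_ps(loaddr, _mm256_castps256_ps128(a));    /* a[127:0]   */
        _mm_storeu_ps(hiaddr, _mm256_extractf128_ps(a, 1));  /* a[255:128] */
    }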

_mm256_storeu2_m128d
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    double* hiaddr, 
    double* loaddr, 
    __m256d a
:Param ETypes:
    FP64 hiaddr, 
    FP64 loaddr, 
    FP64 a

.. code-block:: C

    void _mm256_storeu2_m128d(double* hiaddr, double* loaddr,
                              __m256d a);

.. admonition:: Intel Description

    Store the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from "a" into memory at two different 128-bit locations.
    "hiaddr" and "loaddr" do not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[loaddr+127:loaddr] := a[127:0]
        MEM[hiaddr+127:hiaddr] := a[255:128]
        	

_mm256_storeu2_m128i
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    __m128i* hiaddr, 
    __m128i* loaddr, 
    __m256i a
:Param ETypes:
    M128 hiaddr, 
    M128 loaddr, 
    M128 a

.. code-block:: C

    void _mm256_storeu2_m128i(__m128i* hiaddr, __m128i* loaddr,
                              __m256i a);

.. admonition:: Intel Description

    Store the high and low 128-bit halves (each composed of integer data) from "a" into memory at two different 128-bit locations.
    "hiaddr" and "loaddr" do not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        MEM[loaddr+127:loaddr] := a[127:0]
        MEM[hiaddr+127:hiaddr] := a[255:128]
        	

_mm256_maskstore_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    int* mem_addr, 
    __m256i mask, 
    __m256i a
:Param ETypes:
    UI32 mem_addr, 
    MASK mask, 
    UI32 a

.. code-block:: C

    void _mm256_maskstore_epi32(int* mem_addr, __m256i mask,
                                __m256i a);

.. admonition:: Intel Description

    Store packed 32-bit integers from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF mask[i+31]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	
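A frequent use of a masked store is writing only the first ``n`` elements at the end of an array. The sketch below builds the mask by comparing ``n`` against the element indices. ``demo_store_first_n`` and ``DEMO_AVX2`` are hypothetical names, the ``target`` attribute assumes GCC or Clang, and the source is assumed to have 8 readable elements.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Assumption: GCC or Clang; otherwise build with AVX2 enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX2 __attribute__((target("avx2")))
    #else
    #define DEMO_AVX2
    #endif

    /* Hypothetical helper: copies src[0..n-1] to dst for 0 <= n <= 8.
       Assumes all 8 elements of src are readable. */
    DEMO_AVX2 void demo_store_first_n(int32_t *dst, const int32_t *src, int n)
    {
        __m256i idx = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
        /* Lanes with idx < n compare to all-ones (highest bit set). */
        __m256i mask = _mm256_cmpgt_epi32(_mm256_set1_epi32(n), idx);
        __m256i v = _mm256_loadu_si256((const __m256i *)src);
        _mm256_maskstore_epi32(dst, mask, v);
    }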

_mm256_maskstore_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-YMM
:Register: YMM 256 bit
:Return Type: void
:Param Types:
    __int64* mem_addr, 
    __m256i mask, 
    __m256i a
:Param ETypes:
    UI64 mem_addr, 
    MASK mask, 
    UI64 a

.. code-block:: C

    void _mm256_maskstore_epi64(__int64* mem_addr, __m256i mask,
                                __m256i a);

.. admonition:: Intel Description

    Store packed 64-bit integers from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF mask[i+63]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

XMM
~~~
_mm_maskstore_pd
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    double * mem_addr, 
    __m128i mask, 
    __m128d a
:Param ETypes:
    FP64 mem_addr, 
    MASK mask, 
    FP64 a

.. code-block:: C

    void _mm_maskstore_pd(double* mem_addr, __m128i mask,
                          __m128d a);

.. admonition:: Intel Description

    Store packed double-precision (64-bit) floating-point elements from "a" into memory using "mask".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF mask[i+63]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

_mm_maskstore_ps
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    float * mem_addr, 
    __m128i mask, 
    __m128 a
:Param ETypes:
    FP32 mem_addr, 
    MASK mask, 
    FP32 a

.. code-block:: C

    void _mm_maskstore_ps(float* mem_addr, __m128i mask,
                          __m128 a);

.. admonition:: Intel Description

    Store packed single-precision (32-bit) floating-point elements from "a" into memory using "mask".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF mask[i+31]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm_maskstore_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    int* mem_addr, 
    __m128i mask, 
    __m128i a
:Param ETypes:
    UI32 mem_addr, 
    MASK mask, 
    UI32 a

.. code-block:: C

    void _mm_maskstore_epi32(int* mem_addr, __m128i mask,
                             __m128i a);

.. admonition:: Intel Description

    Store packed 32-bit integers from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF mask[i+31]
        		MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
        	FI
        ENDFOR
        	

_mm_maskstore_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Store
:Header: immintrin.h
:Searchable: AVX_ALL-Store-XMM
:Register: XMM 128 bit
:Return Type: void
:Param Types:
    __int64* mem_addr, 
    __m128i mask, 
    __m128i a
:Param ETypes:
    UI64 mem_addr, 
    MASK mask, 
    UI64 a

.. code-block:: C

    void _mm_maskstore_epi64(__int64* mem_addr, __m128i mask,
                             __m128i a);

.. admonition:: Intel Description

    Store packed 64-bit integers from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF mask[i+63]
        		MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
        	FI
        ENDFOR
        	

Load
----
YMM
~~~
_mm256_broadcast_ss
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    float const * mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m256 _mm256_broadcast_ss(float const * mem_addr);

.. admonition:: Intel Description

    Broadcast a single-precision (32-bit) floating-point element from memory to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        tmp[31:0] := MEM[mem_addr+31:mem_addr]
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := tmp[31:0]
        ENDFOR
        dst[MAX:256] := 0
        	
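As a small illustration, the sketch below broadcasts one scale factor from memory and multiplies eight floats by it. ``demo_scale8`` and ``DEMO_AVX`` are hypothetical names, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>

    /* Assumption: GCC or Clang; otherwise build with AVX enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX __attribute__((target("avx")))
    #else
    #define DEMO_AVX
    #endif

    /* Hypothetical helper: scales data[0..7] in place by *factor. */
    DEMO_AVX void demo_scale8(float *data, const float *factor)
    {
        __m256 f = _mm256_broadcast_ss(factor);  /* all 8 elements = *factor */
        __m256 v = _mm256_loadu_ps(data);
        _mm256_storeu_ps(data, _mm256_mul_ps(v, f));
    }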

_mm256_load_pd
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    double const * mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m256d _mm256_load_pd(double const * mem_addr);

.. admonition:: Intel Description

    Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into "dst".
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	
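When the alignment of a pointer is not known in advance, one option is to test the address and fall back to the unaligned load. This is a sketch of that check, not a recommendation over always using ``_mm256_loadu_pd``. ``demo_sum4`` and ``DEMO_AVX`` are hypothetical names, and the ``target`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Assumption: GCC or Clang; otherwise build with AVX enabled. */
    #if defined(__GNUC__)
    #define DEMO_AVX __attribute__((target("avx")))
    #else
    #define DEMO_AVX
    #endif

    /* Hypothetical helper: returns p[0] + p[1] + p[2] + p[3]. */
    DEMO_AVX double demo_sum4(const double *p)
    {
        /* Use the aligned form only when the address is a multiple of 32. */
        __m256d v = (((uintptr_t)p & 31u) == 0) ? _mm256_load_pd(p)
                                                : _mm256_loadu_pd(p);
        double lanes[4];
        _mm256_storeu_pd(lanes, v);
        return lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }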

_mm256_load_ps
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    float const * mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m256 _mm256_load_ps(float const * mem_addr);

.. admonition:: Intel Description

    Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into "dst".
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

_mm256_loadu_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    double const * mem_addr
:Param ETypes:
    FP64 mem_addr

.. code-block:: C

    __m256d _mm256_loadu_pd(double const * mem_addr);

.. admonition:: Intel Description

    Load 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into "dst".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

_mm256_loadu_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    float const * mem_addr
:Param ETypes:
    FP32 mem_addr

.. code-block:: C

    __m256 _mm256_loadu_ps(float const * mem_addr);

.. admonition:: Intel Description

    Load 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into "dst".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

_mm256_load_si256
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i const * mem_addr
:Param ETypes:
    M256 mem_addr

.. code-block:: C

    __m256i _mm256_load_si256(__m256i const * mem_addr);

.. admonition:: Intel Description

    Load 256 bits of integer data from memory into "dst".
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

_mm256_loadu_si256
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i const * mem_addr
:Param ETypes:
    M256 mem_addr

.. code-block:: C

    __m256i _mm256_loadu_si256(__m256i const * mem_addr);

.. admonition:: Intel Description

    Load 256 bits of integer data from memory into "dst".
    "mem_addr" does not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

_mm256_maskload_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    double const * mem_addr, 
    __m256i mask
:Param ETypes:
    FP64 mem_addr, 
    MASK mask

.. code-block:: C

    __m256d _mm256_maskload_pd(double const* mem_addr,
                               __m256i mask);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF mask[i+63]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
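
The high-bit test in the pseudo-code above can be sketched as a portable scalar model. The name ``maskload_pd_model`` is hypothetical, and plain arrays stand in for the ``__m256i`` mask and the ``__m256d`` result.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar model of _mm256_maskload_pd. mask[] stands in for
     * the four 64-bit lanes of the __m256i mask; only bit 63 of each lane
     * is inspected. */
    void maskload_pd_model(const double *mem_addr, const int64_t mask[4],
                           double dst[4])
    {
        assert(mask != NULL && dst != NULL);
        for (size_t j = 0; j < 4; ++j) {
            if (mask[j] < 0) {          /* bit 63 set: IF mask[i+63] */
                dst[j] = mem_addr[j];   /* lane is read from memory */
            } else {
                dst[j] = 0.0;           /* lane is zeroed, memory untouched */
            }
        }
    }

A lane value of ``1`` leaves the lane zeroed because only the sign bit counts. An all-ones lane (``-1``) is the conventional way to select an element.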
        	

_mm256_maskload_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    float const * mem_addr, 
    __m256i mask
:Param ETypes:
    FP32 mem_addr, 
    MASK mask

.. code-block:: C

    __m256 _mm256_maskload_ps(float const* mem_addr,
                              __m256i mask);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF mask[i+31]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_lddqu_si256
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i const * mem_addr
:Param ETypes:
    M256 mem_addr

.. code-block:: C

    __m256i _mm256_lddqu_si256(__m256i const * mem_addr);

.. admonition:: Intel Description

    Load 256 bits of integer data from unaligned memory into "dst". This intrinsic may perform better than "_mm256_loadu_si256" when the data crosses a cache line boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	

_mm256_loadu2_m128
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    float const* hiaddr, 
    float const* loaddr
:Param ETypes:
    FP32 hiaddr, 
    FP32 loaddr

.. code-block:: C

    __m256 _mm256_loadu2_m128(float const* hiaddr,
                              float const* loaddr);

.. admonition:: Intel Description

    Load two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combine them into a 256-bit value in "dst".
    "hiaddr" and "loaddr" do not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[loaddr+127:loaddr]
        dst[255:128] := MEM[hiaddr+127:hiaddr]
        dst[MAX:256] := 0
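
The placement of the two halves is easy to get backwards because the high address comes first in the parameter list. A minimal scalar sketch, assuming the hypothetical name ``loadu2_m128_model`` and a ``float[8]`` standing in for a ``__m256``:

.. code-block:: C

    #include <assert.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm256_loadu2_m128: dst[] stands in for
     * the eight 32-bit lanes of a __m256. */
    void loadu2_m128_model(const float *hiaddr, const float *loaddr,
                           float dst[8])
    {
        assert(hiaddr != NULL && loaddr != NULL && dst != NULL);
        memcpy(dst,     loaddr, 4 * sizeof(float)); /* dst[127:0]   */
        memcpy(dst + 4, hiaddr, 4 * sizeof(float)); /* dst[255:128] */
    }

The data behind ``loaddr`` lands in lanes 0 to 3 and the data behind ``hiaddr`` in lanes 4 to 7, regardless of where the two pointers sit in memory.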
        	

_mm256_loadu2_m128d
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    double const* hiaddr, 
    double const* loaddr
:Param ETypes:
    FP64 hiaddr, 
    FP64 loaddr

.. code-block:: C

    __m256d _mm256_loadu2_m128d(double const* hiaddr,
                                double const* loaddr);

.. admonition:: Intel Description

    Load two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combine them into a 256-bit value in "dst".
    "hiaddr" and "loaddr" do not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[loaddr+127:loaddr]
        dst[255:128] := MEM[hiaddr+127:hiaddr]
        dst[MAX:256] := 0
        	

_mm256_loadu2_m128i
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i const* hiaddr, 
    __m128i const* loaddr
:Param ETypes:
    M128 hiaddr, 
    M128 loaddr

.. code-block:: C

    __m256i _mm256_loadu2_m128i(__m128i const* hiaddr,
                                __m128i const* loaddr);

.. admonition:: Intel Description

    Load two 128-bit values (composed of integer data) from memory, and combine them into a 256-bit value in "dst".
    "hiaddr" and "loaddr" do not need to be aligned on any particular boundary.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[127:0] := MEM[loaddr+127:loaddr]
        dst[255:128] := MEM[hiaddr+127:hiaddr]
        dst[MAX:256] := 0
        	

_mm256_i32gather_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    double const* base_addr, 
    __m128i vindex, 
    const int scale
:Param ETypes:
    FP64 base_addr, 
    SI32 vindex, 
    IMM scale

.. code-block:: C

    __m256d _mm256_i32gather_pd(double const* base_addr,
                                __m128i vindex,
                                const int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:256] := 0
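
The address arithmetic of the gather can be sketched in portable C. This is a scalar model under stated assumptions: ``i32gather_pd_model`` is a hypothetical name, plain arrays stand in for the vector registers, and offsets are computed in bytes. The extra ``* 8`` in the pseudo-code appears to convert that byte offset into a bit offset, since ``MEM[]`` is written with bit ranges.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm256_i32gather_pd. vindex[] stands in
     * for the four 32-bit lanes of the __m128i index vector. */
    void i32gather_pd_model(const double *base_addr, const int32_t vindex[4],
                            int scale, double dst[4])
    {
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        for (size_t j = 0; j < 4; ++j) {
            /* Byte offset: sign-extended 32-bit index times scale. */
            const int64_t offset = (int64_t)vindex[j] * scale;
            memcpy(&dst[j], (const char *)base_addr + offset, sizeof(double));
        }
    }

With ``scale`` set to 8 (the size of a ``double``), each index behaves like an ordinary array subscript. Indices are signed, so a negative index addresses memory before ``base_addr``.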
        	

_mm256_i32gather_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    float const* base_addr, 
    __m256i vindex, 
    const int scale
:Param ETypes:
    FP32 base_addr, 
    SI32 vindex, 
    IMM scale

.. code-block:: C

    __m256 _mm256_i32gather_ps(float const* base_addr,
                               __m256i vindex, const int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+31:i] := MEM[addr+31:addr]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_i32gather_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    int const* base_addr, 
    __m256i vindex, 
    const int scale
:Param ETypes:
    UI32 base_addr, 
    SI32 vindex, 
    IMM scale

.. code-block:: C

    __m256i _mm256_i32gather_epi32(int const* base_addr,
                                   __m256i vindex,
                                   const int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+31:i] := MEM[addr+31:addr]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_i32gather_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __int64 const* base_addr, 
    __m128i vindex, 
    const int scale
:Param ETypes:
    UI64 base_addr, 
    SI32 vindex, 
    IMM scale

.. code-block:: C

    __m256i _mm256_i32gather_epi64(__int64 const* base_addr,
                                   __m128i vindex,
                                   const int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_i64gather_pd
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    double const* base_addr, 
    __m256i vindex, 
    const int scale
:Param ETypes:
    FP64 base_addr, 
    SI64 vindex, 
    IMM scale

.. code-block:: C

    __m256d _mm256_i64gather_pd(double const* base_addr,
                                __m256i vindex,
                                const int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_i64gather_ps
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    float const* base_addr, 
    __m256i vindex, 
    const int scale
:Param ETypes:
    FP32 base_addr, 
    SI64 vindex, 
    IMM scale

.. code-block:: C

    __m128 _mm256_i64gather_ps(float const* base_addr,
                               __m256i vindex, const int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	dst[i+31:i] := MEM[addr+31:addr]
        ENDFOR
        dst[MAX:128] := 0
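
A 256-bit index vector holds only four 64-bit indices, so this gather yields four 32-bit results, which is why the return type is ``__m128``. A scalar sketch under stated assumptions (``i64gather_ps_model`` is a hypothetical name, arrays stand in for the registers, offsets are in bytes):

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm256_i64gather_ps. vindex[] stands in
     * for the four 64-bit lanes of the __m256i index vector; dst[] has only
     * four 32-bit lanes. */
    void i64gather_ps_model(const float *base_addr, const int64_t vindex[4],
                            int scale, float dst[4])
    {
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        for (size_t j = 0; j < 4; ++j) {
            const int64_t offset = vindex[j] * scale;   /* byte offset */
            memcpy(&dst[j], (const char *)base_addr + offset, sizeof(float));
        }
    }

With ``scale`` set to 4 the indices act as ``float`` subscripts. With ``scale`` set to 8 each index step skips two ``float`` elements.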
        	

_mm256_i64gather_epi32
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    int const* base_addr, 
    __m256i vindex, 
    const int scale
:Param ETypes:
    UI32 base_addr, 
    SI64 vindex, 
    IMM scale

.. code-block:: C

    __m128i _mm256_i64gather_epi32(int const* base_addr,
                                   __m256i vindex,
                                   const int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	dst[i+31:i] := MEM[addr+31:addr]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_i64gather_epi64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __int64 const* base_addr, 
    __m256i vindex, 
    const int scale
:Param ETypes:
    UI64 base_addr, 
    SI64 vindex, 
    IMM scale

.. code-block:: C

    __m256i _mm256_i64gather_epi64(__int64 const* base_addr,
                                   __m256i vindex,
                                   const int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mask_i32gather_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    double const* base_addr, 
    __m128i vindex, 
    __m256d mask, 
    const int scale
:Param ETypes:
    FP64 src, 
    FP64 base_addr, 
    SI32 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m256d _mm256_mask_i32gather_pd(__m256d src,
                                     double const* base_addr,
                                     __m128i vindex,
                                     __m256d mask,
                                     const int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*32
        	IF mask[i+63]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        mask[MAX:256] := 0
        dst[MAX:256] := 0
        	

_mm256_mask_i32gather_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 src, 
    float const* base_addr, 
    __m256i vindex, 
    __m256 mask, 
    const int scale
:Param ETypes:
    FP32 src, 
    FP32 base_addr, 
    SI32 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m256 _mm256_mask_i32gather_ps(__m256 src,
                                    float const* base_addr,
                                    __m256i vindex, __m256 mask,
                                    const int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*32
        	IF mask[i+31]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        mask[MAX:256] := 0
        dst[MAX:256] := 0
        	

_mm256_mask_i32gather_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    int const* base_addr, 
    __m256i vindex, 
    __m256i mask, 
    const int scale
:Param ETypes:
    UI32 src, 
    UI32 base_addr, 
    SI32 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m256i _mm256_mask_i32gather_epi32(__m256i src,
                                        int const* base_addr,
                                        __m256i vindex,
                                        __m256i mask,
                                        const int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*32
        	IF mask[i+31]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        mask[MAX:256] := 0
        dst[MAX:256] := 0
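
The merge behaviour of the masked gather can be sketched as a scalar model. The name ``mask_i32gather_epi32_model`` is hypothetical, ``int32_t`` arrays stand in for the ``__m256i`` operands, and offsets are computed in bytes.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model of _mm256_mask_i32gather_epi32. */
    void mask_i32gather_epi32_model(const int32_t src[8],
                                    const int32_t *base_addr,
                                    const int32_t vindex[8],
                                    const int32_t mask[8], int scale,
                                    int32_t dst[8])
    {
        assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
        for (size_t j = 0; j < 8; ++j) {
            if (mask[j] < 0) {      /* bit 31 set: IF mask[i+31] */
                const int64_t offset = (int64_t)vindex[j] * scale;
                memcpy(&dst[j], (const char *)base_addr + offset,
                       sizeof(int32_t));
            } else {
                dst[j] = src[j];    /* lane falls back to src */
            }
        }
    }

In this model a lane whose mask bit is clear never touches memory, so its index is not dereferenced and ``src`` supplies the value instead.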
        	

_mm256_mask_i32gather_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __int64 const* base_addr, 
    __m128i vindex, 
    __m256i mask, 
    const int scale
:Param ETypes:
    UI64 src, 
    UI64 base_addr, 
    SI32 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m256i _mm256_mask_i32gather_epi64(
        __m256i src, __int64 const* base_addr, __m128i vindex,
        __m256i mask, const int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*32
        	IF mask[i+63]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        mask[MAX:256] := 0
        dst[MAX:256] := 0
        	

_mm256_mask_i64gather_pd
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d src, 
    double const* base_addr, 
    __m256i vindex, 
    __m256d mask, 
    const int scale
:Param ETypes:
    FP64 src, 
    FP64 base_addr, 
    SI64 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m256d _mm256_mask_i64gather_pd(__m256d src,
                                     double const* base_addr,
                                     __m256i vindex,
                                     __m256d mask,
                                     const int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*64
        	IF mask[i+63]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        mask[MAX:256] := 0
        dst[MAX:256] := 0
        	

_mm256_mask_i64gather_ps
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    float const* base_addr, 
    __m256i vindex, 
    __m128 mask, 
    const int scale
:Param ETypes:
    FP32 src, 
    FP32 base_addr, 
    SI64 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m128 _mm256_mask_i64gather_ps(__m128 src,
                                    float const* base_addr,
                                    __m256i vindex, __m128 mask,
                                    const int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	IF mask[i+31]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        mask[MAX:128] := 0
        dst[MAX:128] := 0
        	

_mm256_mask_i64gather_epi32
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    int const* base_addr, 
    __m256i vindex, 
    __m128i mask, 
    const int scale
:Param ETypes:
    UI32 src, 
    UI32 base_addr, 
    SI64 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m128i _mm256_mask_i64gather_epi32(__m128i src,
                                        int const* base_addr,
                                        __m256i vindex,
                                        __m128i mask,
                                        const int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	IF mask[i+31]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        mask[MAX:128] := 0
        dst[MAX:128] := 0
        	

_mm256_mask_i64gather_epi64
^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __int64 const* base_addr, 
    __m256i vindex, 
    __m256i mask, 
    const int scale
:Param ETypes:
    UI64 src, 
    UI64 base_addr, 
    SI64 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m256i _mm256_mask_i64gather_epi64(
        __m256i src, __int64 const* base_addr, __m256i vindex,
        __m256i mask, const int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	m := j*64
        	IF mask[i+63]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        mask[MAX:256] := 0
        dst[MAX:256] := 0
        	

_mm256_maskload_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    int const* mem_addr, 
    __m256i mask
:Param ETypes:
    UI32 mem_addr, 
    MASK mask

.. code-block:: C

    __m256i _mm256_maskload_epi32(int const* mem_addr,
                                  __m256i mask);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using "mask" (elements are zeroed out when the highest bit is not set in the corresponding element).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF mask[i+31]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
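
A common use of a masked load is reading the last, partial group of an array whose length is not a multiple of 8. The sketch below builds such a mask and applies it in a scalar model. Both ``make_tail_mask_epi32`` and ``maskload_epi32_model`` are hypothetical names, and ``int32_t`` arrays stand in for ``__m256i`` values.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: set the high bit in the first n lanes only, so a
     * masked load touches just the n valid elements. */
    void make_tail_mask_epi32(size_t n, int32_t mask[8])
    {
        assert(n <= 8);
        for (size_t j = 0; j < 8; ++j) {
            mask[j] = (j < n) ? -1 : 0;
        }
    }

    /* Hypothetical scalar model of _mm256_maskload_epi32. */
    void maskload_epi32_model(const int32_t *mem_addr, const int32_t mask[8],
                              int32_t dst[8])
    {
        for (size_t j = 0; j < 8; ++j) {
            /* mem_addr[j] is evaluated only when bit 31 of mask[j] is set. */
            dst[j] = (mask[j] < 0) ? mem_addr[j] : 0;
        }
    }

With three valid elements, lanes 0 to 2 are read and lanes 3 to 7 come back as zero without the model reading past the end of the source array.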
        	

_mm256_maskload_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __int64 const* mem_addr, 
    __m256i mask
:Param ETypes:
    UI64 mem_addr, 
    MASK mask

.. code-block:: C

    __m256i _mm256_maskload_epi64(__int64 const* mem_addr,
                                  __m256i mask);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using "mask" (elements are zeroed out when the highest bit is not set in the corresponding element).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF mask[i+63]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_stream_load_si256
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    void const* mem_addr
:Param ETypes:
    M256 mem_addr

.. code-block:: C

    __m256i _mm256_stream_load_si256(void const* mem_addr);

.. admonition:: Intel Description

    Load 256 bits of integer data from memory into "dst" using a non-temporal memory hint.
    "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[255:0] := MEM[mem_addr+255:mem_addr]
        dst[MAX:256] := 0
        	
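The alignment requirement is the part most easily missed. The following minimal sketch, with the hypothetical function ``stream_load_sum``, uses C11 ``_Alignas(32)`` to satisfy it; the ``target`` attribute is a GCC/Clang extension assumed here so the example builds without extra flags. On ordinary write-back memory the non-temporal hint may have no observable effect, so this only demonstrates correct usage, not a performance benefit.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical demo: loads eight 32-bit integers with the streaming
       load and returns their sum. */
    __attribute__((target("avx2")))
    int32_t stream_load_sum(void)
    {
        /* _Alignas(32) satisfies the 32-byte alignment requirement. */
        _Alignas(32) int32_t buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};

        /* The pointer cast is accepted by both the GCC and Clang prototypes. */
        const __m256i v = _mm256_stream_load_si256((const __m256i *)buf);

        int32_t out[8];
        _mm256_storeu_si256((__m256i *)out, v);

        int32_t sum = 0;
        for (int i = 0; i < 8; ++i)
            sum += out[i];
        return sum;
    }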

XMM
~~~
_mm_maskload_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double const* mem_addr, 
    __m128i mask
:Param ETypes:
    FP64 mem_addr, 
    MASK mask

.. code-block:: C

    __m128d _mm_maskload_pd(double const* mem_addr,
                            __m128i mask);

.. admonition:: Intel Description

    Load packed double-precision (64-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF mask[i+63]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskload_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float const* mem_addr, 
    __m128i mask
:Param ETypes:
    FP32 mem_addr, 
    MASK mask

.. code-block:: C

    __m128 _mm_maskload_ps(float const* mem_addr, __m128i mask);

.. admonition:: Intel Description

    Load packed single-precision (32-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF mask[i+31]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	
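Only the high bit of each mask element is tested, so a non-zero mask element is not necessarily an enabled one. The sketch below illustrates this with the hypothetical function ``maskload_sign_bit_demo``; the ``target`` attribute is a GCC/Clang extension assumed here so the example builds without extra flags.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical demo: loads four floats from p under a mask in which
       only lanes 0 and 2 have their highest bit set. */
    __attribute__((target("avx")))
    void maskload_sign_bit_demo(const float *p, float out[4])
    {
        /* INT32_MIN: only the high bit set   -> lane 0 is loaded.
           INT32_MAX: every bit but the high  -> lane 1 is zeroed.
           -1:        all bits set            -> lane 2 is loaded.
           0:         no bits set             -> lane 3 is zeroed. */
        const __m128i mask = _mm_setr_epi32(INT32_MIN, INT32_MAX, -1, 0);
        const __m128 v = _mm_maskload_ps(p, mask);
        _mm_storeu_ps(out, v);
    }

With the input ``{1.5f, 2.5f, 3.5f, 4.5f}`` the result is ``{1.5f, 0.0f, 3.5f, 0.0f}``.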

_mm_i32gather_pd
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double const* base_addr, 
    __m128i vindex, 
    const int scale
:Param ETypes:
    FP64 base_addr, 
    SI32 vindex, 
    IMM scale

.. code-block:: C

    __m128d _mm_i32gather_pd(double const* base_addr,
                             __m128i vindex, const int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_i32gather_ps
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float const* base_addr, 
    __m128i vindex, 
    const int scale
:Param ETypes:
    FP32 base_addr, 
    SI32 vindex, 
    IMM scale

.. code-block:: C

    __m128 _mm_i32gather_ps(float const* base_addr,
                            __m128i vindex, const int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+31:i] := MEM[addr+31:addr]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_i32gather_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    int const* base_addr, 
    __m128i vindex, 
    const int scale
:Param ETypes:
    UI32 base_addr, 
    SI32 vindex, 
    IMM scale

.. code-block:: C

    __m128i _mm_i32gather_epi32(int const* base_addr,
                                __m128i vindex,
                                const int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+31:i] := MEM[addr+31:addr]
        ENDFOR
        dst[MAX:128] := 0
        	
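The pseudo-code above appears to express addresses in bits, which would explain the trailing ``* 8``. In C terms each element is read from the byte address ``(char const*)base_addr + vindex[j] * scale``, so indexing an ``int`` array by element number calls for ``scale = 4``. A minimal sketch, assuming GCC or Clang (the ``target`` attribute is their extension) and a hypothetical function ``gather_every_third``:

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical demo: gathers table[0], table[3], table[6] and table[9]
       into out[0..3]. The table must hold at least 10 elements. */
    __attribute__((target("avx2")))
    void gather_every_third(const int32_t *table, int32_t out[4])
    {
        const __m128i idx = _mm_setr_epi32(0, 3, 6, 9);

        /* scale must be a compile-time constant; 4 == sizeof(int32_t). */
        const __m128i v = _mm_i32gather_epi32((const int *)table, idx, 4);

        _mm_storeu_si128((__m128i *)out, v);
    }

Passing ``scale = 1`` instead would treat the indices as byte offsets.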

_mm_i32gather_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __int64 const* base_addr, 
    __m128i vindex, 
    const int scale
:Param ETypes:
    UI64 base_addr, 
    SI32 vindex, 
    IMM scale

.. code-block:: C

    __m128i _mm_i32gather_epi64(__int64 const* base_addr,
                                __m128i vindex,
                                const int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*32
        	addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_i64gather_pd
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    double const* base_addr, 
    __m128i vindex, 
    const int scale
:Param ETypes:
    FP64 base_addr, 
    SI64 vindex, 
    IMM scale

.. code-block:: C

    __m128d _mm_i64gather_pd(double const* base_addr,
                             __m128i vindex, const int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_i64gather_ps
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    float const* base_addr, 
    __m128i vindex, 
    const int scale
:Param ETypes:
    FP32 base_addr, 
    SI64 vindex, 
    IMM scale

.. code-block:: C

    __m128 _mm_i64gather_ps(float const* base_addr,
                            __m128i vindex, const int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	dst[i+31:i] := MEM[addr+31:addr]
        ENDFOR
        dst[MAX:64] := 0
        	

_mm_i64gather_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    int const* base_addr, 
    __m128i vindex, 
    const int scale
:Param ETypes:
    UI32 base_addr, 
    SI64 vindex, 
    IMM scale

.. code-block:: C

    __m128i _mm_i64gather_epi32(int const* base_addr,
                                __m128i vindex,
                                const int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	dst[i+31:i] := MEM[addr+31:addr]
        ENDFOR
        dst[MAX:64] := 0
        	
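Because a 128-bit index vector holds only two 64-bit indices, this intrinsic produces two 32-bit results in the low half of ``dst`` and, per ``dst[MAX:64] := 0`` above, zeros in the high half. The sketch below illustrates that layout; ``gather_two_epi32`` is a hypothetical function and the ``target`` attribute is a GCC/Clang extension assumed so the example builds without extra flags.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical demo: gathers table[i0] and table[i1] into out[0] and
       out[1]; out[2] and out[3] receive the zeroed upper half. */
    __attribute__((target("avx2")))
    void gather_two_epi32(const int32_t *table, long long i0, long long i1,
                          int32_t out[4])
    {
        /* _mm_set_epi64x takes the high element first, so i0 lands in lane 0. */
        const __m128i idx = _mm_set_epi64x(i1, i0);

        /* scale = 4 because the indices count 32-bit elements. */
        const __m128i v = _mm_i64gather_epi32((const int *)table, idx, 4);

        _mm_storeu_si128((__m128i *)out, v);
    }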

_mm_i64gather_epi64
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __int64 const* base_addr, 
    __m128i vindex, 
    const int scale
:Param ETypes:
    UI64 base_addr, 
    SI64 vindex, 
    IMM scale

.. code-block:: C

    __m128i _mm_i64gather_epi64(__int64 const* base_addr,
                                __m128i vindex,
                                const int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*64
        	addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        	dst[i+63:i] := MEM[addr+63:addr]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_mask_i32gather_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    double const* base_addr, 
    __m128i vindex, 
    __m128d mask, 
    const int scale
:Param ETypes:
    FP64 src, 
    FP64 base_addr, 
    SI32 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m128d _mm_mask_i32gather_pd(__m128d src,
                                  double const* base_addr,
                                  __m128i vindex, __m128d mask,
                                  const int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*32
        	IF mask[i+63]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        mask[MAX:128] := 0
        dst[MAX:128] := 0
        	

_mm_mask_i32gather_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    float const* base_addr, 
    __m128i vindex, 
    __m128 mask, 
    const int scale
:Param ETypes:
    FP32 src, 
    FP32 base_addr, 
    SI32 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m128 _mm_mask_i32gather_ps(__m128 src,
                                 float const* base_addr,
                                 __m128i vindex, __m128 mask,
                                 const int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*32
        	IF mask[i+31]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        mask[MAX:128] := 0
        dst[MAX:128] := 0
        	

_mm_mask_i32gather_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    int const* base_addr, 
    __m128i vindex, 
    __m128i mask, 
    const int scale
:Param ETypes:
    UI32 src, 
    UI32 base_addr, 
    SI32 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m128i _mm_mask_i32gather_epi32(__m128i src,
                                     int const* base_addr,
                                     __m128i vindex,
                                     __m128i mask,
                                     const int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*32
        	IF mask[i+31]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        mask[MAX:128] := 0
        dst[MAX:128] := 0
        	
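The masked form is useful when some lanes should keep a fallback value instead of being fetched. The sketch below uses a hypothetical function ``gather_or_default`` with ``-1`` as the fallback; the ``target`` attribute is a GCC/Clang extension assumed so the example builds without extra flags. Following the pseudo-code above, lanes whose mask element has its highest bit clear are copied from ``src`` and their address is never computed.

.. code-block:: C

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical demo: lanes 0 and 2 are gathered from table (indices 2
       and 1); lanes 1 and 3 are masked off and keep the fallback value -1.
       The table must hold at least 3 elements. */
    __attribute__((target("avx2")))
    void gather_or_default(const int32_t *table, int32_t out[4])
    {
        const __m128i src  = _mm_set1_epi32(-1);           /* fallback values */
        const __m128i idx  = _mm_setr_epi32(2, 0, 1, 0);   /* element indices */
        const __m128i mask = _mm_setr_epi32(-1, 0, -1, 0); /* high bit = enabled */

        const __m128i v =
            _mm_mask_i32gather_epi32(src, (const int *)table, idx, mask, 4);

        _mm_storeu_si128((__m128i *)out, v);
    }

For ``table = {100, 200, 300}`` the result is ``{300, -1, 200, -1}``.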

_mm_mask_i32gather_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __int64 const* base_addr, 
    __m128i vindex, 
    __m128i mask, 
    const int scale
:Param ETypes:
    UI64 src, 
    UI64 base_addr, 
    SI32 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m128i _mm_mask_i32gather_epi64(__m128i src,
                                     __int64 const* base_addr,
                                     __m128i vindex,
                                     __m128i mask,
                                     const int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*32
        	IF mask[i+63]
        		addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        mask[MAX:128] := 0
        dst[MAX:128] := 0
        	

_mm_mask_i64gather_pd
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d src, 
    double const* base_addr, 
    __m128i vindex, 
    __m128d mask, 
    const int scale
:Param ETypes:
    FP64 src, 
    FP64 base_addr, 
    SI64 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m128d _mm_mask_i64gather_pd(__m128d src,
                                  double const* base_addr,
                                  __m128i vindex, __m128d mask,
                                  const int scale);

.. admonition:: Intel Description

    Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*64
        	IF mask[i+63]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        mask[MAX:128] := 0
        dst[MAX:128] := 0
        	

_mm_mask_i64gather_ps
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 src, 
    float const* base_addr, 
    __m128i vindex, 
    __m128 mask, 
    const int scale
:Param ETypes:
    FP32 src, 
    FP32 base_addr, 
    SI64 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m128 _mm_mask_i64gather_ps(__m128 src,
                                 float const* base_addr,
                                 __m128i vindex, __m128 mask,
                                 const int scale);

.. admonition:: Intel Description

    Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	IF mask[i+31]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        mask[MAX:64] := 0
        dst[MAX:64] := 0
        	

_mm_mask_i64gather_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    int const* base_addr, 
    __m128i vindex, 
    __m128i mask, 
    const int scale
:Param ETypes:
    UI32 src, 
    UI32 base_addr, 
    SI64 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m128i _mm_mask_i64gather_epi32(__m128i src,
                                     int const* base_addr,
                                     __m128i vindex,
                                     __m128i mask,
                                     const int scale);

.. admonition:: Intel Description

    Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*32
        	m := j*64
        	IF mask[i+31]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+31:i] := MEM[addr+31:addr]
        	ELSE
        		dst[i+31:i] := src[i+31:i]
        	FI
        ENDFOR
        mask[MAX:64] := 0
        dst[MAX:64] := 0
        	

_mm_mask_i64gather_epi64
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __int64 const* base_addr, 
    __m128i vindex, 
    __m128i mask, 
    const int scale
:Param ETypes:
    UI64 src, 
    UI64 base_addr, 
    SI64 vindex, 
    MASK mask, 
    IMM scale

.. code-block:: C

    __m128i _mm_mask_i64gather_epi64(__m128i src,
                                     __int64 const* base_addr,
                                     __m128i vindex,
                                     __m128i mask,
                                     const int scale);

.. admonition:: Intel Description

    Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	m := j*64
        	IF mask[i+63]
        		addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
        		dst[i+63:i] := MEM[addr+63:addr]
        	ELSE
        		dst[i+63:i] := src[i+63:i]
        	FI
        ENDFOR
        mask[MAX:128] := 0
        dst[MAX:128] := 0
        	

_mm_maskload_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    int const* mem_addr, 
    __m128i mask
:Param ETypes:
    UI32 mem_addr, 
    MASK mask

.. code-block:: C

    __m128i _mm_maskload_epi32(int const* mem_addr,
                               __m128i mask);

.. admonition:: Intel Description

    Load packed 32-bit integers from memory into "dst" using "mask" (elements are zeroed out when the highest bit is not set in the corresponding element).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF mask[i+31]
        		dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
        	ELSE
        		dst[i+31:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_maskload_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Load
:Header: immintrin.h
:Searchable: AVX_ALL-Load-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __int64 const* mem_addr, 
    __m128i mask
:Param ETypes:
    UI64 mem_addr, 
    MASK mask

.. code-block:: C

    __m128i _mm_maskload_epi64(__int64 const* mem_addr,
                               __m128i mask);

.. admonition:: Intel Description

    Load packed 64-bit integers from memory into "dst" using "mask" (elements are zeroed out when the highest bit is not set in the corresponding element).

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF mask[i+63]
        		dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
        	ELSE
        		dst[i+63:i] := 0
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

Elementary Math Functions
-------------------------
YMM
~~~
_mm256_rcp_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_rcp_ps(__m256 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := 1.0 / a[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	
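When the stated error bound is too loose, the approximation is often refined with one Newton-Raphson step, ``x1 = x0 * (2 - a * x0)``, which roughly squares the relative error. That refinement is a general numerical technique, not something described in the Intel text above. A minimal sketch, with the hypothetical function ``rcp_with_refinement`` and the GCC/Clang ``target`` attribute assumed so the example builds without extra flags:

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical demo: writes the raw approximation of 1/a[i] to approx
       and a once-refined estimate to refined. */
    __attribute__((target("avx")))
    void rcp_with_refinement(const float a[8], float approx[8], float refined[8])
    {
        const __m256 va  = _mm256_loadu_ps(a);
        const __m256 x0  = _mm256_rcp_ps(va);      /* approximate 1/a */
        const __m256 two = _mm256_set1_ps(2.0f);

        /* One Newton-Raphson step: x1 = x0 * (2 - a * x0). */
        const __m256 x1 =
            _mm256_mul_ps(x0, _mm256_sub_ps(two, _mm256_mul_ps(va, x0)));

        _mm256_storeu_ps(approx, x0);
        _mm256_storeu_ps(refined, x1);
    }

The raw values differ between processor models, so code should rely only on the error bound, never on exact bit patterns.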

_mm256_rsqrt_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_rsqrt_ps(__m256 a);

.. admonition:: Intel Description

    Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
        ENDFOR
        dst[MAX:256] := 0
        	
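To get a feel for the error bound, the approximation can be compared against ``1.0 / sqrt(a)`` computed with the full-precision ``_mm256_div_ps`` and ``_mm256_sqrt_ps``. The sketch below is a hypothetical measurement helper, ``rsqrt_max_rel_error``, which assumes strictly positive inputs; the ``target`` attribute is a GCC/Clang extension assumed so the example builds without extra flags.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: returns the largest relative deviation of
       _mm256_rsqrt_ps from a divide-and-sqrt reference over eight
       strictly positive inputs. */
    __attribute__((target("avx")))
    float rsqrt_max_rel_error(const float a[8])
    {
        const __m256 va    = _mm256_loadu_ps(a);
        const __m256 fast  = _mm256_rsqrt_ps(va);
        const __m256 exact = _mm256_div_ps(_mm256_set1_ps(1.0f),
                                           _mm256_sqrt_ps(va));

        /* |fast - exact| / exact; clearing the sign bit gives the absolute value. */
        const __m256 diff    = _mm256_sub_ps(fast, exact);
        const __m256 absdiff = _mm256_andnot_ps(_mm256_set1_ps(-0.0f), diff);
        const __m256 rel     = _mm256_div_ps(absdiff, exact);

        float r[8];
        _mm256_storeu_ps(r, rel);

        float worst = r[0];
        for (int i = 1; i < 8; ++i)
            if (r[i] > worst)
                worst = r[i];
        return worst;
    }

The returned value should be on the order of the documented bound, allowing for rounding in the reference computation.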

_mm256_sqrt_pd
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_sqrt_pd(__m256d a);

.. admonition:: Intel Description

    Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := SQRT(a[i+63:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sqrt_ps
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Elementary Math Functions
:Header: immintrin.h
:Searchable: AVX_ALL-Elementary Math Functions-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_sqrt_ps(__m256 a);

.. admonition:: Intel Description

    Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := SQRT(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

Arithmetic
----------
YMM
~~~
_mm256_add_pd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_add_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[i+63:i] + b[i+63:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_add_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_add_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[i+31:i] + b[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_addsub_pd
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_addsub_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Alternately subtract and add packed double-precision (64-bit) floating-point elements in "b" from/to packed elements in "a" (even-indexed elements are subtracted, odd-indexed elements are added), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF ((j & 1) == 0)
        		dst[i+63:i] := a[i+63:i] - b[i+63:i]
        	ELSE
        		dst[i+63:i] := a[i+63:i] + b[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	
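The subtract-on-even, add-on-odd pattern matches the real and imaginary parts of a complex product, which is a typical use of this intrinsic. The sketch below multiplies two pairs of complex numbers stored as interleaved ``(re, im)`` doubles. ``complex_mul2`` is a hypothetical helper, the data layout is an assumption of the example, and the ``target`` attribute is a GCC/Clang extension assumed so the example builds without extra flags.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: out[k] = a[k] * b[k] for two complex numbers
       k = 0, 1, each stored as (re, im). */
    __attribute__((target("avx")))
    void complex_mul2(const double a[4], const double b[4], double out[4])
    {
        const __m256d va = _mm256_loadu_pd(a);            /* ar0 ai0 ar1 ai1 */
        const __m256d vb = _mm256_loadu_pd(b);            /* br0 bi0 br1 bi1 */

        const __m256d b_re = _mm256_movedup_pd(vb);       /* br0 br0 br1 br1 */
        const __m256d b_im = _mm256_permute_pd(vb, 0xF);  /* bi0 bi0 bi1 bi1 */
        const __m256d a_sw = _mm256_permute_pd(va, 0x5);  /* ai0 ar0 ai1 ar1 */

        const __m256d t1 = _mm256_mul_pd(va, b_re);       /* ar*br  ai*br */
        const __m256d t2 = _mm256_mul_pd(a_sw, b_im);     /* ai*bi  ar*bi */

        /* Even lanes: ar*br - ai*bi (real part).
           Odd lanes:  ai*br + ar*bi (imaginary part). */
        _mm256_storeu_pd(out, _mm256_addsub_pd(t1, t2));
    }

For example, ``(1 + 2i) * (3 + 4i)`` yields ``-5 + 10i``.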

_mm256_addsub_ps
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_addsub_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Alternately subtract and add packed single-precision (32-bit) floating-point elements in "b" from/to packed elements in "a" (even-indexed elements are subtracted, odd-indexed elements are added), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF ((j & 1) == 0)
        		dst[i+31:i] := a[i+31:i] - b[i+31:i]
        	ELSE
        		dst[i+31:i] := a[i+31:i] + b[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_div_pd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_div_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	dst[i+63:i] := a[i+63:i] / b[i+63:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_div_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_div_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := a[i+31:i] / b[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_dp_ps
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_dp_ps(__m256 a, __m256 b, const int imm8);

.. admonition:: Intel Description

    Conditionally multiply the packed single-precision (32-bit) floating-point elements in "a" and "b" using the high 4 bits in "imm8", sum the four products within each 128-bit lane, and conditionally store each lane's sum in the corresponding lane of "dst" using the low 4 bits of "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE DP(a[127:0], b[127:0], imm8[7:0]) {
        	FOR j := 0 to 3
        		i := j*32
        		IF imm8[(4+j)%8]
        			temp[i+31:i] := a[i+31:i] * b[i+31:i]
        		ELSE
        			temp[i+31:i] := FP32(0.0)
        		FI
        	ENDFOR
        	
        	sum[31:0] := (temp[127:96] + temp[95:64]) + (temp[63:32] + temp[31:0])
        	
        	FOR j := 0 to 3
        		i := j*32
        		IF imm8[j%8]
        			tmpdst[i+31:i] := sum[31:0]
        		ELSE
        			tmpdst[i+31:i] := FP32(0.0)
        		FI
        	ENDFOR
        	RETURN tmpdst[127:0]
        }
        dst[127:0] := DP(a[127:0], b[127:0], imm8[7:0])
        dst[255:128] := DP(a[255:128], b[255:128], imm8[7:0])
        dst[MAX:256] := 0
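
The two roles of "imm8" (high nibble selects products, low nibble selects outputs) may be easier to follow in scalar form. This is a rough sketch under stated assumptions, not the intrinsic: the names ``dp_lane_model`` and ``dp_ps_model`` are hypothetical, and each vector is modelled as an array of 8 floats with element 0 holding bits [31:0].

.. code-block:: C

    /* Hypothetical model of the DP helper for one 128-bit lane (4 floats). */
    static void dp_lane_model(const float a[4], const float b[4],
                              unsigned imm8, float dst[4])
    {
        float temp[4];
        for (int j = 0; j < 4; ++j) {
            /* Bits 4..7 of imm8 choose which products take part in the sum. */
            temp[j] = ((imm8 >> (4 + j)) & 1u) ? a[j] * b[j] : 0.0f;
        }
        /* Same association order as the pseudo-code. */
        float sum = (temp[3] + temp[2]) + (temp[1] + temp[0]);
        for (int j = 0; j < 4; ++j) {
            /* Bits 0..3 of imm8 choose which outputs receive the sum. */
            dst[j] = ((imm8 >> j) & 1u) ? sum : 0.0f;
        }
    }

    /* The 256-bit form applies the helper to each 128-bit lane separately. */
    static void dp_ps_model(const float a[8], const float b[8],
                            unsigned imm8, float dst[8])
    {
        dp_lane_model(a, b, imm8, dst);             /* bits [127:0]   */
        dp_lane_model(a + 4, b + 4, imm8, dst + 4); /* bits [255:128] */
    }

With ``imm8 = 0xF1`` all four products of a lane are summed and the sum is written only to element 0 of that lane; the two lanes never mix.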
        	

_mm256_hadd_pd
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_hadd_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[127:64] + a[63:0]
        dst[127:64] := b[127:64] + b[63:0]
        dst[191:128] := a[255:192] + a[191:128]
        dst[255:192] := b[255:192] + b[191:128]
        dst[MAX:256] := 0
        	

_mm256_hadd_ps
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_hadd_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[63:32] + a[31:0]
        dst[63:32] := a[127:96] + a[95:64]
        dst[95:64] := b[63:32] + b[31:0]
        dst[127:96] := b[127:96] + b[95:64]
        dst[159:128] := a[191:160] + a[159:128]
        dst[191:160] := a[255:224] + a[223:192]
        dst[223:192] := b[191:160] + b[159:128]
        dst[255:224] := b[255:224] + b[223:192]
        dst[MAX:256] := 0
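
The output ordering above interleaves results from "a" and "b" within each 128-bit lane, which is easy to misread. A minimal scalar sketch follows; ``hadd_ps_model`` is a hypothetical name, and the vectors are assumed to be arrays of 8 floats with element 0 holding bits [31:0].

.. code-block:: C

    /* Hypothetical scalar model: for each 128-bit lane, the two pair sums
       from a come first, followed by the two pair sums from b. */
    static void hadd_ps_model(const float a[8], const float b[8], float dst[8])
    {
        for (int lane = 0; lane < 2; ++lane) {
            const float *la = a + 4 * lane;
            const float *lb = b + 4 * lane;
            float *ld = dst + 4 * lane;
            ld[0] = la[1] + la[0];
            ld[1] = la[3] + la[2];
            ld[2] = lb[1] + lb[0];
            ld[3] = lb[3] + lb[2];
        }
    }

For ``a = {1, 2, 3, 4, 5, 6, 7, 8}`` and ``b = {10, 20, 30, 40, 50, 60, 70, 80}`` the model yields ``{3, 7, 30, 70, 11, 15, 110, 150}``.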
        	

_mm256_hsub_pd
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_hsub_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0] - a[127:64]
        dst[127:64] := b[63:0] - b[127:64]
        dst[191:128] := a[191:128] - a[255:192]
        dst[255:192] := b[191:128] - b[255:192]
        dst[MAX:256] := 0
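
Note the operand order in the pseudo-code: the lower element of each pair is the minuend and the upper element is subtracted from it. A small scalar sketch, assuming arrays of 4 doubles with element 0 holding bits [63:0] and using the hypothetical name ``hsub_pd_model``:

.. code-block:: C

    /* Hypothetical scalar model: each result is (low element) - (high element)
       of an adjacent pair, alternating between a and b per 128-bit lane. */
    static void hsub_pd_model(const double a[4], const double b[4], double dst[4])
    {
        dst[0] = a[0] - a[1];
        dst[1] = b[0] - b[1];
        dst[2] = a[2] - a[3];
        dst[3] = b[2] - b[3];
    }

For ``a = {5, 1, 9, 2}`` and ``b = {10, 3, 20, 4}`` this gives ``{4, 7, 7, 16}``.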
        	

_mm256_hsub_ps
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_hsub_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] - a[63:32]
        dst[63:32] := a[95:64] - a[127:96]
        dst[95:64] := b[31:0] - b[63:32]
        dst[127:96] := b[95:64] - b[127:96]
        dst[159:128] := a[159:128] - a[191:160]
        dst[191:160] := a[223:192] - a[255:224]
        dst[223:192] := b[159:128] - b[191:160]
        dst[255:224] := b[223:192] - b[255:224]
        dst[MAX:256] := 0
        	

_mm256_mul_pd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_mul_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[i+63:i] * b[i+63:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mul_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_mul_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[i+31:i] * b[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sub_pd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b
:Param ETypes:
    FP64 a, 
    FP64 b

.. code-block:: C

    __m256d _mm256_sub_pd(__m256d a, __m256d b);

.. admonition:: Intel Description

    Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[i+63:i] - b[i+63:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sub_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b
:Param ETypes:
    FP32 a, 
    FP32 b

.. code-block:: C

    __m256 _mm256_sub_ps(__m256 a, __m256 b);

.. admonition:: Intel Description

    Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[i+31:i] - b[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_add_epi8
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_add_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed 8-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := a[i+7:i] + b[i+7:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_add_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_add_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed 16-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := a[i+15:i] + b[i+15:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_add_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_add_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed 32-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[i+31:i] + b[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_add_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_add_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed 64-bit integers in "a" and "b", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[i+63:i] + b[i+63:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_adds_epi8
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_adds_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
        ENDFOR
        dst[MAX:256] := 0
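
``Saturate8`` in the pseudo-code clamps the exact sum to the signed 8-bit range instead of letting it wrap. The sketch below models that in scalar C; the helper ``saturate8`` and the function ``adds_epi8_model`` are hypothetical names, and the vector is assumed to be an array of 32 ``int8_t`` values.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: clamp a wider value to [-128, 127]. */
    static int8_t saturate8(int v)
    {
        if (v > INT8_MAX) return INT8_MAX;
        if (v < INT8_MIN) return INT8_MIN;
        return (int8_t)v;
    }

    /* Hypothetical scalar model: add in a wider type, then clamp. */
    static void adds_epi8_model(const int8_t a[32], const int8_t b[32],
                                int8_t dst[32])
    {
        for (int j = 0; j < 32; ++j) {
            dst[j] = saturate8((int)a[j] + (int)b[j]);
        }
    }

In this model 100 + 100 yields 127 and -100 + -100 yields -128, whereas a wrapping add would give -56 and 56.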
        	

_mm256_adds_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_adds_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_adds_epu8
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_adds_epu8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_adds_epu16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_adds_epu16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
        ENDFOR
        dst[MAX:256] := 0
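
For the unsigned variant only the upper bound matters, since a sum of two unsigned values cannot go below zero. A minimal scalar sketch, assuming arrays of 16 ``uint16_t`` values and using the hypothetical name ``adds_epu16_model``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model of SaturateU16: compute the sum in 32 bits
       and clamp it to 65535. */
    static void adds_epu16_model(const uint16_t a[16], const uint16_t b[16],
                                 uint16_t dst[16])
    {
        for (int j = 0; j < 16; ++j) {
            uint32_t sum = (uint32_t)a[j] + (uint32_t)b[j];
            dst[j] = (sum > UINT16_MAX) ? (uint16_t)UINT16_MAX : (uint16_t)sum;
        }
    }

Here 65000 + 1000 clamps to 65535, while 1 + 2 is returned unchanged as 3.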
        	

_mm256_hadd_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_hadd_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := a[31:16] + a[15:0]
        dst[31:16] := a[63:48] + a[47:32]
        dst[47:32] := a[95:80] + a[79:64]
        dst[63:48] := a[127:112] + a[111:96]
        dst[79:64] := b[31:16] + b[15:0]
        dst[95:80] := b[63:48] + b[47:32]
        dst[111:96] := b[95:80] + b[79:64]
        dst[127:112] := b[127:112] + b[111:96]
        dst[143:128] := a[159:144] + a[143:128]
        dst[159:144] := a[191:176] + a[175:160]
        dst[175:160] := a[223:208] + a[207:192]
        dst[191:176] := a[255:240] + a[239:224]
        dst[207:192] := b[159:144] + b[143:128]
        dst[223:208] := b[191:176] + b[175:160]
        dst[239:224] := b[223:208] + b[207:192]
        dst[255:240] := b[255:240] + b[239:224]
        dst[MAX:256] := 0
        	

_mm256_hadd_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_hadd_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[63:32] + a[31:0]
        dst[63:32] := a[127:96] + a[95:64]
        dst[95:64] := b[63:32] + b[31:0]
        dst[127:96] := b[127:96] + b[95:64]
        dst[159:128] := a[191:160] + a[159:128]
        dst[191:160] := a[255:224] + a[223:192]
        dst[223:192] := b[191:160] + b[159:128]
        dst[255:224] := b[255:224] + b[223:192]
        dst[MAX:256] := 0
        	

_mm256_hadds_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_hadds_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Saturate16(a[31:16] + a[15:0])
        dst[31:16] := Saturate16(a[63:48] + a[47:32])
        dst[47:32] := Saturate16(a[95:80] + a[79:64])
        dst[63:48] := Saturate16(a[127:112] + a[111:96])
        dst[79:64] := Saturate16(b[31:16] + b[15:0])
        dst[95:80] := Saturate16(b[63:48] + b[47:32])
        dst[111:96] := Saturate16(b[95:80] + b[79:64])
        dst[127:112] := Saturate16(b[127:112] + b[111:96])
        dst[143:128] := Saturate16(a[159:144] + a[143:128])
        dst[159:144] := Saturate16(a[191:176] + a[175:160])
        dst[175:160] := Saturate16(a[223:208] + a[207:192])
        dst[191:176] := Saturate16(a[255:240] + a[239:224])
        dst[207:192] := Saturate16(b[159:144] + b[143:128])
        dst[223:208] := Saturate16(b[191:176] + b[175:160])
        dst[239:224] := Saturate16(b[223:208] + b[207:192])
        dst[255:240] := Saturate16(b[255:240] + b[239:224])
        dst[MAX:256] := 0
        	

_mm256_hsub_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_hsub_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := a[15:0] - a[31:16]
        dst[31:16] := a[47:32] - a[63:48]
        dst[47:32] := a[79:64] - a[95:80]
        dst[63:48] := a[111:96] - a[127:112]
        dst[79:64] := b[15:0] - b[31:16]
        dst[95:80] := b[47:32] - b[63:48]
        dst[111:96] := b[79:64] - b[95:80]
        dst[127:112] := b[111:96] - b[127:112]
        dst[143:128] := a[143:128] - a[159:144]
        dst[159:144] := a[175:160] - a[191:176]
        dst[175:160] := a[207:192] - a[223:208]
        dst[191:176] := a[239:224] - a[255:240]
        dst[207:192] := b[143:128] - b[159:144]
        dst[223:208] := b[175:160] - b[191:176]
        dst[239:224] := b[207:192] - b[223:208]
        dst[255:240] := b[239:224] - b[255:240]
        dst[MAX:256] := 0
        	

_mm256_hsub_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_hsub_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0] - a[63:32]
        dst[63:32] := a[95:64] - a[127:96]
        dst[95:64] := b[31:0] - b[63:32]
        dst[127:96] := b[95:64] - b[127:96]
        dst[159:128] := a[159:128] - a[191:160]
        dst[191:160] := a[223:192] - a[255:224]
        dst[223:192] := b[159:128] - b[191:160]
        dst[255:224] := b[223:192] - b[255:224]
        dst[MAX:256] := 0
        	

_mm256_hsubs_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_hsubs_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Saturate16(a[15:0] - a[31:16])
        dst[31:16] := Saturate16(a[47:32] - a[63:48])
        dst[47:32] := Saturate16(a[79:64] - a[95:80])
        dst[63:48] := Saturate16(a[111:96] - a[127:112])
        dst[79:64] := Saturate16(b[15:0] - b[31:16])
        dst[95:80] := Saturate16(b[47:32] - b[63:48])
        dst[111:96] := Saturate16(b[79:64] - b[95:80])
        dst[127:112] := Saturate16(b[111:96] - b[127:112])
        dst[143:128] := Saturate16(a[143:128] - a[159:144])
        dst[159:144] := Saturate16(a[175:160] - a[191:176])
        dst[175:160] := Saturate16(a[207:192] - a[223:208])
        dst[191:176] := Saturate16(a[239:224] - a[255:240])
        dst[207:192] := Saturate16(b[143:128] - b[159:144])
        dst[223:208] := Saturate16(b[175:160] - b[191:176])
        dst[239:224] := Saturate16(b[207:192] - b[223:208])
        dst[255:240] := Saturate16(b[239:224] - b[255:240])
        dst[MAX:256] := 0
        	

_mm256_madd_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_madd_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
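
The multiply-then-pairwise-add step above is sketched below in scalar C. The name ``madd_epi16_model`` is hypothetical; inputs are assumed to be arrays of 16 ``int16_t`` values and the output an array of 8 ``int32_t`` values. The final addition is done on unsigned values so that the one overflowing input combination wraps instead of invoking undefined behaviour; converting the result back to ``int32_t`` assumes the usual two's-complement behaviour of mainstream compilers.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: widen each 16-bit product to 32 bits and
       add the two products of every adjacent pair. */
    static void madd_epi16_model(const int16_t a[16], const int16_t b[16],
                                 int32_t dst[8])
    {
        for (int j = 0; j < 8; ++j) {
            int32_t lo = (int32_t)a[2 * j] * (int32_t)b[2 * j];
            int32_t hi = (int32_t)a[2 * j + 1] * (int32_t)b[2 * j + 1];
            /* Wrapping 32-bit add; only all four inputs being -32768
               can overflow. */
            dst[j] = (int32_t)((uint32_t)hi + (uint32_t)lo);
        }
    }

For the pair ``a = {1, 2}``, ``b = {5, 6}`` the result element is ``1*5 + 2*6 = 17``.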
        	

_mm256_maddubs_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_maddubs_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
        ENDFOR
        dst[MAX:256] := 0
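
The mixed signedness of the operands ("a" unsigned, "b" signed) is the part most often misread. The following scalar sketch makes it explicit through the parameter types; ``saturate16`` and ``maddubs_epi16_model`` are hypothetical names, and the 32-element array layout is an assumption.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper: clamp a wider value to [-32768, 32767]. */
    static int16_t saturate16(int32_t v)
    {
        if (v > INT16_MAX) return INT16_MAX;
        if (v < INT16_MIN) return INT16_MIN;
        return (int16_t)v;
    }

    /* Hypothetical scalar model: unsigned bytes of a times signed bytes of b,
       adjacent products summed and saturated to 16 bits. */
    static void maddubs_epi16_model(const uint8_t a[32], const int8_t b[32],
                                    int16_t dst[16])
    {
        for (int j = 0; j < 16; ++j) {
            int32_t lo = (int32_t)a[2 * j] * (int32_t)b[2 * j];
            int32_t hi = (int32_t)a[2 * j + 1] * (int32_t)b[2 * j + 1];
            dst[j] = saturate16(hi + lo);
        }
    }

With both bytes of "a" equal to 255 and both bytes of "b" equal to 127, the exact sum 64770 saturates to 32767.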
        	

_mm256_mul_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_mul_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
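
Only the low 32 bits of each 64-bit element take part in the multiplication; the high halves are ignored. A scalar sketch under the assumption that each input vector is viewed as 8 ``int32_t`` values (so the low half of 64-bit element ``j`` is index ``2*j``), with the hypothetical name ``mul_epi32_model``:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: multiply the even-indexed 32-bit elements
       as signed values and keep the full 64-bit product. */
    static void mul_epi32_model(const int32_t a[8], const int32_t b[8],
                                int64_t dst[4])
    {
        for (int j = 0; j < 4; ++j) {
            /* Odd-indexed elements (bits [63:32] of each 64-bit element)
               are never read. */
            dst[j] = (int64_t)a[2 * j] * (int64_t)b[2 * j];
        }
    }

Because the product is formed in 64 bits, it cannot overflow for any pair of 32-bit inputs.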
        	

_mm256_mul_epu32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_mul_epu32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[i+31:i] * b[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mulhi_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mulhi_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        dst[MAX:256] := 0
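
Extracting ``tmp[31:16]`` can be modelled in scalar C as shown below. The name ``mulhi_epi16_model`` is hypothetical and the vectors are assumed to be arrays of 16 ``int16_t`` values. The conversion back to ``int16_t`` assumes the two's-complement behaviour of mainstream compilers.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: form the signed 32-bit product and keep
       its upper 16 bits. */
    static void mulhi_epi16_model(const int16_t a[16], const int16_t b[16],
                                  int16_t dst[16])
    {
        for (int j = 0; j < 16; ++j) {
            int32_t tmp = (int32_t)a[j] * (int32_t)b[j];
            /* Shift the bit pattern as unsigned to select tmp[31:16]. */
            dst[j] = (int16_t)((uint32_t)tmp >> 16);
        }
    }

For example, 1000 * 1000 = 1000000 = 0x000F4240, so the stored high half is 0x000F = 15.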
        	

_mm256_mulhi_epu16
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_mulhi_epu16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	tmp[31:0] := a[i+15:i] * b[i+15:i]
        	dst[i+15:i] := tmp[31:16]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mulhrs_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mulhrs_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
        	dst[i+15:i] := tmp[16:1]
        ENDFOR
        dst[MAX:256] := 0
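
The shift, round, and bit-select sequence above is compact, so a scalar sketch may help. ``mulhrs_epi16_model`` is a hypothetical name and the vectors are assumed to be arrays of 16 ``int16_t`` values. The shift is performed on the unsigned bit pattern, which leaves bits [16:1] of ``tmp`` identical to the pseudo-code; the final conversion to ``int16_t`` assumes two's-complement behaviour.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: ((a*b) >> 14) + 1, then keep bits [16:1]. */
    static void mulhrs_epi16_model(const int16_t a[16], const int16_t b[16],
                                   int16_t dst[16])
    {
        for (int j = 0; j < 16; ++j) {
            int32_t prod = (int32_t)a[j] * (int32_t)b[j];
            uint32_t tmp = ((uint32_t)prod >> 14) + 1u;
            dst[j] = (int16_t)((tmp >> 1) & 0xFFFFu);
        }
    }

Read as a fixed-point multiply with 15 fractional bits (an interpretation, not something the description above states), 16384 * 16384 gives 8192, that is 0.5 * 0.5 = 0.25. The input pair -32768 * -32768 yields -32768 in this model because the rounded result does not fit in 16 bits.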
        	

_mm256_mullo_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_mullo_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
        	dst[i+15:i] := tmp[15:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_mullo_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_mullo_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Multiply the packed signed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	tmp[63:0] := a[i+31:i] * b[i+31:i]
        	dst[i+31:i] := tmp[31:0]
        ENDFOR
        dst[MAX:256] := 0
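
Keeping only the low 32 bits of the product is the same as multiplying modulo 2^32. A minimal scalar sketch, assuming arrays of 8 ``int32_t`` values and using the hypothetical name ``mullo_epi32_model``; the multiplication is done on unsigned values so that wraparound is well defined in C:

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: wrapping 32-bit multiply. */
    static void mullo_epi32_model(const int32_t a[8], const int32_t b[8],
                                  int32_t dst[8])
    {
        for (int j = 0; j < 8; ++j) {
            /* Unsigned multiply wraps modulo 2^32; the low 32 bits are the
               same for signed and unsigned operands. */
            dst[j] = (int32_t)((uint32_t)a[j] * (uint32_t)b[j]);
        }
    }

For instance, 65536 * 65536 = 2^32 leaves 0 in the low 32 bits, while -3 * 7 still gives -21.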
        	

_mm256_sad_epu8
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_sad_epu8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i])
        ENDFOR
        FOR j := 0 to 3
        	i := j*64
        	dst[i+15:i] := tmp[i+7:i] + tmp[i+15:i+8] + tmp[i+23:i+16] + tmp[i+31:i+24] + \
        	               tmp[i+39:i+32] + tmp[i+47:i+40] + tmp[i+55:i+48] + tmp[i+63:i+56]
        	dst[i+63:i+16] := 0
        ENDFOR
        dst[MAX:256] := 0
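
The two loops above (per-byte absolute difference, then a sum over each group of 8 bytes) can be sketched in scalar C as follows. ``sad_epu8_model`` is a hypothetical name; inputs are assumed to be arrays of 32 ``uint8_t`` values and the output an array of 4 ``uint64_t`` values.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: sum of absolute differences per group of
       8 bytes, stored in a zero-extended 64-bit element. */
    static void sad_epu8_model(const uint8_t a[32], const uint8_t b[32],
                               uint64_t dst[4])
    {
        for (int j = 0; j < 4; ++j) {
            uint64_t sum = 0;
            for (int k = 0; k < 8; ++k) {
                int diff = (int)a[8 * j + k] - (int)b[8 * j + k];
                sum += (uint64_t)(diff < 0 ? -diff : diff);
            }
            /* At most 8 * 255 = 2040, so the sum fits in the low 16 bits. */
            dst[j] = sum;
        }
    }

Each output element depends only on its own 8 input bytes, so the four sums are independent of one another.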
        	

_mm256_sign_epi8
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_sign_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Negate packed signed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	IF b[i+7:i] < 0
        		dst[i+7:i] := -(a[i+7:i])
        	ELSE IF b[i+7:i] == 0
        		dst[i+7:i] := 0
        	ELSE
        		dst[i+7:i] := a[i+7:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
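
The three-way branch above (negate, zero, or pass through) is sketched below in scalar C. ``sign_epi8_model`` is a hypothetical name and the vectors are assumed to be arrays of 32 ``int8_t`` values. Negation is done modulo 2^8, so negating -128 leaves -128; the conversion back to ``int8_t`` assumes two's-complement behaviour.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: the sign of b selects -a, 0, or a. */
    static void sign_epi8_model(const int8_t a[32], const int8_t b[32],
                                int8_t dst[32])
    {
        for (int j = 0; j < 32; ++j) {
            if (b[j] < 0) {
                /* Wrapping 8-bit negation. */
                dst[j] = (int8_t)(uint8_t)(0u - (uint8_t)a[j]);
            } else if (b[j] == 0) {
                dst[j] = 0;
            } else {
                dst[j] = a[j];
            }
        }
    }

With ``a = {5, 5, 5, -128}`` and ``b = {-1, 0, 3, -7}`` the first four results are ``{-5, 0, 5, -128}``.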
        	

_mm256_sign_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_sign_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Negate packed signed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	IF b[i+15:i] < 0
        		dst[i+15:i] := -(a[i+15:i])
        	ELSE IF b[i+15:i] == 0
        		dst[i+15:i] := 0
        	ELSE
        		dst[i+15:i] := a[i+15:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sign_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_sign_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Negate packed signed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF b[i+31:i] < 0
        		dst[i+31:i] := -(a[i+31:i])
        	ELSE IF b[i+31:i] == 0
        		dst[i+31:i] := 0
        	ELSE
        		dst[i+31:i] := a[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sub_epi8
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_sub_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := a[i+7:i] - b[i+7:i]
        ENDFOR
        dst[MAX:256] := 0
        	
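The subtraction above applies no saturation, so each lane is assumed to wrap modulo 2^8. A minimal scalar sketch for all 32 byte lanes follows; ``sub_epi8_model`` and ``sub_epi8_model_selfcheck`` are hypothetical names, and each 256-bit vector is represented as an array of 32 bytes.

.. code-block:: C

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Scalar model: dst[j] = (a[j] - b[j]) mod 2^8 for each of the 32 lanes. */
    void sub_epi8_model(const uint8_t a[32], const uint8_t b[32], uint8_t dst[32])
    {
        for (size_t j = 0; j < 32; ++j) {
            dst[j] = (uint8_t)(a[j] - b[j]);
        }
    }

    /* Hand-checked cases. */
    void sub_epi8_model_selfcheck(void)
    {
        uint8_t a[32] = {0}, b[32] = {0}, dst[32];
        a[0] = 10;  b[0] = 3;    /* ordinary case: 7 */
        a[1] = 0;   b[1] = 1;    /* wraps to 255, the bit pattern of -1 */
        a[2] = 128; b[2] = 129;  /* wraps to 255 */
        sub_epi8_model(a, b, dst);
        assert(dst[0] == 7);
        assert(dst[1] == 255);
        assert(dst[2] == 255);
        assert(dst[31] == 0);
    }

The same bit patterns result whether the lanes are read as signed or unsigned, which is why a single wrapping subtract serves both interpretations.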

_mm256_sub_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_sub_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := a[i+15:i] - b[i+15:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sub_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_sub_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[i+31:i] - b[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_sub_epi64
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_sub_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[i+63:i] - b[i+63:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_subs_epi8
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_subs_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
        	
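``Saturate8`` in the pseudo-code above clamps the exact difference to the signed 8-bit range instead of letting it wrap. A minimal scalar sketch of one lane follows; the names ``saturate8``, ``subs_lane_epi8`` and ``subs_lane_epi8_selfcheck`` are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model of Saturate8: clamp a wide intermediate to [-128, 127]. */
    int8_t saturate8(int32_t x)
    {
        if (x > INT8_MAX) return INT8_MAX;
        if (x < INT8_MIN) return INT8_MIN;
        return (int8_t)x;
    }

    /* One lane of the signed saturating subtract. */
    int8_t subs_lane_epi8(int8_t a, int8_t b)
    {
        /* The difference of two int8_t values always fits in int32_t. */
        return saturate8((int32_t)a - (int32_t)b);
    }

    /* Hand-checked cases: both clamps and one in-range result. */
    void subs_lane_epi8_selfcheck(void)
    {
        assert(subs_lane_epi8(100, -100) == 127);
        assert(subs_lane_epi8(-100, 100) == -128);
        assert(subs_lane_epi8(5, 7) == -2);
    }

Computing the difference in a wider type first is what makes the clamp reliable; subtracting in 8 bits would already have wrapped before the range check.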

_mm256_subs_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_subs_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_subs_epu8
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_subs_epu8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_subs_epu16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_subs_epu16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
        ENDFOR
        dst[MAX:256] := 0
        	
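For unsigned operands, the ``SaturateU16`` step above can only ever clamp at zero, since a difference of two unsigned values cannot exceed the larger operand. A minimal scalar sketch of one lane follows; ``subs_lane_epu16`` and ``subs_lane_epu16_selfcheck`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* One lane of the unsigned saturating subtract: results below zero
       become zero; results above 65535 cannot occur. */
    uint16_t subs_lane_epu16(uint16_t a, uint16_t b)
    {
        return (a > b) ? (uint16_t)(a - b) : 0;
    }

    /* Hand-checked cases. */
    void subs_lane_epu16_selfcheck(void)
    {
        assert(subs_lane_epu16(5, 7) == 0);
        assert(subs_lane_epu16(7, 7) == 0);
        assert(subs_lane_epu16(65535, 1) == 65534);
    }
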

_mm256_madd52hi_avx_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __X, 
    __m256i __Y, 
    __m256i __Z
:Param ETypes:
    UI64 __X, 
    UI64 __Y, 
    UI64 __Z

.. code-block:: C

    __m256i _mm256_madd52hi_avx_epu64(__m256i __X, __m256i __Y,
                                      __m256i __Z);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
        	dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[103:52])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_madd52lo_avx_epu64
^^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __X, 
    __m256i __Y, 
    __m256i __Z
:Param ETypes:
    UI64 __X, 
    UI64 __Y, 
    UI64 __Z

.. code-block:: C

    __m256i _mm256_madd52lo_avx_epu64(__m256i __X, __m256i __Y,
                                      __m256i __Z);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
        	dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[51:0])
        ENDFOR
        dst[MAX:256] := 0
        
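The high and low 52-bit multiply-add operations above differ only in which half of the 104-bit product they accumulate. A minimal scalar sketch of one 64-bit lane follows, written in standard C by splitting each 52-bit operand into 26-bit limbs. All names (``mul52``, ``madd52lo_lane``, ``madd52hi_lane``, ``madd52_selfcheck``) are hypothetical, and the final addition is assumed to wrap at 64 bits.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    #define MASK52 ((UINT64_C(1) << 52) - 1)
    #define MASK26 ((UINT64_C(1) << 26) - 1)

    /* Split the 104-bit product of two 52-bit values into its low and high
       52-bit halves. With 26-bit limbs every partial product fits in 64 bits. */
    static void mul52(uint64_t y, uint64_t z, uint64_t *lo52, uint64_t *hi52)
    {
        y &= MASK52;  /* bits 63:52 of the inputs are ignored */
        z &= MASK52;
        uint64_t yl = y & MASK26, yh = y >> 26;
        uint64_t zl = z & MASK26, zh = z >> 26;
        uint64_t mid = yh * zl + yl * zh;               /* weight 2^26, below 2^53 */
        uint64_t t = ((mid & MASK26) << 26) + yl * zl;  /* below 2^53 */
        *lo52 = t & MASK52;
        *hi52 = yh * zh + (mid >> 26) + (t >> 52);
    }

    /* One lane: x + product[51:0], with a 64-bit wraparound add. */
    uint64_t madd52lo_lane(uint64_t x, uint64_t y, uint64_t z)
    {
        uint64_t lo, hi;
        mul52(y, z, &lo, &hi);
        return x + lo;
    }

    /* One lane: x + product[103:52], with a 64-bit wraparound add. */
    uint64_t madd52hi_lane(uint64_t x, uint64_t y, uint64_t z)
    {
        uint64_t lo, hi;
        mul52(y, z, &lo, &hi);
        return x + hi;
    }

    /* Hand-checked cases. */
    void madd52_selfcheck(void)
    {
        assert(madd52lo_lane(10, 3, 5) == 25);
        assert(madd52hi_lane(10, 3, 5) == 10);
        /* 2^51 * 2 = 2^52: low half is 0, high half is 1. */
        assert(madd52lo_lane(0, UINT64_C(1) << 51, 2) == 0);
        assert(madd52hi_lane(0, UINT64_C(1) << 51, 2) == 1);
        /* (2^52 - 1)^2 = (2^52 - 2) * 2^52 + 1. */
        assert(madd52lo_lane(0, MASK52, MASK52) == 1);
        assert(madd52hi_lane(0, MASK52, MASK52) == MASK52 - 1);
    }

Splitting into 26-bit limbs keeps the sketch in standard C; a compiler-specific 128-bit integer type would shorten it.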

_mm256_madd52hi_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __X, 
    __m256i __Y, 
    __m256i __Z
:Param ETypes:
    UI64 __X, 
    UI64 __Y, 
    UI64 __Z

.. code-block:: C

    __m256i _mm256_madd52hi_epu64(__m256i __X, __m256i __Y,
                                  __m256i __Z);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
        	dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[103:52])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_madd52lo_epu64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __X, 
    __m256i __Y, 
    __m256i __Z
:Param ETypes:
    UI64 __X, 
    UI64 __Y, 
    UI64 __Z

.. code-block:: C

    __m256i _mm256_madd52lo_epu64(__m256i __X, __m256i __Y,
                                  __m256i __Z);

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
        	dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[51:0])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_dpbusd_avx_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_dpbusd_avx_epi32(__m256i src, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:256] := 0
        		
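The byte dot-product step above can be sketched in scalar C for a single 32-bit lane. This is a minimal reference model, not the intrinsic itself: ``dpbusd_lane`` and ``dpbusd_lane_selfcheck`` are hypothetical names, and the accumulation is assumed to wrap modulo 2^32 because the pseudo-code applies no saturation.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model of one 32-bit lane: four (unsigned byte * signed byte)
       products added to src. */
    int32_t dpbusd_lane(int32_t src, const uint8_t a[4], const int8_t b[4])
    {
        uint32_t acc = (uint32_t)src;
        for (int k = 0; k < 4; ++k) {
            /* Each product lies in [-32640, 32385], so it fits in 16 bits. */
            int32_t prod = (int32_t)a[k] * (int32_t)b[k];
            acc += (uint32_t)prod;  /* unsigned add gives defined wraparound */
        }
        return (int32_t)acc;  /* assumes a two's-complement target */
    }

    /* Hand-checked cases. */
    void dpbusd_lane_selfcheck(void)
    {
        const uint8_t a[4] = {1, 2, 3, 4};
        const int8_t b[4] = {1, -1, 2, -2};
        const uint8_t one_a[4] = {1, 0, 0, 0};
        const int8_t one_b[4] = {1, 0, 0, 0};
        assert(dpbusd_lane(10, a, b) == 7);  /* 10 + (1 - 2 + 6 - 8) */
        assert(dpbusd_lane(INT32_MAX, one_a, one_b) == INT32_MIN);  /* wraps */
    }

The operand order matters: "a" is zero-extended and "b" is sign-extended, so swapping the two inputs generally changes the result.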

_mm256_dpbusds_avx_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_dpbusds_avx_epi32(__m256i src, __m256i a,
                                     __m256i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:256] := 0
        		
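The saturating form above differs from a wrapping accumulate only in the final ``Saturate32`` step. A minimal scalar sketch of one lane follows, assuming the sum is formed at full precision before clamping, which is how the pseudo-code reads. The names ``saturate32``, ``dpbusds_lane`` and ``dpbusds_lane_selfcheck`` are hypothetical.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* Scalar model of Saturate32: clamp a wide sum to the int32_t range. */
    int32_t saturate32(int64_t x)
    {
        if (x > INT32_MAX) return INT32_MAX;
        if (x < INT32_MIN) return INT32_MIN;
        return (int32_t)x;
    }

    /* One 32-bit lane: four (unsigned byte * signed byte) products plus src,
       accumulated in 64 bits and then clamped. */
    int32_t dpbusds_lane(int32_t src, const uint8_t a[4], const int8_t b[4])
    {
        int64_t sum = src;
        for (int k = 0; k < 4; ++k) {
            sum += (int64_t)a[k] * (int64_t)b[k];
        }
        return saturate32(sum);
    }

    /* Hand-checked cases. */
    void dpbusds_lane_selfcheck(void)
    {
        const uint8_t a[4] = {255, 255, 255, 255};
        const int8_t pos[4] = {127, 127, 127, 127};
        const int8_t neg[4] = {-128, -128, -128, -128};
        assert(dpbusds_lane(0, a, pos) == 129540);  /* 4 * 255 * 127 */
        assert(dpbusds_lane(INT32_MAX, a, pos) == INT32_MAX);
        assert(dpbusds_lane(INT32_MIN, a, neg) == INT32_MIN);
    }
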

_mm256_dpwssd_avx_epi32
^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_dpwssd_avx_epi32(__m256i src, __m256i a,
                                    __m256i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:256] := 0
        		

_mm256_dpwssds_avx_epi32
^^^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_dpwssds_avx_epi32(__m256i src, __m256i a,
                                     __m256i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:256] := 0
        		
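For 16-bit operands, the two products alone can already exceed the 32-bit range, so a scalar model of the step above needs a wider accumulator. The sketch below covers one lane under the assumption that the sum is formed at full precision before saturating; ``dpwssds_lane`` and ``dpwssds_lane_selfcheck`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* One 32-bit lane: two (signed word * signed word) products plus src,
       accumulated in 64 bits and clamped to the int32_t range. */
    int32_t dpwssds_lane(int32_t src, const int16_t a[2], const int16_t b[2])
    {
        int64_t sum = (int64_t)src
                    + (int64_t)a[0] * (int64_t)b[0]
                    + (int64_t)a[1] * (int64_t)b[1];
        if (sum > INT32_MAX) return INT32_MAX;
        if (sum < INT32_MIN) return INT32_MIN;
        return (int32_t)sum;
    }

    /* Hand-checked cases. */
    void dpwssds_lane_selfcheck(void)
    {
        const int16_t lo[2] = {INT16_MIN, INT16_MIN};
        const int16_t a[2] = {2, 3};
        const int16_t b[2] = {4, -5};
        assert(dpwssds_lane(1, a, b) == -6);            /* 1 + 8 - 15 */
        assert(dpwssds_lane(0, lo, lo) == INT32_MAX);   /* 2^31 clamps */
        assert(dpwssds_lane(-1, lo, lo) == INT32_MAX);  /* 2^31 - 1 fits exactly */
    }

The second check is the interesting one: ``(-32768 * -32768) * 2`` equals 2^31, one past ``INT32_MAX``, even with a zero accumulator.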

_mm256_dpbusd_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_dpbusd_epi32(__m256i src, __m256i a,
                                __m256i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:256] := 0
        		

_mm256_dpbusds_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_dpbusds_epi32(__m256i src, __m256i a,
                                 __m256i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:256] := 0
        		

_mm256_dpwssd_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_dpwssd_epi32(__m256i src, __m256i a,
                                __m256i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:256] := 0
        		

_mm256_dpwssds_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i src, 
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_dpwssds_epi32(__m256i src, __m256i a,
                                 __m256i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:256] := 0
        		

_mm256_dpwsud_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __W, 
    __m256i __A, 
    __m256i __B
:Param ETypes:
    SI32 __W, 
    SI16 __A, 
    UI16 __B

.. code-block:: C

    __m256i _mm256_dpwsud_epi32(__m256i __W, __m256i __A,
                                __m256i __B);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
        	tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
        	dst.dword[j] := __W.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:256] := 0
        
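The mixed-sign step above sign-extends one operand and zero-extends the other, which a scalar model can express through its parameter types. The sketch below covers one 32-bit lane; ``dpwsud_lane`` and ``dpwsud_lane_selfcheck`` are hypothetical names, and the accumulation is assumed to wrap modulo 2^32 because no saturation is applied.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* One 32-bit lane: two (signed word * unsigned word) products added to w. */
    int32_t dpwsud_lane(int32_t w, const int16_t a[2], const uint16_t b[2])
    {
        uint32_t acc = (uint32_t)w;
        for (int k = 0; k < 2; ++k) {
            /* The product lies in [-2147450880, 2147385345], so it fits
               in int32_t. */
            int32_t prod = (int32_t)a[k] * (int32_t)b[k];
            acc += (uint32_t)prod;  /* unsigned add gives defined wraparound */
        }
        return (int32_t)acc;  /* assumes a two's-complement target */
    }

    /* Hand-checked cases. */
    void dpwsud_lane_selfcheck(void)
    {
        const int16_t a[2] = {-1, 2};
        const uint16_t b[2] = {0xFFFF, 3};
        /* 0xFFFF is read as 65535, not -1: (-1 * 65535) + (2 * 3). */
        assert(dpwsud_lane(0, a, b) == -65529);
        assert(dpwsud_lane(65529, a, b) == 0);
    }

Had the second operand been sign-extended instead, the first product would be ``+1`` rather than ``-65535``.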

_mm256_dpwsuds_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __W, 
    __m256i __A, 
    __m256i __B
:Param ETypes:
    SI32 __W, 
    SI16 __A, 
    UI16 __B

.. code-block:: C

    __m256i _mm256_dpwsuds_epi32(__m256i __W, __m256i __A,
                                 __m256i __B);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
        	tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
        	dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:256] := 0

_mm256_dpwusd_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __W, 
    __m256i __A, 
    __m256i __B
:Param ETypes:
    SI32 __W, 
    UI16 __A, 
    SI16 __B

.. code-block:: C

    __m256i _mm256_dpwusd_epi32(__m256i __W, __m256i __A,
                                __m256i __B);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding signed 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
        	tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
        	dst.dword[j] := __W.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_dpwusds_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __W, 
    __m256i __A, 
    __m256i __B
:Param ETypes:
    SI32 __W, 
    UI16 __A, 
    SI16 __B

.. code-block:: C

    __m256i _mm256_dpwusds_epi32(__m256i __W, __m256i __A,
                                 __m256i __B);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding signed 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
        	tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
        	dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:256] := 0

_mm256_dpwuud_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __W, 
    __m256i __A, 
    __m256i __B
:Param ETypes:
    UI32 __W, 
    UI16 __A, 
    UI16 __B

.. code-block:: C

    __m256i _mm256_dpwuud_epi32(__m256i __W, __m256i __A,
                                __m256i __B);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate unsigned 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
        	tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
        	dst.dword[j] := __W.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_dpwuuds_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __W, 
    __m256i __A, 
    __m256i __B
:Param ETypes:
    UI32 __W, 
    UI16 __A, 
    UI16 __B

.. code-block:: C

    __m256i _mm256_dpwuuds_epi32(__m256i __W, __m256i __A,
                                 __m256i __B);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate unsigned 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W" with unsigned saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
        	tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
        	dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:256] := 0

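The pseudo-code above clamps with ``UNSIGNED_DWORD_SATURATE``, so the only bound that can be hit is the upper one. A minimal scalar sketch of one lane follows, assuming the sum is formed at full precision before clamping; ``dpwuuds_lane`` and ``dpwuuds_lane_selfcheck`` are hypothetical names.

.. code-block:: C

    #include <assert.h>
    #include <stdint.h>

    /* One 32-bit lane: two (unsigned word * unsigned word) products plus w,
       accumulated in 64 bits and clamped to UINT32_MAX. */
    uint32_t dpwuuds_lane(uint32_t w, const uint16_t a[2], const uint16_t b[2])
    {
        uint64_t sum = (uint64_t)w
                     + (uint64_t)a[0] * (uint64_t)b[0]
                     + (uint64_t)a[1] * (uint64_t)b[1];
        return (sum > UINT32_MAX) ? UINT32_MAX : (uint32_t)sum;
    }

    /* Hand-checked cases. */
    void dpwuuds_lane_selfcheck(void)
    {
        const uint16_t a[2] = {2, 3};
        const uint16_t b[2] = {4, 5};
        const uint16_t max[2] = {65535, 65535};
        assert(dpwuuds_lane(5, a, b) == 28);               /* 5 + 8 + 15 */
        assert(dpwuuds_lane(0, max, max) == UINT32_MAX);   /* 2 * 65535^2 clamps */
    }

A single product of two maximal words, ``65535 * 65535``, still fits in 32 bits; it is the sum of two such products, or a large accumulator, that triggers the clamp.
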
_mm256_dpbssd_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __W, 
    __m256i __A, 
    __m256i __B
:Param ETypes:
    SI32 __W, 
    SI8 __A, 
    SI8 __B

.. code-block:: C

    __m256i _mm256_dpbssd_epi32(__m256i __W, __m256i __A,
                                __m256i __B);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding signed 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.word := SignExtend16(__A.byte[4*j]) * SignExtend16(__B.byte[4*j])
        	tmp2.word := SignExtend16(__A.byte[4*j+1]) * SignExtend16(__B.byte[4*j+1])
        	tmp3.word := SignExtend16(__A.byte[4*j+2]) * SignExtend16(__B.byte[4*j+2])
        	tmp4.word := SignExtend16(__A.byte[4*j+3]) * SignExtend16(__B.byte[4*j+3])
        	dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_dpbssds_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __W, 
    __m256i __A, 
    __m256i __B
:Param ETypes:
    SI32 __W, 
    SI8 __A, 
    SI8 __B

.. code-block:: C

    __m256i _mm256_dpbssds_epi32(__m256i __W, __m256i __A,
                                 __m256i __B);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding signed 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.word := SignExtend16(__A.byte[4*j]) * SignExtend16(__B.byte[4*j])
        	tmp2.word := SignExtend16(__A.byte[4*j+1]) * SignExtend16(__B.byte[4*j+1])
        	tmp3.word := SignExtend16(__A.byte[4*j+2]) * SignExtend16(__B.byte[4*j+2])
        	tmp4.word := SignExtend16(__A.byte[4*j+3]) * SignExtend16(__B.byte[4*j+3])
        	dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:256] := 0

_mm256_dpbsud_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __W, 
    __m256i __A, 
    __m256i __B
:Param ETypes:
    SI32 __W, 
    SI8 __A, 
    UI8 __B

.. code-block:: C

    __m256i _mm256_dpbsud_epi32(__m256i __W, __m256i __A,
                                __m256i __B);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.word := Signed(SignExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j]))
        	tmp2.word := Signed(SignExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1]))
        	tmp3.word := Signed(SignExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2]))
        	tmp4.word := Signed(SignExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3]))
        	dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:256] := 0
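
The pseudo-code above can be mirrored by a scalar model of a single 32-bit lane. This is only a sketch for checking expected values on any machine; ``dpbsud_lane`` is a hypothetical helper, not part of ``immintrin.h``, and it assumes each argument packs four bytes into one dword with byte 0 in the least significant position.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of the pseudo-code above.
   a: four signed bytes packed into a dword, b: four unsigned bytes. */
int32_t dpbsud_lane(int32_t w, uint32_t a, uint32_t b)
{
    uint32_t acc = (uint32_t)w;            /* unsigned, so wraparound is well defined */
    for (int k = 0; k < 4; ++k) {
        int32_t sa = (int32_t)((a >> (8 * k)) & 0xFF);
        if (sa > 127)
            sa -= 256;                     /* SignExtend16 of byte k of a */
        int32_t ub = (int32_t)((b >> (8 * k)) & 0xFF);  /* ZeroExtend16 of byte k of b */
        acc += (uint32_t)(sa * ub);        /* each product fits in a signed 16-bit word */
    }
    /* Assumes the usual two's-complement conversion (GCC, Clang, MSVC). */
    return (int32_t)acc;
}
```

For example, with bytes ``{1, -2, 3, -4}`` in ``a`` and ``{10, 20, 30, 255}`` in ``b``, the four products sum to -960, so an accumulator of 1000 becomes 40. There is no saturation in this variant: the sum wraps modulo 2^32.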
        

_mm256_dpbsuds_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __W, 
    __m256i __A, 
    __m256i __B
:Param ETypes:
    SI32 __W, 
    SI8 __A, 
    UI8 __B

.. code-block:: C

    __m256i _mm256_dpbsuds_epi32(__m256i __W, __m256i __A,
                                 __m256i __B)

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.word := Signed(SignExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j]))
        	tmp2.word := Signed(SignExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1]))
        	tmp3.word := Signed(SignExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2]))
        	tmp4.word := Signed(SignExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3]))
        	dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:256] := 0			
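
The only difference from the non-saturating form is the final clamp. A minimal scalar sketch of one lane follows, assuming ``SIGNED_DWORD_SATURATE`` clamps the exact sum to the ``int32_t`` range; ``dpbsuds_lane`` is a hypothetical name used for illustration, and bytes are assumed packed with byte 0 least significant.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane with signed saturation. */
int32_t dpbsuds_lane(int32_t w, uint32_t a, uint32_t b)
{
    int64_t sum = w;                       /* 64-bit, so the exact sum exists before clamping */
    for (int k = 0; k < 4; ++k) {
        int32_t sa = (int32_t)((a >> (8 * k)) & 0xFF);
        if (sa > 127)
            sa -= 256;                     /* signed byte from a */
        int32_t ub = (int32_t)((b >> (8 * k)) & 0xFF);  /* unsigned byte from b */
        sum += sa * ub;
    }
    if (sum > INT32_MAX)
        return INT32_MAX;                  /* clamp instead of wrapping */
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}
```

Accumulating in 64 bits keeps the sketch free of signed overflow, which would be undefined behaviour in C.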

_mm256_dpbuud_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __W, 
    __m256i __A, 
    __m256i __B
:Param ETypes:
    SI32 __W, 
    UI8 __A, 
    UI8 __B

.. code-block:: C

    __m256i _mm256_dpbuud_epi32(__m256i __W, __m256i __A,
                                __m256i __B)

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate unsigned 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.word := ZeroExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j])
        	tmp2.word := ZeroExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1])
        	tmp3.word := ZeroExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2])
        	tmp4.word := ZeroExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3])
        	dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_dpbuuds_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i __W, 
    __m256i __A, 
    __m256i __B
:Param ETypes:
    SI32 __W, 
    UI8 __A, 
    UI8 __B

.. code-block:: C

    __m256i _mm256_dpbuuds_epi32(__m256i __W, __m256i __A,
                                 __m256i __B)

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate unsigned 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W" with unsigned saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	tmp1.word := ZeroExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j])
        	tmp2.word := ZeroExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1])
        	tmp3.word := ZeroExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2])
        	tmp4.word := ZeroExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3])
        	dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:256] := 0			
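
As a rough cross-check of the unsigned-saturating form, one lane can be modelled in portable C. This sketch assumes the accumulator is treated as an unsigned 32-bit value, which is what ``UNSIGNED_DWORD_SATURATE`` in the pseudo-code suggests even though the field list types ``__W`` as ``SI32``. ``dpbuuds_lane`` is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane with unsigned saturation.
   Bytes are assumed packed with byte 0 in the least significant position. */
uint32_t dpbuuds_lane(uint32_t w, uint32_t a, uint32_t b)
{
    uint64_t sum = w;                      /* wide enough to hold the exact sum */
    for (int k = 0; k < 4; ++k) {
        uint32_t ua = (a >> (8 * k)) & 0xFF;   /* ZeroExtend16 of byte k of a */
        uint32_t ub = (b >> (8 * k)) & 0xFF;   /* ZeroExtend16 of byte k of b */
        sum += (uint64_t)(ua * ub);            /* at most 255 * 255 = 65025 */
    }
    return sum > UINT32_MAX ? UINT32_MAX : (uint32_t)sum;
}
```

Because every product is non-negative, only the upper bound can be hit; the result never saturates towards zero.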

_mm256_fmadd_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_fmadd_pd(__m256d a, __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fmadd_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_fmadd_ps(__m256 a, __m256 b, __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fmaddsub_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_fmaddsub_pd(__m256d a, __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (odd-indexed elements add, even-indexed elements subtract), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF ((j & 1) == 0) 
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
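
The lane-parity rule in the pseudo-code above is easy to get backwards, so a small scalar sketch may help. The helper names are hypothetical. Note one known difference: the hardware fused operation rounds once, whereas the plain C expression below may round twice, so results can differ in the last bit for inputs that are not exactly representable.

```c
/* Hypothetical scalar model of lane j: even lanes subtract c, odd lanes add c. */
double fmaddsub_pd_lane(int j, double a, double b, double c)
{
    double prod = a * b;
    return ((j & 1) == 0) ? prod - c : prod + c;
}

/* Applies the lane rule across the four double-precision lanes of a 256-bit vector. */
void fmaddsub_pd_model(const double a[4], const double b[4],
                       const double c[4], double dst[4])
{
    for (int j = 0; j < 4; ++j)
        dst[j] = fmaddsub_pd_lane(j, a[j], b[j], c[j]);
}
```

With ``a = {1, 2, 3, 4}``, ``b = {2, 2, 2, 2}`` and ``c = {1, 1, 1, 1}``, the model yields ``{1, 5, 5, 9}``.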
        	

_mm256_fmaddsub_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_fmaddsub_ps(__m256 a, __m256 b, __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (odd-indexed elements add, even-indexed elements subtract), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF ((j & 1) == 0) 
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fmsub_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_fmsub_pd(__m256d a, __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fmsub_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_fmsub_ps(__m256 a, __m256 b, __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fmsubadd_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_fmsubadd_pd(__m256d a, __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (odd-indexed elements subtract, even-indexed elements add), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF ((j & 1) == 0) 
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fmsubadd_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_fmsubadd_ps(__m256 a, __m256 b, __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (odd-indexed elements subtract, even-indexed elements add), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF ((j & 1) == 0) 
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_fnmadd_pd
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_fnmadd_pd(__m256d a, __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        ENDFOR	
        dst[MAX:256] := 0
        	

_mm256_fnmadd_ps
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_fnmadd_ps(__m256 a, __m256 b, __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        ENDFOR	
        dst[MAX:256] := 0
        	

_mm256_fnmsub_pd
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    __m256d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m256d _mm256_fnmsub_pd(__m256d a, __m256d b, __m256d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        ENDFOR	
        dst[MAX:256] := 0
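
The non-alternating variants in this group (``fmadd``, ``fmsub``, ``fnmadd``, ``fnmsub``) differ only in the sign applied to the product and to ``c``. The sketch below puts the four per-element formulas side by side; the enum and function names are hypothetical, and the plain C arithmetic does not reproduce the single rounding of the fused hardware operation.

```c
/* Hypothetical tags for the four sign combinations documented in this group. */
typedef enum { FMADD, FMSUB, FNMADD, FNMSUB } fma_kind;

/* Scalar model of one element, following the per-lane pseudo-code formulas. */
double fma_variant_lane(fma_kind kind, double a, double b, double c)
{
    double prod = a * b;
    switch (kind) {
    case FMADD:  return  prod + c;   /*  (a*b) + c */
    case FMSUB:  return  prod - c;   /*  (a*b) - c */
    case FNMADD: return -prod + c;   /* -(a*b) + c */
    default:     return -prod - c;   /* -(a*b) - c, i.e. FNMSUB */
    }
}
```

For ``a = 2``, ``b = 3``, ``c = 10`` the four variants give 16, -4, 4 and -16 respectively.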
        	

_mm256_fnmsub_ps
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    __m256 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m256 _mm256_fnmsub_ps(__m256 a, __m256 b, __m256 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        ENDFOR	
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_madd52hi_avx_epu64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __X, 
    __m128i __Y, 
    __m128i __Z
:Param ETypes:
    UI64 __X, 
    UI64 __Y, 
    UI64 __Z

.. code-block:: C

    __m128i _mm_madd52hi_avx_epu64(__m128i __X, __m128i __Y,
                                   __m128i __Z)

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the high 52 bits of the intermediate result, as an unsigned integer, to the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
        	dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[103:52])
        ENDFOR
        dst[MAX:128] := 0
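
A 104-bit product does not fit in a standard C integer, so the sketch below splits each 52-bit operand into two 26-bit halves to recover ``tmp[103:52]`` and ``tmp[51:0]`` using only ``uint64_t``. All names are hypothetical; the code models one 64-bit element for both the high-half form shown above and its low-half counterpart.

```c
#include <stdint.h>

#define MASK52 ((UINT64_C(1) << 52) - 1)
#define MASK26 ((UINT64_C(1) << 26) - 1)

/* Splits the 104-bit product of the low 52 bits of y and z into two 52-bit halves. */
static void mul52(uint64_t y, uint64_t z, uint64_t *hi, uint64_t *lo)
{
    y &= MASK52;                                  /* bits 63:52 are ignored */
    z &= MASK52;
    uint64_t y_lo = y & MASK26, y_hi = y >> 26;
    uint64_t z_lo = z & MASK26, z_hi = z >> 26;
    uint64_t low = y_lo * z_lo;                   /* < 2^52 */
    uint64_t mid = y_hi * z_lo + y_lo * z_hi;     /* < 2^53 */
    uint64_t t   = low + ((mid & MASK26) << 26);  /* < 2^53 */
    *lo = t & MASK52;                             /* tmp[51:0]   */
    *hi = y_hi * z_hi + (mid >> 26) + (t >> 52);  /* tmp[103:52] */
}

/* One 64-bit element of the high-half form: x + tmp[103:52], wrapping modulo 2^64. */
uint64_t madd52hi_lane(uint64_t x, uint64_t y, uint64_t z)
{
    uint64_t hi, lo;
    mul52(y, z, &hi, &lo);
    return x + hi;
}

/* One 64-bit element of the low-half form: x + tmp[51:0], wrapping modulo 2^64. */
uint64_t madd52lo_lane(uint64_t x, uint64_t y, uint64_t z)
{
    uint64_t hi, lo;
    mul52(y, z, &hi, &lo);
    return x + lo;
}
```

The 26-bit split keeps every partial product below 2^53, so no intermediate value overflows 64 bits. On compilers that provide a 128-bit integer type, a single widening multiply would be a simpler alternative.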
        

_mm_madd52lo_avx_epu64
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __X, 
    __m128i __Y, 
    __m128i __Z
:Param ETypes:
    UI64 __X, 
    UI64 __Y, 
    UI64 __Z

.. code-block:: C

    __m128i _mm_madd52lo_avx_epu64(__m128i __X, __m128i __Y,
                                   __m128i __Z)

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the low 52 bits of the intermediate result, as an unsigned integer, to the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
        	dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[51:0])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_madd52hi_epu64
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __X, 
    __m128i __Y, 
    __m128i __Z
:Param ETypes:
    UI64 __X, 
    UI64 __Y, 
    UI64 __Z

.. code-block:: C

    __m128i _mm_madd52hi_epu64(__m128i __X, __m128i __Y,
                               __m128i __Z)

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the high 52 bits of the intermediate result, as an unsigned integer, to the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
        	dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[103:52])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_madd52lo_epu64
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __X, 
    __m128i __Y, 
    __m128i __Z
:Param ETypes:
    UI64 __X, 
    UI64 __Y, 
    UI64 __Z

.. code-block:: C

    __m128i _mm_madd52lo_epu64(__m128i __X, __m128i __Y,
                               __m128i __Z)

.. admonition:: Intel Description

    Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the low 52 bits of the intermediate result, as an unsigned integer, to the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
        	dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[51:0])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_dpbusd_avx_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_dpbusd_avx_epi32(__m128i src, __m128i a,
                                 __m128i b)

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:128] := 0
        		

_mm_dpbusds_avx_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_dpbusds_avx_epi32(__m128i src, __m128i a,
                                  __m128i b)

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:128] := 0
        		

_mm_dpwssd_avx_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_dpwssd_avx_epi32(__m128i src, __m128i a,
                                 __m128i b)

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:128] := 0
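
The word-sized dot product follows the same pattern as the byte-sized forms, with two 16-bit pairs per 32-bit lane. A minimal scalar sketch of one lane is shown here; ``dpwssd_lane`` is a hypothetical helper, and each argument is assumed to pack two words into a dword with word 0 in the low half.

```c
#include <stdint.h>

/* Sign-extends the 16-bit word k (0 or 1) of a packed dword. */
static int32_t word_signed(uint32_t v, int k)
{
    int32_t w = (int32_t)((v >> (16 * k)) & 0xFFFF);
    return w > 32767 ? w - 65536 : w;
}

/* Hypothetical scalar model of one 32-bit lane of the pseudo-code above. */
int32_t dpwssd_lane(int32_t src, uint32_t a, uint32_t b)
{
    uint32_t acc = (uint32_t)src;          /* unsigned, so wraparound is well defined */
    for (int k = 0; k < 2; ++k) {
        /* Largest magnitude is (-32768) * (-32768) = 2^30, which fits in int32_t. */
        int32_t prod = word_signed(a, k) * word_signed(b, k);
        acc += (uint32_t)prod;
    }
    /* Assumes the usual two's-complement conversion (GCC, Clang, MSVC). */
    return (int32_t)acc;
}
```

With words ``{2, -1}`` in ``a`` and ``{4, 3}`` in ``b``, the products are 8 and -3, so an accumulator of 100 becomes 105.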
        		

_mm_dpwssds_avx_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_dpwssds_avx_epi32(__m128i src, __m128i a,
                                  __m128i b)

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:128] := 0
        		

_mm_dpbusd_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_dpbusd_epi32(__m128i src, __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:128] := 0
        		

_mm_dpbusds_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    UI8 a, 
    SI8 b

.. code-block:: C

    __m128i _mm_dpbusds_epi32(__m128i src, __m128i a,
                              __m128i b)

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
        	tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
        	tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
        	tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:128] := 0
        		

_mm_dpwssd_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_dpwssd_epi32(__m128i src, __m128i a, __m128i b);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := src.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:128] := 0
        		

_mm_dpwssds_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i src, 
    __m128i a, 
    __m128i b
:Param ETypes:
    SI32 src, 
    SI16 a, 
    SI16 b

.. code-block:: C

    __m128i _mm_dpwssds_epi32(__m128i src, __m128i a,
                              __m128i b)

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
        	tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
        	dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:128] := 0
        		

_mm_dpwsud_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __W, 
    __m128i __A, 
    __m128i __B
:Param ETypes:
    SI32 __W, 
    SI16 __A, 
    UI16 __B

.. code-block:: C

    __m128i _mm_dpwsud_epi32(__m128i __W, __m128i __A,
                             __m128i __B)

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
        	tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
        	dst.dword[j] := __W.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:128] := 0
        

_mm_dpwsuds_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __W, 
    __m128i __A, 
    __m128i __B
:Param ETypes:
    SI32 __W, 
    SI16 __A, 
    UI16 __B

.. code-block:: C

    __m128i _mm_dpwsuds_epi32(__m128i __W, __m128i __A,
                              __m128i __B)

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of signed 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
        	tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
        	dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:128] := 0			

_mm_dpwusd_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __W, 
    __m128i __A, 
    __m128i __B
:Param ETypes:
    SI32 __W, 
    UI16 __A, 
    SI16 __B

.. code-block:: C

    __m128i _mm_dpwusd_epi32(__m128i __W, __m128i __A,
                             __m128i __B)

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding signed 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
        	tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
        	dst.dword[j] := __W.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:128] := 0
        

_mm_dpwusds_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __W, 
    __m128i __A, 
    __m128i __B
:Param ETypes:
    SI32 __W, 
    UI16 __A, 
    SI16 __B

.. code-block:: C

    __m128i _mm_dpwusds_epi32(__m128i __W, __m128i __A,
                              __m128i __B);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding signed 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
        	tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
        	dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:128] := 0			
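
To make the effect of ``SIGNED_DWORD_SATURATE`` concrete, here is a hedged scalar sketch. The names ``saturate_i32`` and ``dpwusds_epi32_model`` are hypothetical and the code operates on plain arrays; it only illustrates the arithmetic described above.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit value to the int32_t range (models SIGNED_DWORD_SATURATE). */
static int32_t saturate_i32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Hypothetical scalar model of _mm_dpwusds_epi32 (not an Intel API). */
static void dpwusds_epi32_model(int32_t dst[4], const int32_t w[4],
                                const uint16_t a[8], const int16_t b[8])
{
    assert(dst && w && a && b);
    for (int j = 0; j < 4; ++j) {
        int64_t sum = w[j];                           /* exact, cannot overflow */
        sum += (int64_t)a[2 * j]     * b[2 * j];      /* unsigned word * signed word */
        sum += (int64_t)a[2 * j + 1] * b[2 * j + 1];
        dst[j] = saturate_i32(sum);                   /* clamp instead of wrapping */
    }
}
```

Compared with the non-saturating variant, a lane whose exact sum leaves the ``int32_t`` range sticks at ``INT32_MAX`` or ``INT32_MIN`` rather than wrapping around.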

_mm_dpwuud_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __W, 
    __m128i __A, 
    __m128i __B
:Param ETypes:
    UI32 __W, 
    UI16 __A, 
    UI16 __B

.. code-block:: C

    __m128i _mm_dpwuud_epi32(__m128i __W, __m128i __A,
                             __m128i __B);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate unsigned 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
        	tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
        	dst.dword[j] := __W.dword[j] + tmp1 + tmp2
        ENDFOR
        dst[MAX:128] := 0
        

_mm_dpwuuds_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __W, 
    __m128i __A, 
    __m128i __B
:Param ETypes:
    UI32 __W, 
    UI16 __A, 
    UI16 __B

.. code-block:: C

    __m128i _mm_dpwuuds_epi32(__m128i __W, __m128i __A,
                              __m128i __B);

.. admonition:: Intel Description

    Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate unsigned 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W" with unsigned saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
        	tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
        	dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
        ENDFOR
        dst[MAX:128] := 0			
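
The unsigned-saturating form can be sketched in scalar C as follows. This is an illustration under stated assumptions: ``dpwuuds_epi32_model`` is a hypothetical name, and all operands are treated as unsigned, as the ``ZeroExtend32`` and ``UNSIGNED_DWORD_SATURATE`` steps in the pseudo-code suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm_dpwuuds_epi32 (not an Intel API). */
static void dpwuuds_epi32_model(uint32_t dst[4], const uint32_t w[4],
                                const uint16_t a[8], const uint16_t b[8])
{
    assert(dst && w && a && b);
    for (int j = 0; j < 4; ++j) {
        /* 64-bit accumulator: w + two products of at most 65535*65535 fits easily. */
        uint64_t sum = w[j];
        sum += (uint64_t)a[2 * j]     * b[2 * j];
        sum += (uint64_t)a[2 * j + 1] * b[2 * j + 1];
        /* Clamp to the unsigned 32-bit range (models UNSIGNED_DWORD_SATURATE). */
        dst[j] = (sum > UINT32_MAX) ? UINT32_MAX : (uint32_t)sum;
    }
}
```

A single product such as 65535 * 65535 = 4294836225 already exceeds ``INT32_MAX``, which is why the intermediate results have to be read as unsigned values.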

_mm_dpbssd_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __W, 
    __m128i __A, 
    __m128i __B
:Param ETypes:
    SI32 __W, 
    SI8 __A, 
    SI8 __B

.. code-block:: C

    __m128i _mm_dpbssd_epi32(__m128i __W, __m128i __A,
                             __m128i __B);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding signed 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.word := SignExtend16(__A.byte[4*j]) * SignExtend16(__B.byte[4*j])
        	tmp2.word := SignExtend16(__A.byte[4*j+1]) * SignExtend16(__B.byte[4*j+1])
        	tmp3.word := SignExtend16(__A.byte[4*j+2]) * SignExtend16(__B.byte[4*j+2])
        	tmp4.word := SignExtend16(__A.byte[4*j+3]) * SignExtend16(__B.byte[4*j+3])
        	dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:128] := 0
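
For the byte variants each 32-bit lane consumes four byte pairs instead of two word pairs. A minimal scalar sketch of the pseudo-code above, with the hypothetical helper name ``dpbssd_epi32_model`` and plain arrays standing in for the vector registers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm_dpbssd_epi32 (not an Intel API).
 * a and b hold 16 signed bytes each; w and dst hold 4 dwords. */
static void dpbssd_epi32_model(int32_t dst[4], const int32_t w[4],
                               const int8_t a[16], const int8_t b[16])
{
    assert(dst && w && a && b);
    for (int j = 0; j < 4; ++j) {
        int64_t sum = w[j];
        for (int k = 0; k < 4; ++k) {
            /* Each product of two signed bytes fits in a signed 16-bit value. */
            sum += (int32_t)a[4 * j + k] * b[4 * j + k];
        }
        /* Low 32 bits only; the signed conversion is implementation-defined
         * in ISO C when out of range, and wraps on mainstream compilers. */
        dst[j] = (int32_t)(uint32_t)sum;
    }
}
```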
        

_mm_dpbssds_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __W, 
    __m128i __A, 
    __m128i __B
:Param ETypes:
    SI32 __W, 
    SI8 __A, 
    SI8 __B

.. code-block:: C

    __m128i _mm_dpbssds_epi32(__m128i __W, __m128i __A,
                              __m128i __B);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding signed 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.word := SignExtend16(__A.byte[4*j]) * SignExtend16(__B.byte[4*j])
        	tmp2.word := SignExtend16(__A.byte[4*j+1]) * SignExtend16(__B.byte[4*j+1])
        	tmp3.word := SignExtend16(__A.byte[4*j+2]) * SignExtend16(__B.byte[4*j+2])
        	tmp4.word := SignExtend16(__A.byte[4*j+3]) * SignExtend16(__B.byte[4*j+3])
        	dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:128] := 0			

_mm_dpbsud_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __W, 
    __m128i __A, 
    __m128i __B
:Param ETypes:
    SI32 __W, 
    SI8 __A, 
    UI8 __B

.. code-block:: C

    __m128i _mm_dpbsud_epi32(__m128i __W, __m128i __A,
                             __m128i __B);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.word := Signed(SignExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j]))
        	tmp2.word := Signed(SignExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1]))
        	tmp3.word := Signed(SignExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2]))
        	tmp4.word := Signed(SignExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3]))
        	dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:128] := 0
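
Because "__A" is sign-extended and "__B" is zero-extended, the same byte pattern means different things in the two operands. The scalar sketch below illustrates this; ``dpbsud_epi32_model`` is a hypothetical helper on plain arrays, not an Intel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm_dpbsud_epi32 (not an Intel API).
 * a holds 16 signed bytes, b holds 16 unsigned bytes. */
static void dpbsud_epi32_model(int32_t dst[4], const int32_t w[4],
                               const int8_t a[16], const uint8_t b[16])
{
    assert(dst && w && a && b);
    for (int j = 0; j < 4; ++j) {
        int64_t sum = w[j];
        for (int k = 0; k < 4; ++k) {
            /* SignExtend16(A) * ZeroExtend16(B): range -32640 .. 32385. */
            sum += (int32_t)a[4 * j + k] * (int32_t)b[4 * j + k];
        }
        /* Low 32 bits only, no saturation. */
        dst[j] = (int32_t)(uint32_t)sum;
    }
}
```

With every byte set to ``0xFF``, each product is (-1) * 255 = -255, so every lane decreases by 1020.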
        

_mm_dpbsuds_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __W, 
    __m128i __A, 
    __m128i __B
:Param ETypes:
    SI32 __W, 
    SI8 __A, 
    UI8 __B

.. code-block:: C

    __m128i _mm_dpbsuds_epi32(__m128i __W, __m128i __A,
                              __m128i __B);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.word := Signed(SignExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j]))
        	tmp2.word := Signed(SignExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1]))
        	tmp3.word := Signed(SignExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2]))
        	tmp4.word := Signed(SignExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3]))
        	dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:128] := 0			

_mm_dpbuud_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __W, 
    __m128i __A, 
    __m128i __B
:Param ETypes:
    SI32 __W, 
    UI8 __A, 
    UI8 __B

.. code-block:: C

    __m128i _mm_dpbuud_epi32(__m128i __W, __m128i __A,
                             __m128i __B);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate unsigned 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.word := ZeroExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j])
        	tmp2.word := ZeroExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1])
        	tmp3.word := ZeroExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2])
        	tmp4.word := ZeroExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3])
        	dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
        ENDFOR
        dst[MAX:128] := 0
        

_mm_dpbuuds_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128i __W, 
    __m128i __A, 
    __m128i __B
:Param ETypes:
    UI32 __W, 
    UI8 __A, 
    UI8 __B

.. code-block:: C

    __m128i _mm_dpbuuds_epi32(__m128i __W, __m128i __A,
                              __m128i __B);

.. admonition:: Intel Description

    Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate unsigned 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W" with unsigned saturation, and store the packed 32-bit results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	tmp1.word := ZeroExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j])
        	tmp2.word := ZeroExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1])
        	tmp3.word := ZeroExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2])
        	tmp4.word := ZeroExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3])
        	dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
        ENDFOR
        dst[MAX:128] := 0			

_mm_fmadd_pd
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_fmadd_pd(__m128d a, __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        ENDFOR
        dst[MAX:128] := 0
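
The element-wise behaviour above can be sketched in scalar C. This is a hedged illustration: ``fmadd_pd_model`` is a hypothetical helper on plain arrays, and it uses a separate multiply and add. A fused multiply-add rounds only once, so for inputs whose product is not exactly representable the hardware result may differ in the last bit from this model.

```c
#include <assert.h>

/* Hypothetical scalar model of _mm_fmadd_pd (not an Intel API). */
static void fmadd_pd_model(double dst[2], const double a[2],
                           const double b[2], const double c[2])
{
    assert(dst && a && b && c);
    for (int j = 0; j < 2; ++j) {
        /* Unfused a*b + c: may round twice, unlike the fused instruction. */
        dst[j] = a[j] * b[j] + c[j];
    }
}
```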
        	

_mm_fmadd_ps
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_fmadd_ps(__m128 a, __m128 b, __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fmadd_sd
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_fmadd_sd(__m128d a, __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_fmadd_ss
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_fmadd_ss(__m128 a, __m128 b, __m128 c);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
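
The scalar ("ss") form touches only element 0 and passes the rest of "a" through. A minimal sketch, assuming plain ``float`` arrays in place of ``__m128`` and a hypothetical helper name ``fmadd_ss_model``; the multiply and add are unfused here, so rounding may differ from the hardware in the last bit:

```c
#include <assert.h>

/* Hypothetical scalar model of _mm_fmadd_ss (not an Intel API). */
static void fmadd_ss_model(float dst[4], const float a[4],
                           const float b[4], const float c[4])
{
    assert(dst && a && b && c);
    /* Only the lowest element is computed. */
    dst[0] = a[0] * b[0] + c[0];
    /* The upper three elements are copied from a; b and c are ignored there. */
    for (int j = 1; j < 4; ++j) {
        dst[j] = a[j];
    }
}
```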
        	

_mm_fmaddsub_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_fmaddsub_pd(__m128d a, __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract packed elements in "c" from the intermediate result (even-indexed elements) and add them to it (odd-indexed elements), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF ((j & 1) == 0) 
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	ELSE
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fmaddsub_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_fmaddsub_ps(__m128 a, __m128 b, __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract packed elements in "c" from the intermediate result (even-indexed elements) and add them to it (odd-indexed elements), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF ((j & 1) == 0) 
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	ELSE
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
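
The lane pattern in the pseudo-code (subtract in even elements, add in odd elements) is easy to get backwards, so a scalar sketch may help. ``fmaddsub_ps_model`` is a hypothetical helper on plain ``float`` arrays, and the arithmetic is unfused.

```c
#include <assert.h>

/* Hypothetical scalar model of _mm_fmaddsub_ps (not an Intel API). */
static void fmaddsub_ps_model(float dst[4], const float a[4],
                              const float b[4], const float c[4])
{
    assert(dst && a && b && c);
    for (int j = 0; j < 4; ++j) {
        float prod = a[j] * b[j];
        /* Even element index: subtract c. Odd element index: add c. */
        dst[j] = ((j & 1) == 0) ? prod - c[j] : prod + c[j];
    }
}
```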
        	

_mm_fmsub_pd
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_fmsub_pd(__m128d a, __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fmsub_ps
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_fmsub_ps(__m128 a, __m128 b, __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fmsub_sd
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_fmsub_sd(__m128d a, __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_fmsub_ss
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_fmsub_ss(__m128 a, __m128 b, __m128 c);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_fmsubadd_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_fmsubadd_pd(__m128d a, __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add packed elements in "c" to the intermediate result (even-indexed elements) and subtract them from it (odd-indexed elements), and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	IF ((j & 1) == 0) 
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
        	ELSE
        		dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
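
This intrinsic adds in the even element and subtracts in the odd one. A short scalar sketch of the pseudo-code above, using the hypothetical name ``fmsubadd_pd_model`` and unfused arithmetic on plain ``double`` arrays:

```c
#include <assert.h>

/* Hypothetical scalar model of _mm_fmsubadd_pd (not an Intel API). */
static void fmsubadd_pd_model(double dst[2], const double a[2],
                              const double b[2], const double c[2])
{
    assert(dst && a && b && c);
    for (int j = 0; j < 2; ++j) {
        double prod = a[j] * b[j];
        /* Even element index: add c. Odd element index: subtract c. */
        dst[j] = ((j & 1) == 0) ? prod + c[j] : prod - c[j];
    }
}
```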
        	

_mm_fmsubadd_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_fmsubadd_ps(__m128 a, __m128 b, __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	IF ((j & 1) == 0) 
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
        	ELSE
        		dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
        	FI
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_fnmadd_pd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_fnmadd_pd(__m128d a, __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
        ENDFOR	
        dst[MAX:128] := 0
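
In the "fnmadd" form only the product is negated, so each element evaluates to ``c - a*b``. The sketch below shows this with a hypothetical helper ``fnmadd_pd_model`` on plain ``double`` arrays. It is unfused, so it may round differently from the hardware when the product is not exactly representable.

```c
#include <assert.h>

/* Hypothetical scalar model of _mm_fnmadd_pd (not an Intel API). */
static void fnmadd_pd_model(double dst[2], const double a[2],
                            const double b[2], const double c[2])
{
    assert(dst && a && b && c);
    for (int j = 0; j < 2; ++j) {
        /* -(a*b) + c: the product is negated, c keeps its sign. */
        dst[j] = -(a[j] * b[j]) + c[j];
    }
}
```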
        	

_mm_fnmadd_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_fnmadd_ps(__m128 a, __m128 b, __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_fnmadd_sd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_fnmadd_sd(__m128d a, __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_fnmadd_ss
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_fnmadd_ss(__m128 a, __m128 b, __m128 c);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

_mm_fnmsub_pd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_fnmsub_pd(__m128d a, __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_fnmsub_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_fnmsub_ps(__m128 a, __m128 b, __m128 c);

.. admonition:: Intel Description

    Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
        ENDFOR	
        dst[MAX:128] := 0
        	

_mm_fnmsub_sd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    __m128d c
:Param ETypes:
    FP64 a, 
    FP64 b, 
    FP64 c

.. code-block:: C

    __m128d _mm_fnmsub_sd(__m128d a, __m128d b, __m128d c);

.. admonition:: Intel Description

    Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_fnmsub_ss
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Arithmetic
:Header: immintrin.h
:Searchable: AVX_ALL-Arithmetic-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    __m128 c
:Param ETypes:
    FP32 a, 
    FP32 b, 
    FP32 c

.. code-block:: C

    __m128 _mm_fnmsub_ss(__m128 a, __m128 b, __m128 c);

.. admonition:: Intel Description

    Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	

Compare
-------
YMM
~~~
_mm256_cmp_pd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m256d a, 
    __m256d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m256d _mm256_cmp_pd(__m256d a, __m256d b, const int imm8);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" using the comparison predicate specified by "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ( a[i+63:i] OP b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        dst[MAX:256] := 0
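
The predicate table above can be explored with a scalar model. The sketch below is illustrative only: ``cmp_pd_lane_model`` and ``cmp_pd_model`` are hypothetical names, and the model works on plain ``double`` arrays. It also ignores the signalling/quiet distinction (the ``S``/``Q`` suffix), which affects floating-point exception behaviour rather than the resulting mask, so only the low four bits of ``imm8`` are examined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit lane; returns all-ones or zero. */
static uint64_t cmp_pd_lane_model(double a, double b, int imm8)
{
    int unord = (a != a) || (b != b);   /* true if either input is NaN */
    int lt = a < b, eq = a == b, gt = a > b; /* all false when unordered */
    int r;
    switch (imm8 & 0x0F) {
    case 0:  r = eq;          break;  /* EQ (ordered)           */
    case 1:  r = lt;          break;  /* LT (ordered)           */
    case 2:  r = lt || eq;    break;  /* LE (ordered)           */
    case 3:  r = unord;       break;  /* UNORD                  */
    case 4:  r = !eq;         break;  /* NEQ (true if NaN)      */
    case 5:  r = !lt;         break;  /* NLT (true if NaN)      */
    case 6:  r = !(lt || eq); break;  /* NLE (true if NaN)      */
    case 7:  r = !unord;      break;  /* ORD                    */
    case 8:  r = eq || unord; break;  /* EQ (true if NaN)       */
    case 9:  r = !(gt || eq); break;  /* NGE (true if NaN)      */
    case 10: r = !gt;         break;  /* NGT (true if NaN)      */
    case 11: r = 0;           break;  /* FALSE                  */
    case 12: r = lt || gt;    break;  /* NEQ (ordered)          */
    case 13: r = gt || eq;    break;  /* GE (ordered)           */
    case 14: r = gt;          break;  /* GT (ordered)           */
    default: r = 1;           break;  /* 15: TRUE               */
    }
    return r ? UINT64_MAX : 0;
}

/* Hypothetical model of _mm256_cmp_pd over four lanes (not an Intel API). */
static void cmp_pd_model(uint64_t dst[4], const double a[4],
                         const double b[4], int imm8)
{
    assert(dst && a && b);
    for (int j = 0; j < 4; ++j) {
        dst[j] = cmp_pd_lane_model(a[j], b[j], imm8);
    }
}
```

The practical difference between a predicate and its negated counterpart, such as ``_CMP_LT_OS`` (1) and ``_CMP_NLT_US`` (5), shows up with NaN inputs: the ordered form yields zero and the unordered form yields all ones.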
        	

_mm256_cmp_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256 a, 
    __m256 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m256 _mm256_cmp_ps(__m256 a, __m256 b, const int imm8);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" using the comparison predicate specified by "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] OP b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
        dst[MAX:256] := 0
        	
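A minimal sketch of turning the lane-wise result into a bitmask, assuming GCC or Clang. The helper name ``lt_mask8`` is hypothetical, and the ``target("avx")`` attribute is a compiler extension used here in place of building with ``-mavx``.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: bit j of the result is set when a[j] < b[j].
       _CMP_LT_OQ is the ordered, non-signaling "less than" predicate, so
       any lane that holds a NaN compares false. */
    __attribute__((target("avx")))
    int lt_mask8(const float *a, const float *b)
    {
        __m256 va = _mm256_loadu_ps(a);
        __m256 vb = _mm256_loadu_ps(b);
        __m256 m  = _mm256_cmp_ps(va, vb, _CMP_LT_OQ);
        /* Each lane of m is all ones or all zeros; collect the sign bits. */
        return _mm256_movemask_ps(m);
    }

The predicate has to be a compile-time constant, which is why it is written inline rather than passed as a function argument.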

_mm256_cmpeq_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI8 a, 
    UI8 b

.. code-block:: C

    __m256i _mm256_cmpeq_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := ( a[i+7:i] == b[i+7:i] ) ? 0xFF : 0
        ENDFOR
        dst[MAX:256] := 0
        	
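One common use of the byte-wise equality mask is searching a block for a value. The sketch below assumes GCC or Clang (``target("avx2")`` attribute and ``__builtin_ctz`` are compiler extensions); ``find_byte32`` is a hypothetical helper name.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: index of the first byte equal to `needle` in a
       32-byte block, or -1 when no byte matches. */
    __attribute__((target("avx2")))
    int find_byte32(const unsigned char *block, unsigned char needle)
    {
        __m256i v  = _mm256_loadu_si256((const __m256i *)block);
        __m256i n  = _mm256_set1_epi8((char)needle);
        __m256i eq = _mm256_cmpeq_epi8(v, n);          /* 0xFF where equal */
        unsigned mask = (unsigned)_mm256_movemask_epi8(eq);
        /* Bit j of mask corresponds to byte j, so the lowest set bit is the
           first match. */
        return mask ? __builtin_ctz(mask) : -1;
    }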

_mm256_cmpeq_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI16 a, 
    UI16 b

.. code-block:: C

    __m256i _mm256_cmpeq_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ( a[i+15:i] == b[i+15:i] ) ? 0xFFFF : 0
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cmpeq_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI32 a, 
    UI32 b

.. code-block:: C

    __m256i _mm256_cmpeq_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cmpeq_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    UI64 a, 
    UI64 b

.. code-block:: C

    __m256i _mm256_cmpeq_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed 64-bit integers in "a" and "b" for equality, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ( a[i+63:i] == b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cmpgt_epi8
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI8 a, 
    SI8 b

.. code-block:: C

    __m256i _mm256_cmpgt_epi8(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := ( a[i+7:i] > b[i+7:i] ) ? 0xFF : 0
        ENDFOR
        dst[MAX:256] := 0
        	
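Because the comparison is signed, bytes with the top bit set are treated as negative. A small sketch that exposes this, assuming GCC or Clang for the ``target("avx2")`` attribute; ``gt_mask_epi8`` is a hypothetical helper name.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: bit j of the result is set when a[j] > b[j],
       with both bytes interpreted as signed 8-bit integers. */
    __attribute__((target("avx2")))
    unsigned gt_mask_epi8(const signed char *a, const signed char *b)
    {
        __m256i va = _mm256_loadu_si256((const __m256i *)a);
        __m256i vb = _mm256_loadu_si256((const __m256i *)b);
        return (unsigned)_mm256_movemask_epi8(_mm256_cmpgt_epi8(va, vb));
    }

With this interpretation the byte pattern 0x80 is -128, so it does not compare greater than 0x7F (127).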

_mm256_cmpgt_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_cmpgt_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := ( a[i+15:i] > b[i+15:i] ) ? 0xFFFF : 0
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cmpgt_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_cmpgt_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
        dst[MAX:256] := 0
        	
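As an illustration, the mask can be reduced to a count of lanes above a threshold. This is a sketch only, assuming GCC or Clang (``target("avx2")`` attribute, ``__builtin_popcount``); ``count_gt8`` is a hypothetical helper name.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: number of the 8 signed 32-bit values in v that
       are greater than threshold. */
    __attribute__((target("avx2")))
    int count_gt8(const int *v, int threshold)
    {
        __m256i x = _mm256_loadu_si256((const __m256i *)v);
        __m256i t = _mm256_set1_epi32(threshold);
        __m256i m = _mm256_cmpgt_epi32(x, t);     /* 0xFFFFFFFF where x > t */
        /* Reinterpret as floats only to extract one sign bit per lane. */
        int bits = _mm256_movemask_ps(_mm256_castsi256_ps(m));
        return __builtin_popcount((unsigned)bits);
    }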

_mm256_cmpgt_epi64
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI64 a, 
    SI64 b

.. code-block:: C

    __m256i _mm256_cmpgt_epi64(__m256i a, __m256i b);

.. admonition:: Intel Description

    Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := ( a[i+63:i] > b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        dst[MAX:256] := 0
        	

XMM
~~~
_mm_cmp_pd
^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_cmp_pd(__m128d a, __m128d b, const int imm8);

.. admonition:: Intel Description

    Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 1
        	i := j*64
        	dst[i+63:i] := ( a[i+63:i] OP b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0
        ENDFOR
        dst[MAX:128] := 0
        	
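The ordered and unordered variants of a predicate differ only when a lane holds a NaN. The sketch below contrasts ``_CMP_EQ_OQ`` with ``_CMP_EQ_UQ``; the helper names are hypothetical, and the ``target("avx")`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: bit j set when a[j] == b[j]; a NaN in either
       operand makes the lane compare false (ordered predicate). */
    __attribute__((target("avx")))
    int eq_mask_ordered(const double *a, const double *b)
    {
        __m128d m = _mm_cmp_pd(_mm_loadu_pd(a), _mm_loadu_pd(b), _CMP_EQ_OQ);
        return _mm_movemask_pd(m);
    }

    /* Hypothetical helper: same comparison, but a NaN in either operand
       makes the lane compare true (unordered predicate). */
    __attribute__((target("avx")))
    int eq_mask_unordered(const double *a, const double *b)
    {
        __m128d m = _mm_cmp_pd(_mm_loadu_pd(a), _mm_loadu_pd(b), _CMP_EQ_UQ);
        return _mm_movemask_pd(m);
    }

For inputs ``{1.0, NaN}`` and ``{1.0, 2.0}``, the ordered form sets only bit 0, while the unordered form sets both bits.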

_mm_cmp_ps
^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_cmp_ps(__m128 a, __m128 b, const int imm8);

.. admonition:: Intel Description

    Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        FOR j := 0 to 3
        	i := j*32
        	dst[i+31:i] := ( a[i+31:i] OP b[i+31:i] ) ? 0xFFFFFFFF : 0
        ENDFOR
        dst[MAX:128] := 0
        	

_mm_cmp_sd
^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128d
:Param Types:
    __m128d a, 
    __m128d b, 
    const int imm8
:Param ETypes:
    FP64 a, 
    FP64 b, 
    IMM imm8

.. code-block:: C

    __m128d _mm_cmp_sd(__m128d a, __m128d b, const int imm8);

.. admonition:: Intel Description

    Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        dst[63:0] := ( a[63:0] OP b[63:0] ) ? 0xFFFFFFFFFFFFFFFF : 0
        dst[127:64] := a[127:64]
        dst[MAX:128] := 0
        	

_mm_cmp_ss
^^^^^^^^^^
:Tech: AVX_ALL
:Category: Compare
:Header: immintrin.h
:Searchable: AVX_ALL-Compare-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128 a, 
    __m128 b, 
    const int imm8
:Param ETypes:
    FP32 a, 
    FP32 b, 
    IMM imm8

.. code-block:: C

    __m128 _mm_cmp_ss(__m128 a, __m128 b, const int imm8);

.. admonition:: Intel Description

    Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        CASE (imm8[4:0]) OF
        0: OP := _CMP_EQ_OQ
        1: OP := _CMP_LT_OS
        2: OP := _CMP_LE_OS
        3: OP := _CMP_UNORD_Q 
        4: OP := _CMP_NEQ_UQ
        5: OP := _CMP_NLT_US
        6: OP := _CMP_NLE_US
        7: OP := _CMP_ORD_Q
        8: OP := _CMP_EQ_UQ
        9: OP := _CMP_NGE_US
        10: OP := _CMP_NGT_US
        11: OP := _CMP_FALSE_OQ
        12: OP := _CMP_NEQ_OQ
        13: OP := _CMP_GE_OS
        14: OP := _CMP_GT_OS
        15: OP := _CMP_TRUE_UQ
        16: OP := _CMP_EQ_OS
        17: OP := _CMP_LT_OQ
        18: OP := _CMP_LE_OQ
        19: OP := _CMP_UNORD_S
        20: OP := _CMP_NEQ_US
        21: OP := _CMP_NLT_UQ
        22: OP := _CMP_NLE_UQ
        23: OP := _CMP_ORD_S
        24: OP := _CMP_EQ_US
        25: OP := _CMP_NGE_UQ 
        26: OP := _CMP_NGT_UQ 
        27: OP := _CMP_FALSE_OS 
        28: OP := _CMP_NEQ_OS 
        29: OP := _CMP_GE_OQ
        30: OP := _CMP_GT_OQ
        31: OP := _CMP_TRUE_US
        ESAC
        dst[31:0] := ( a[31:0] OP b[31:0] ) ? 0xFFFFFFFF : 0
        dst[127:32] := a[127:32]
        dst[MAX:128] := 0
        	
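The scalar form only compares lane 0; the other three lanes pass through from ``a`` unchanged. A sketch that makes this visible by dumping the raw bits, assuming GCC or Clang for the ``target("avx")`` attribute; ``cmp_ss_lt`` is a hypothetical helper name.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: compares a[0] < b[0] and writes the four 32-bit
       lanes of the result to out as raw bit patterns. */
    __attribute__((target("avx")))
    void cmp_ss_lt(const float *a, const float *b, unsigned out[4])
    {
        __m128 r = _mm_cmp_ss(_mm_loadu_ps(a), _mm_loadu_ps(b), _CMP_LT_OQ);
        /* Store through an integer vector so the mask bits are kept as is. */
        _mm_storeu_si128((__m128i *)out, _mm_castps_si128(r));
    }

After the call, ``out[0]`` is ``0xFFFFFFFF`` or ``0`` and ``out[1]`` through ``out[3]`` hold the bit patterns of ``a[1]`` through ``a[3]``.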

Set
---
YMM
~~~
_mm256_setzero_pd
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256d

.. code-block:: C

    __m256d _mm256_setzero_pd(void);

.. admonition:: Intel Description

    Return vector of type __m256d with all elements set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	

_mm256_setzero_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256

.. code-block:: C

    __m256 _mm256_setzero_ps(void);

.. admonition:: Intel Description

    Return vector of type __m256 with all elements set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	
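A zeroed vector is a typical starting value for an accumulator. The following sketch sums an array eight floats at a time; ``sum_f32`` is a hypothetical helper, it assumes ``n`` is a multiple of 8, and the ``target("avx")`` attribute assumes GCC or Clang.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: sum of n floats. Assumes n is a multiple of 8. */
    __attribute__((target("avx")))
    float sum_f32(const float *x, int n)
    {
        __m256 acc = _mm256_setzero_ps();          /* eight partial sums = 0 */
        for (int i = 0; i < n; i += 8)
            acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));

        /* Spill the partial sums and add them with scalar code. */
        float lanes[8];
        _mm256_storeu_ps(lanes, acc);
        float s = 0.0f;
        for (int j = 0; j < 8; ++j)
            s += lanes[j];
        return s;
    }

The additions happen in a different order than in a plain scalar loop, so results can differ in the last bits for inputs that are not exactly representable.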

_mm256_setzero_si256
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i

.. code-block:: C

    __m256i _mm256_setzero_si256(void);

.. admonition:: Intel Description

    Return vector of type __m256i with all elements set to zero.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[MAX:0] := 0
        	

_mm256_set_pd
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    double e3, 
    double e2, 
    double e1, 
    double e0
:Param ETypes:
    FP64 e3, 
    FP64 e2, 
    FP64 e1, 
    FP64 e0

.. code-block:: C

    __m256d _mm256_set_pd(double e3, double e2, double e1,
                          double e0);

.. admonition:: Intel Description

    Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := e0
        dst[127:64] := e1
        dst[191:128] := e2
        dst[255:192] := e3
        dst[MAX:256] := 0
        	

_mm256_set_ps
^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    float e7, 
    float e6, 
    float e5, 
    float e4, 
    float e3, 
    float e2, 
    float e1, 
    float e0
:Param ETypes:
    FP32 e7, 
    FP32 e6, 
    FP32 e5, 
    FP32 e4, 
    FP32 e3, 
    FP32 e2, 
    FP32 e1, 
    FP32 e0

.. code-block:: C

    __m256 _mm256_set_ps(float e7, float e6, float e5, float e4,
                         float e3, float e2, float e1,
                         float e0);

.. admonition:: Intel Description

    Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := e0
        dst[63:32] := e1
        dst[95:64] := e2
        dst[127:96] := e3
        dst[159:128] := e4
        dst[191:160] := e5
        dst[223:192] := e6
        dst[255:224] := e7
        dst[MAX:256] := 0
        	
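The argument order is easy to get backwards: the last argument (``e0``) lands in the lowest lane, which is the first element in memory after a store. A minimal sketch, assuming GCC or Clang for the ``target("avx")`` attribute; ``store_set_ps`` is a hypothetical helper name.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: builds a vector with _mm256_set_ps and stores it,
       so that out[j] ends up holding the value passed as argument e<j>. */
    __attribute__((target("avx")))
    void store_set_ps(float out[8])
    {
        __m256 v = _mm256_set_ps(7.0f, 6.0f, 5.0f, 4.0f,
                                 3.0f, 2.0f, 1.0f, 0.0f);
        _mm256_storeu_ps(out, v);
    }

After the call, ``out`` reads ``0.0f, 1.0f, ..., 7.0f`` in increasing index order.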

_mm256_set_epi8
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    char e31, 
    char e30, 
    char e29, 
    char e28, 
    char e27, 
    char e26, 
    char e25, 
    char e24, 
    char e23, 
    char e22, 
    char e21, 
    char e20, 
    char e19, 
    char e18, 
    char e17, 
    char e16, 
    char e15, 
    char e14, 
    char e13, 
    char e12, 
    char e11, 
    char e10, 
    char e9, 
    char e8, 
    char e7, 
    char e6, 
    char e5, 
    char e4, 
    char e3, 
    char e2, 
    char e1, 
    char e0
:Param ETypes:
    UI8 e31, 
    UI8 e30, 
    UI8 e29, 
    UI8 e28, 
    UI8 e27, 
    UI8 e26, 
    UI8 e25, 
    UI8 e24, 
    UI8 e23, 
    UI8 e22, 
    UI8 e21, 
    UI8 e20, 
    UI8 e19, 
    UI8 e18, 
    UI8 e17, 
    UI8 e16, 
    UI8 e15, 
    UI8 e14, 
    UI8 e13, 
    UI8 e12, 
    UI8 e11, 
    UI8 e10, 
    UI8 e9, 
    UI8 e8, 
    UI8 e7, 
    UI8 e6, 
    UI8 e5, 
    UI8 e4, 
    UI8 e3, 
    UI8 e2, 
    UI8 e1, 
    UI8 e0

.. code-block:: C

    __m256i _mm256_set_epi8(
        char e31, char e30, char e29, char e28, char e27,
        char e26, char e25, char e24, char e23, char e22,
        char e21, char e20, char e19, char e18, char e17,
        char e16, char e15, char e14, char e13, char e12,
        char e11, char e10, char e9, char e8, char e7, char e6,
        char e5, char e4, char e3, char e2, char e1, char e0);

.. admonition:: Intel Description

    Set packed 8-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := e0
        dst[15:8] := e1
        dst[23:16] := e2
        dst[31:24] := e3
        dst[39:32] := e4
        dst[47:40] := e5
        dst[55:48] := e6
        dst[63:56] := e7
        dst[71:64] := e8
        dst[79:72] := e9
        dst[87:80] := e10
        dst[95:88] := e11
        dst[103:96] := e12
        dst[111:104] := e13
        dst[119:112] := e14
        dst[127:120] := e15
        dst[135:128] := e16
        dst[143:136] := e17
        dst[151:144] := e18
        dst[159:152] := e19
        dst[167:160] := e20
        dst[175:168] := e21
        dst[183:176] := e22
        dst[191:184] := e23
        dst[199:192] := e24
        dst[207:200] := e25
        dst[215:208] := e26
        dst[223:216] := e27
        dst[231:224] := e28
        dst[239:232] := e29
        dst[247:240] := e30
        dst[255:248] := e31
        dst[MAX:256] := 0
        	

_mm256_set_epi16
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    short e15, 
    short e14, 
    short e13, 
    short e12, 
    short e11, 
    short e10, 
    short e9, 
    short e8, 
    short e7, 
    short e6, 
    short e5, 
    short e4, 
    short e3, 
    short e2, 
    short e1, 
    short e0
:Param ETypes:
    UI16 e15, 
    UI16 e14, 
    UI16 e13, 
    UI16 e12, 
    UI16 e11, 
    UI16 e10, 
    UI16 e9, 
    UI16 e8, 
    UI16 e7, 
    UI16 e6, 
    UI16 e5, 
    UI16 e4, 
    UI16 e3, 
    UI16 e2, 
    UI16 e1, 
    UI16 e0

.. code-block:: C

    __m256i _mm256_set_epi16(short e15, short e14, short e13,
                             short e12, short e11, short e10,
                             short e9, short e8, short e7,
                             short e6, short e5, short e4,
                             short e3, short e2, short e1,
                             short e0);

.. admonition:: Intel Description

    Set packed 16-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := e0
        dst[31:16] := e1
        dst[47:32] := e2
        dst[63:48] := e3
        dst[79:64] := e4
        dst[95:80] := e5
        dst[111:96] := e6
        dst[127:112] := e7
        dst[143:128] := e8
        dst[159:144] := e9
        dst[175:160] := e10
        dst[191:176] := e11
        dst[207:192] := e12
        dst[223:208] := e13
        dst[239:224] := e14
        dst[255:240] := e15
        dst[MAX:256] := 0
        	

_mm256_set_epi32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    int e7, 
    int e6, 
    int e5, 
    int e4, 
    int e3, 
    int e2, 
    int e1, 
    int e0
:Param ETypes:
    UI32 e7, 
    UI32 e6, 
    UI32 e5, 
    UI32 e4, 
    UI32 e3, 
    UI32 e2, 
    UI32 e1, 
    UI32 e0

.. code-block:: C

    __m256i _mm256_set_epi32(int e7, int e6, int e5, int e4,
                             int e3, int e2, int e1, int e0);

.. admonition:: Intel Description

    Set packed 32-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := e0
        dst[63:32] := e1
        dst[95:64] := e2
        dst[127:96] := e3
        dst[159:128] := e4
        dst[191:160] := e5
        dst[223:192] := e6
        dst[255:224] := e7
        dst[MAX:256] := 0
        	

_mm256_set_epi64x
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __int64 e3, 
    __int64 e2, 
    __int64 e1, 
    __int64 e0
:Param ETypes:
    UI64 e3, 
    UI64 e2, 
    UI64 e1, 
    UI64 e0

.. code-block:: C

    __m256i _mm256_set_epi64x(__int64 e3, __int64 e2,
                              __int64 e1, __int64 e0);

.. admonition:: Intel Description

    Set packed 64-bit integers in "dst" with the supplied values.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := e0
        dst[127:64] := e1
        dst[191:128] := e2
        dst[255:192] := e3
        dst[MAX:256] := 0
        	

_mm256_setr_pd
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    double e3, 
    double e2, 
    double e1, 
    double e0
:Param ETypes:
    FP64 e3, 
    FP64 e2, 
    FP64 e1, 
    FP64 e0

.. code-block:: C

    __m256d _mm256_setr_pd(double e3, double e2, double e1,
                           double e0);

.. admonition:: Intel Description

    Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := e3
        dst[127:64] := e2
        dst[191:128] := e1
        dst[255:192] := e0
        dst[MAX:256] := 0
        	

_mm256_setr_ps
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    float e7, 
    float e6, 
    float e5, 
    float e4, 
    float e3, 
    float e2, 
    float e1, 
    float e0
:Param ETypes:
    FP32 e7, 
    FP32 e6, 
    FP32 e5, 
    FP32 e4, 
    FP32 e3, 
    FP32 e2, 
    FP32 e1, 
    FP32 e0

.. code-block:: C

    __m256 _mm256_setr_ps(float e7, float e6, float e5,
                          float e4, float e3, float e2,
                          float e1, float e0);

.. admonition:: Intel Description

    Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := e7
        dst[63:32] := e6
        dst[95:64] := e5
        dst[127:96] := e4
        dst[159:128] := e3
        dst[191:160] := e2
        dst[223:192] := e1
        dst[255:224] := e0
        dst[MAX:256] := 0
        	

_mm256_setr_epi8
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    char e31, 
    char e30, 
    char e29, 
    char e28, 
    char e27, 
    char e26, 
    char e25, 
    char e24, 
    char e23, 
    char e22, 
    char e21, 
    char e20, 
    char e19, 
    char e18, 
    char e17, 
    char e16, 
    char e15, 
    char e14, 
    char e13, 
    char e12, 
    char e11, 
    char e10, 
    char e9, 
    char e8, 
    char e7, 
    char e6, 
    char e5, 
    char e4, 
    char e3, 
    char e2, 
    char e1, 
    char e0
:Param ETypes:
    UI8 e31, 
    UI8 e30, 
    UI8 e29, 
    UI8 e28, 
    UI8 e27, 
    UI8 e26, 
    UI8 e25, 
    UI8 e24, 
    UI8 e23, 
    UI8 e22, 
    UI8 e21, 
    UI8 e20, 
    UI8 e19, 
    UI8 e18, 
    UI8 e17, 
    UI8 e16, 
    UI8 e15, 
    UI8 e14, 
    UI8 e13, 
    UI8 e12, 
    UI8 e11, 
    UI8 e10, 
    UI8 e9, 
    UI8 e8, 
    UI8 e7, 
    UI8 e6, 
    UI8 e5, 
    UI8 e4, 
    UI8 e3, 
    UI8 e2, 
    UI8 e1, 
    UI8 e0

.. code-block:: C

    __m256i _mm256_setr_epi8(
        char e31, char e30, char e29, char e28, char e27,
        char e26, char e25, char e24, char e23, char e22,
        char e21, char e20, char e19, char e18, char e17,
        char e16, char e15, char e14, char e13, char e12,
        char e11, char e10, char e9, char e8, char e7, char e6,
        char e5, char e4, char e3, char e2, char e1, char e0);

.. admonition:: Intel Description

    Set packed 8-bit integers in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := e31
        dst[15:8] := e30
        dst[23:16] := e29
        dst[31:24] := e28
        dst[39:32] := e27
        dst[47:40] := e26
        dst[55:48] := e25
        dst[63:56] := e24
        dst[71:64] := e23
        dst[79:72] := e22
        dst[87:80] := e21
        dst[95:88] := e20
        dst[103:96] := e19
        dst[111:104] := e18
        dst[119:112] := e17
        dst[127:120] := e16
        dst[135:128] := e15
        dst[143:136] := e14
        dst[151:144] := e13
        dst[159:152] := e12
        dst[167:160] := e11
        dst[175:168] := e10
        dst[183:176] := e9
        dst[191:184] := e8
        dst[199:192] := e7
        dst[207:200] := e6
        dst[215:208] := e5
        dst[223:216] := e4
        dst[231:224] := e3
        dst[239:232] := e2
        dst[247:240] := e1
        dst[255:248] := e0
        dst[MAX:256] := 0
        	

_mm256_setr_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    short e15, 
    short e14, 
    short e13, 
    short e12, 
    short e11, 
    short e10, 
    short e9, 
    short e8, 
    short e7, 
    short e6, 
    short e5, 
    short e4, 
    short e3, 
    short e2, 
    short e1, 
    short e0
:Param ETypes:
    UI16 e15, 
    UI16 e14, 
    UI16 e13, 
    UI16 e12, 
    UI16 e11, 
    UI16 e10, 
    UI16 e9, 
    UI16 e8, 
    UI16 e7, 
    UI16 e6, 
    UI16 e5, 
    UI16 e4, 
    UI16 e3, 
    UI16 e2, 
    UI16 e1, 
    UI16 e0

.. code-block:: C

    __m256i _mm256_setr_epi16(short e15, short e14, short e13,
                              short e12, short e11, short e10,
                              short e9, short e8, short e7,
                              short e6, short e5, short e4,
                              short e3, short e2, short e1,
                              short e0);

.. admonition:: Intel Description

    Set packed 16-bit integers in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := e15
        dst[31:16] := e14
        dst[47:32] := e13
        dst[63:48] := e12
        dst[79:64] := e11
        dst[95:80] := e10
        dst[111:96] := e9
        dst[127:112] := e8
        dst[143:128] := e7
        dst[159:144] := e6
        dst[175:160] := e5
        dst[191:176] := e4
        dst[207:192] := e3
        dst[223:208] := e2
        dst[239:224] := e1
        dst[255:240] := e0
        dst[MAX:256] := 0
        	

_mm256_setr_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    int e7, 
    int e6, 
    int e5, 
    int e4, 
    int e3, 
    int e2, 
    int e1, 
    int e0
:Param ETypes:
    UI32 e7, 
    UI32 e6, 
    UI32 e5, 
    UI32 e4, 
    UI32 e3, 
    UI32 e2, 
    UI32 e1, 
    UI32 e0

.. code-block:: C

    __m256i _mm256_setr_epi32(int e7, int e6, int e5, int e4,
                              int e3, int e2, int e1, int e0);

.. admonition:: Intel Description

    Set packed 32-bit integers in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[31:0] := e7
        dst[63:32] := e6
        dst[95:64] := e5
        dst[127:96] := e4
        dst[159:128] := e3
        dst[191:160] := e2
        dst[223:192] := e1
        dst[255:224] := e0
        dst[MAX:256] := 0
        	
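With the reversed variant, the first argument lands in the lowest lane, so the argument list reads in memory order. A minimal sketch, assuming GCC or Clang for the ``target("avx")`` attribute; ``store_setr_epi32`` is a hypothetical helper name.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: builds a vector with _mm256_setr_epi32 and stores
       it, so out[] holds the arguments in the order they were written. */
    __attribute__((target("avx")))
    void store_setr_epi32(int out[8])
    {
        __m256i v = _mm256_setr_epi32(10, 11, 12, 13, 14, 15, 16, 17);
        _mm256_storeu_si256((__m256i *)out, v);
    }

The same vector could be built with ``_mm256_set_epi32(17, 16, 15, 14, 13, 12, 11, 10)``.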

_mm256_setr_epi64x
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __int64 e3, 
    __int64 e2, 
    __int64 e1, 
    __int64 e0
:Param ETypes:
    UI64 e3, 
    UI64 e2, 
    UI64 e1, 
    UI64 e0

.. code-block:: C

    __m256i _mm256_setr_epi64x(__int64 e3, __int64 e2,
                               __int64 e1, __int64 e0);

.. admonition:: Intel Description

    Set packed 64-bit integers in "dst" with the supplied values in reverse order.

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[63:0] := e3
        dst[127:64] := e2
        dst[191:128] := e1
        dst[255:192] := e0
        dst[MAX:256] := 0
        	

_mm256_set1_pd
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    double a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m256d _mm256_set1_pd(double a);

.. admonition:: Intel Description

    Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_set1_ps
^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    float a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256 _mm256_set1_ps(float a);

.. admonition:: Intel Description

    Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        dst[MAX:256] := 0
        	
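Broadcasting a scalar is the usual way to apply one constant to every lane. A sketch that scales eight floats in place, assuming GCC or Clang for the ``target("avx")`` attribute; ``scale8`` is a hypothetical helper name.

.. code-block:: C

    #include <immintrin.h>

    /* Hypothetical helper: multiplies the eight floats at x by k, in place. */
    __attribute__((target("avx")))
    void scale8(float *x, float k)
    {
        __m256 vk = _mm256_set1_ps(k);             /* k in all eight lanes */
        _mm256_storeu_ps(x, _mm256_mul_ps(_mm256_loadu_ps(x), vk));
    }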

_mm256_set1_epi8
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    char a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m256i _mm256_set1_epi8(char a);

.. admonition:: Intel Description

    Broadcast 8-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastb" instruction.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[i+7:i] := a[7:0]
        ENDFOR
        dst[MAX:256] := 0
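
The byte broadcast can be sketched as a small fill routine. The helper name ``fill32_epi8`` is hypothetical, and the ``target("avx")`` attribute assumes GCC or Clang.

```c
#include <immintrin.h>

/* Write the same byte to 32 consecutive bytes (hypothetical helper). */
__attribute__((target("avx")))
void fill32_epi8(unsigned char out[32], char value)
{
    __m256i v = _mm256_set1_epi8(value);     /* 32 copies of value */
    _mm256_storeu_si256((__m256i *)out, v);  /* unaligned 256-bit store */
}
```

This behaves like ``memset(out, value, 32)`` for a fixed 32-byte block.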
        	

_mm256_set1_epi16
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    short a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m256i _mm256_set1_epi16(short a);

.. admonition:: Intel Description

    Broadcast 16-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastw" instruction.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*16
        	dst[i+15:i] := a[15:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_set1_epi32
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    int a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256i _mm256_set1_epi32(int a);

.. admonition:: Intel Description

    Broadcast 32-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastd" instruction.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	dst[i+31:i] := a[31:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_set1_epi64x
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    long long a
:Param ETypes:
    UI64 a

.. code-block:: C

    __m256i _mm256_set1_epi64x(long long a);

.. admonition:: Intel Description

    Broadcast 64-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastq" instruction.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	dst[i+63:i] := a[63:0]
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_set_m128
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m128 hi, 
    __m128 lo
:Param ETypes:
    FP32 hi, 
    FP32 lo

.. code-block:: C

    __m256 _mm256_set_m128(__m128 hi, __m128 lo);

.. admonition:: Intel Description

    Set packed __m256 vector "dst" with the supplied values.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[127:0] := lo[127:0]
        dst[255:128] := hi[127:0]
        dst[MAX:256] := 0
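
The argument order is easy to get backwards, so here is a short sketch that joins two 4-float halves. The helper name ``join_halves_ps`` is hypothetical; the ``target("avx")`` attribute assumes GCC or Clang, and the intrinsic itself needs a reasonably recent compiler header.

```c
#include <immintrin.h>

/* "lo" ends up in out[0..3] and "hi" in out[4..7] (hypothetical helper). */
__attribute__((target("avx")))
void join_halves_ps(const float lo[4], const float hi[4], float out[8])
{
    __m128 vlo = _mm_loadu_ps(lo);
    __m128 vhi = _mm_loadu_ps(hi);
    /* The high half is the FIRST argument. */
    _mm256_storeu_ps(out, _mm256_set_m128(vhi, vlo));
}
```

The first argument lands in bits 255:128, which matches the pseudo-code above.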
        	

_mm256_set_m128d
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m128d hi, 
    __m128d lo
:Param ETypes:
    FP64 hi, 
    FP64 lo

.. code-block:: C

    __m256d _mm256_set_m128d(__m128d hi, __m128d lo);

.. admonition:: Intel Description

    Set packed __m256d vector "dst" with the supplied values.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[127:0] := lo[127:0]
        dst[255:128] := hi[127:0]
        dst[MAX:256] := 0
        	

_mm256_set_m128i
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i hi, 
    __m128i lo
:Param ETypes:
    M128 hi, 
    M128 lo

.. code-block:: C

    __m256i _mm256_set_m128i(__m128i hi, __m128i lo);

.. admonition:: Intel Description

    Set packed __m256i vector "dst" with the supplied values.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[127:0] := lo[127:0]
        dst[255:128] := hi[127:0]
        dst[MAX:256] := 0
        	

_mm256_setr_m128
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m128 lo, 
    __m128 hi
:Param ETypes:
    FP32 lo, 
    FP32 hi

.. code-block:: C

    __m256 _mm256_setr_m128(__m128 lo, __m128 hi);

.. admonition:: Intel Description

    Set packed __m256 vector "dst" with the supplied values.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[127:0] := lo[127:0]
        dst[255:128] := hi[127:0]
        dst[MAX:256] := 0
        	

_mm256_setr_m128d
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m128d lo, 
    __m128d hi
:Param ETypes:
    FP64 lo, 
    FP64 hi

.. code-block:: C

    __m256d _mm256_setr_m128d(__m128d lo, __m128d hi);

.. admonition:: Intel Description

    Set packed __m256d vector "dst" with the supplied values.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[127:0] := lo[127:0]
        dst[255:128] := hi[127:0]
        dst[MAX:256] := 0
        	

_mm256_setr_m128i
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Set
:Header: immintrin.h
:Searchable: AVX_ALL-Set-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i lo, 
    __m128i hi
:Param ETypes:
    M128 lo, 
    M128 hi

.. code-block:: C

    __m256i _mm256_setr_m128i(__m128i lo, __m128i hi);

.. admonition:: Intel Description

    Set packed __m256i vector "dst" with the supplied values.

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[127:0] := lo[127:0]
        dst[255:128] := hi[127:0]
        dst[MAX:256] := 0
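
The "setr" form takes the low half first. A minimal sketch, assuming GCC or Clang for the ``target("avx")`` attribute and using the hypothetical helper name ``join_halves_epi32``:

```c
#include <immintrin.h>

/* "lo" ends up in out[0..3] and "hi" in out[4..7] (hypothetical helper). */
__attribute__((target("avx")))
void join_halves_epi32(const int lo[4], const int hi[4], int out[8])
{
    __m128i vlo = _mm_loadu_si128((const __m128i *)lo);
    __m128i vhi = _mm_loadu_si128((const __m128i *)hi);
    /* Reversed order: the low half is the FIRST argument. */
    _mm256_storeu_si256((__m256i *)out, _mm256_setr_m128i(vlo, vhi));
}
```

The resulting vector has the same layout that ``_mm256_set_m128i(vhi, vlo)`` would produce.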
        	

Convert
-------
YMM
~~~
_mm256_cvtepi32_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m128i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m256d _mm256_cvtepi32_pd(__m128i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*64
        	dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
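
A minimal sketch of this widening conversion: four 32-bit integers go in through a 128-bit load, and four doubles come out. The helper name ``int4_to_double4`` is hypothetical, and the ``target("avx")`` attribute assumes GCC or Clang.

```c
#include <immintrin.h>

/* Convert four signed 32-bit integers to doubles (hypothetical helper). */
__attribute__((target("avx")))
void int4_to_double4(const int in[4], double out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);  /* 4 x int32 */
    _mm256_storeu_pd(out, _mm256_cvtepi32_pd(v));      /* 4 x double */
}
```

Every 32-bit integer is exactly representable as a double, so the conversion loses no precision.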
        	

_mm256_cvtepi32_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m256i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m256 _mm256_cvtepi32_ps(__m256i a);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
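
One common use of this conversion is turning integer samples into scaled floats. The sketch below converts eight integers and multiplies them by 1/256; the helper name ``scale_i32x8_to_float`` and the scale factor are illustrative assumptions, and the ``target("avx")`` attribute assumes GCC or Clang.

```c
#include <immintrin.h>

/* Convert eight int32 values to float and scale by 1/256 (hypothetical helper). */
__attribute__((target("avx")))
void scale_i32x8_to_float(const int in[8], float out[8])
{
    __m256i vi = _mm256_loadu_si256((const __m256i *)in);
    __m256  vf = _mm256_cvtepi32_ps(vi);  /* exact while |x| < 2^24 */
    _mm256_storeu_ps(out, _mm256_mul_ps(vf, _mm256_set1_ps(1.0f / 256.0f)));
}
```

Integers with a magnitude of 2^24 or more may be rounded, because a float carries only 24 bits of significand.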
        	

_mm256_cvtpd_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128 _mm256_cvtpd_ps(__m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_FP32(a[k+63:k])
        ENDFOR
        dst[MAX:128] := 0
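
Note that the result is a 128-bit vector: four doubles narrow to four floats. A minimal sketch, with the hypothetical helper name ``double4_to_float4`` and a ``target("avx")`` attribute that assumes GCC or Clang:

```c
#include <immintrin.h>

/* Narrow four doubles to four floats (hypothetical helper). */
__attribute__((target("avx")))
void double4_to_float4(const double in[4], float out[4])
{
    __m128 narrow = _mm256_cvtpd_ps(_mm256_loadu_pd(in));
    _mm_storeu_ps(out, narrow);  /* 128-bit store: only four floats */
}
```

Values that are not exactly representable as a float are rounded during the conversion.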
        	

_mm256_cvtps_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256i _mm256_cvtps_epi32(__m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtps_pd
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256d
:Param Types:
    __m128 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256d _mm256_cvtps_pd(__m128 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	k := 32*j
        	dst[i+63:i] := Convert_FP32_To_FP64(a[k+31:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvttpd_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm256_cvttpd_epi32(__m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k])
        ENDFOR
        dst[MAX:128] := 0
        	

_mm256_cvtpd_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    __m128i _mm256_cvtpd_epi32(__m256d a);

.. admonition:: Intel Description

    Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 32*j
        	k := 64*j
        	dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k])
        ENDFOR
        dst[MAX:128] := 0
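
The pseudo-code does not show how rounding or out-of-range inputs behave. The sketch below assumes the default MXCSR state (round to nearest even, exceptions masked), under which an out-of-range input produces the value 0x80000000. The helper name ``double4_to_int4`` is hypothetical, and the ``target("avx")`` attribute assumes GCC or Clang.

```c
#include <immintrin.h>

/* Convert four doubles to int32 using the current rounding mode
   (hypothetical helper). */
__attribute__((target("avx")))
void double4_to_int4(const double in[4], int out[4])
{
    __m128i r = _mm256_cvtpd_epi32(_mm256_loadu_pd(in));
    _mm_storeu_si128((__m128i *)out, r);
}
```

With the assumed default rounding mode, 1.5 and 2.5 both convert to 2, because ties go to the even integer.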
        	

_mm256_cvttps_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    __m256i _mm256_cvttps_epi32(__m256 a);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
        ENDFOR
        dst[MAX:256] := 0
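
To illustrate how truncation differs from ``_mm256_cvtps_epi32``, the sketch below converts the same eight floats both ways. It assumes the default MXCSR rounding mode (round to nearest even) for the non-truncating variant; the helper name ``float8_to_int8`` is hypothetical, and the ``target("avx")`` attribute assumes GCC or Clang.

```c
#include <immintrin.h>

/* Convert eight floats with rounding and with truncation (hypothetical helper). */
__attribute__((target("avx")))
void float8_to_int8(const float in[8], int rounded[8], int truncated[8])
{
    __m256 v = _mm256_loadu_ps(in);
    /* Uses the current MXCSR rounding mode. */
    _mm256_storeu_si256((__m256i *)rounded, _mm256_cvtps_epi32(v));
    /* Always rounds toward zero. */
    _mm256_storeu_si256((__m256i *)truncated, _mm256_cvttps_epi32(v));
}
```

For example, 2.7 becomes 3 when rounded and 2 when truncated, while -1.5 becomes -2 and -1 respectively.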
        	

_mm256_cvtss_f32
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: float
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    float _mm256_cvtss_f32(__m256 a);

.. admonition:: Intel Description

    Copy the lower single-precision (32-bit) floating-point element of "a" to "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
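
Extracting the lowest lane is typically the last step of a horizontal reduction. The following is a sketch of one possible sum over eight floats, not the only or fastest way to do it. The helper name ``hsum8_ps`` is hypothetical, the ``target("avx")`` attribute assumes GCC or Clang, and ``_mm256_cvtss_f32`` requires a compiler header recent enough to provide it.

```c
#include <immintrin.h>

/* Sum eight floats and return the total (hypothetical helper). */
__attribute__((target("avx")))
float hsum8_ps(const float in[8])
{
    __m256 v = _mm256_loadu_ps(in);
    __m256 swapped = _mm256_permute2f128_ps(v, v, 1);  /* swap 128-bit halves */
    __m256 s = _mm256_add_ps(v, swapped);  /* each half holds 4 partial sums */
    s = _mm256_hadd_ps(s, s);              /* 2 partial sums per half */
    s = _mm256_hadd_ps(s, s);              /* total in every lane */
    return _mm256_cvtss_f32(s);            /* read lane 0 */
}
```

For the inputs 1 through 8 this returns 36.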
        	

_mm256_cvtsd_f64
^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: double
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    double _mm256_cvtsd_f64(__m256d a);

.. admonition:: Intel Description

    Copy the lower double-precision (64-bit) floating-point element of "a" to "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[63:0] := a[63:0]
        	

_mm256_cvtsi256_si32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256i a
:Param ETypes:
    UI32 a

.. code-block:: C

    int _mm256_cvtsi256_si32(__m256i a);

.. admonition:: Intel Description

    Copy the lower 32-bit integer in "a" to "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        dst[31:0] := a[31:0]
        	

_mm256_cvtepi16_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m256i _mm256_cvtepi16_epi32(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j:= 0 to 7
        	i := 32*j
        	k := 16*j
        	dst[i+31:i] := SignExtend32(a[k+15:k])
        ENDFOR
        dst[MAX:256] := 0
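
A minimal sketch of the sign extension: eight 16-bit values are loaded into a 128-bit vector and widened to eight 32-bit values. The helper name ``widen_i16_to_i32`` is hypothetical, and the ``target("avx2")`` attribute assumes GCC or Clang on an AVX2-capable CPU.

```c
#include <immintrin.h>

/* Sign-extend eight int16 values to int32 (hypothetical helper). */
__attribute__((target("avx2")))
void widen_i16_to_i32(const short in[8], int out[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);  /* 8 x int16 */
    _mm256_storeu_si256((__m256i *)out, _mm256_cvtepi16_epi32(v));
}
```

Negative inputs stay negative after widening, which is what distinguishes this from the zero-extending ``_mm256_cvtepu16_epi32``.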
        	

_mm256_cvtepi16_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    SI16 a

.. code-block:: C

    __m256i _mm256_cvtepi16_epi64(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j:= 0 to 3
        	i := 64*j
        	k := 16*j
        	dst[i+63:i] := SignExtend64(a[k+15:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepi32_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    SI32 a

.. code-block:: C

    __m256i _mm256_cvtepi32_epi64(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j:= 0 to 3
        	i := 64*j
        	k := 32*j
        	dst[i+63:i] := SignExtend64(a[k+31:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepi8_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m256i _mm256_cvtepi8_epi16(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	l := j*16
        	dst[l+15:l] := SignExtend16(a[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepi8_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m256i _mm256_cvtepi8_epi32(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 8*j
        	dst[i+31:i] := SignExtend32(a[k+7:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepi8_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    SI8 a

.. code-block:: C

    __m256i _mm256_cvtepi8_epi64(__m128i a);

.. admonition:: Intel Description

    Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	k := 8*j
        	dst[i+63:i] := SignExtend64(a[k+7:k])
        ENDFOR
        dst[MAX:256] := 0
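
As the loop bounds in the pseudo-code indicate, only four source bytes contribute to the result. The sketch below loads 16 bytes and widens the first four; the helper name ``widen_i8_to_i64`` is hypothetical, and the ``target("avx2")`` attribute assumes GCC or Clang on an AVX2-capable CPU.

```c
#include <immintrin.h>

/* Sign-extend the first four bytes of in[] to 64-bit integers
   (hypothetical helper). The remaining twelve bytes are loaded but ignored. */
__attribute__((target("avx2")))
void widen_i8_to_i64(const signed char in[16], long long out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm256_storeu_si256((__m256i *)out, _mm256_cvtepi8_epi64(v));
}
```

The full 16-byte load is a convenience of this sketch; the conversion itself reads only bytes 0 through 3 of the vector.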
        	

_mm256_cvtepu16_epi32
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m256i _mm256_cvtepu16_epi32(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 16*j
        	dst[i+31:i] := ZeroExtend32(a[k+15:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepu16_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI16 a

.. code-block:: C

    __m256i _mm256_cvtepu16_epi64(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j:= 0 to 3
        	i := 64*j
        	k := 16*j
        	dst[i+63:i] := ZeroExtend64(a[k+15:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepu32_epi64
^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI32 a

.. code-block:: C

    __m256i _mm256_cvtepu32_epi64(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j:= 0 to 3
        	i := 64*j
        	k := 32*j
        	dst[i+63:i] := ZeroExtend64(a[k+31:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepu8_epi16
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m256i _mm256_cvtepu8_epi16(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 15
        	i := j*8
        	l := j*16
        	dst[l+15:l] := ZeroExtend16(a[i+7:i])
        ENDFOR
        dst[MAX:256] := 0
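
To show how zero extension differs from sign extension, the sketch below widens the same 16 bytes with both ``_mm256_cvtepu8_epi16`` and ``_mm256_cvtepi8_epi16``. The helper name ``widen_bytes`` is hypothetical, and the ``target("avx2")`` attribute assumes GCC or Clang on an AVX2-capable CPU.

```c
#include <immintrin.h>

/* Widen 16 bytes to 16-bit values, once unsigned and once signed
   (hypothetical helper). */
__attribute__((target("avx2")))
void widen_bytes(const unsigned char in[16],
                 unsigned short zero_ext[16], short sign_ext[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm256_storeu_si256((__m256i *)zero_ext, _mm256_cvtepu8_epi16(v));
    _mm256_storeu_si256((__m256i *)sign_ext, _mm256_cvtepi8_epi16(v));
}
```

The byte 0xFF widens to 255 with zero extension and to -1 with sign extension; bytes below 0x80 give the same result either way.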
        	

_mm256_cvtepu8_epi32
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m256i _mm256_cvtepu8_epi32(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 32*j
        	k := 8*j
        	dst[i+31:i] := ZeroExtend32(a[k+7:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_cvtepu8_epi64
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m128i a
:Param ETypes:
    UI8 a

.. code-block:: C

    __m256i _mm256_cvtepu8_epi64(__m128i a);

.. admonition:: Intel Description

    Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 64*j
        	k := 8*j
        	dst[i+63:i] := ZeroExtend64(a[k+7:k])
        ENDFOR
        dst[MAX:256] := 0
        	

_mm256_bcstnebf16_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    const __bf16* __A
:Param ETypes:
    BF16 __A

.. code-block:: C

    __m256 _mm256_bcstnebf16_ps(const __bf16* __A);

.. admonition:: Intel Description

    Convert the scalar BF16 (16-bit) floating-point element stored at memory location "__A" to a single-precision (32-bit) floating-point value, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        b := Convert_BF16_To_FP32(MEM[__A+15:__A])
        FOR j := 0 to 7
        	m := j*32
        	dst[m+31:m] := b
        ENDFOR
        dst[MAX:256] := 0
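
Hardware support for this instruction is not yet widespread, so the sketch below is a portable scalar model of the pseudo-code rather than a call to the intrinsic. It relies on the fact that a BF16 value is the upper 16 bits of an IEEE-754 single-precision value. The names ``bf16_bits_to_float`` and ``bcst_bf16_model`` are hypothetical, and BF16 inputs are passed as raw ``uint16_t`` bit patterns.

```c
#include <stdint.h>
#include <string.h>

/* Widen a BF16 bit pattern to float by placing it in the upper 16 bits. */
static float bf16_bits_to_float(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;
    float f;
    memcpy(&f, &wide, sizeof f);  /* reinterpret the bits without aliasing issues */
    return f;
}

/* Scalar model of the broadcast: one BF16 value fills all eight lanes. */
void bcst_bf16_model(const uint16_t *src, float dst[8])
{
    float b = bf16_bits_to_float(*src);
    for (int j = 0; j < 8; ++j)
        dst[j] = b;
}
```

For example, the bit pattern 0x3F80 widens to 1.0 and 0xC000 widens to -2.0.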
        

_mm256_bcstnesh_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    const _Float16* __A
:Param ETypes:
    FP16 __A

.. code-block:: C

    __m256 _mm256_bcstnesh_ps(const _Float16* __A);

.. admonition:: Intel Description

    Convert the scalar half-precision (16-bit) floating-point element stored at memory location "__A" to a single-precision (32-bit) floating-point value, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        b := Convert_FP16_To_FP32(MEM[__A+15:__A])
        FOR j := 0 to 7
        	m := j*32
        	dst[m+31:m] := b
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_cvtneebf16_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    const __m256bh* __A
:Param ETypes:
    BF16 __A

.. code-block:: C

    __m256 _mm256_cvtneebf16_ps(const __m256bh* __A);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point even-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	m := j*32
        	dst[m+31:m] := Convert_BF16_To_FP32(MEM[__A+m+15:__A+m])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_cvtneeph_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    const __m256h* __A
:Param ETypes:
    FP16 __A

.. code-block:: C

    __m256 _mm256_cvtneeph_ps(const __m256h* __A);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point even-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	m := j*32
        	dst[m+31:m] := Convert_FP16_To_FP32(MEM[__A+m+15:__A+m])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_cvtneobf16_ps
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    const __m256bh* __A
:Param ETypes:
    BF16 __A

.. code-block:: C

    __m256 _mm256_cvtneobf16_ps(const __m256bh* __A);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point odd-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	m := j*32
        	dst[m+31:m] := Convert_BF16_To_FP32(MEM[__A+m+31:__A+m+16])
        ENDFOR
        dst[MAX:256] := 0
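
The even-indexed and odd-indexed variants split an interleaved array of 16 BF16 values into two vectors of eight floats. The sketch below is a portable scalar model of both pseudo-code loops, not a call to the intrinsics. The names ``bf16_bits_to_float`` and ``cvt_even_odd_bf16_model`` are hypothetical, and BF16 inputs are passed as raw ``uint16_t`` bit patterns on a little-endian layout.

```c
#include <stdint.h>
#include <string.h>

/* Widen a BF16 bit pattern to float by placing it in the upper 16 bits. */
static float bf16_bits_to_float(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;
    float f;
    memcpy(&f, &wide, sizeof f);
    return f;
}

/* Lane j takes source element 2*j (even variant) or 2*j+1 (odd variant). */
void cvt_even_odd_bf16_model(const uint16_t src[16], float even[8], float odd[8])
{
    for (int j = 0; j < 8; ++j) {
        even[j] = bf16_bits_to_float(src[2 * j]);      /* bits m+15:m    */
        odd[j]  = bf16_bits_to_float(src[2 * j + 1]);  /* bits m+31:m+16 */
    }
}
```

The bit ranges in the comments refer to the pseudo-code offsets with ``m = j*32``.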
        

_mm256_cvtneoph_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    const __m256h* __A
:Param ETypes:
    FP16 __A

.. code-block:: C

    __m256 _mm256_cvtneoph_ps(const __m256h* __A);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point odd-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	m := j*32
        	dst[m+31:m] := Convert_FP16_To_FP32(MEM[__A+m+31:__A+m+16])
        ENDFOR
        dst[MAX:256] := 0
        

_mm256_cvtneps_avx_pbh
^^^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128bh
:Param Types:
    __m256 __A
:Param ETypes:
    FP32 __A

.. code-block:: C

    __m128bh _mm256_cvtneps_avx_pbh(__m256 __A);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "__A" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.word[j] := Convert_FP32_To_BF16(__A.fp32[j])
        ENDFOR
        dst[MAX:128] := 0
        

_mm256_cvtneps_pbh
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128bh
:Param Types:
    __m256 __A
:Param ETypes:
    FP32 __A

.. code-block:: C

    __m128bh _mm256_cvtneps_pbh(__m256 __A);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "__A" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	dst.word[j] := Convert_FP32_To_BF16(__A.fp32[j])
        ENDFOR
        dst[MAX:128] := 0
        

_mm256_cvtph_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m256
:Param Types:
    __m128i a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m256 _mm256_cvtph_ps(__m128i a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	m := j*16
        	dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
        ENDFOR
        dst[MAX:256] := 0
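
A minimal sketch of the half-to-single conversion, taking the half-precision inputs as raw 16-bit patterns. The helper name ``half8_to_float8`` is hypothetical, and the ``target("avx,f16c")`` attribute assumes GCC or Clang on a CPU with F16C support.

```c
#include <immintrin.h>

/* Convert eight half-precision bit patterns to floats (hypothetical helper). */
__attribute__((target("avx,f16c")))
void half8_to_float8(const unsigned short in[8], float out[8])
{
    __m128i h = _mm_loadu_si128((const __m128i *)in);  /* 8 x FP16 */
    _mm256_storeu_ps(out, _mm256_cvtph_ps(h));         /* 8 x FP32 */
}
```

Every half-precision value is exactly representable in single precision, so this direction is lossless. For example, 0x3C00 converts to 1.0 and 0x7BFF to 65504, the largest finite half-precision value.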
        	

_mm256_cvtps_ph
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-YMM
:Register: YMM 256 bit
:Return Type: __m128i
:Param Types:
    __m256 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm256_cvtps_ph(__m256 a, int imm8);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
    Rounding is controlled by "imm8".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := 16*j
        	l := 32*j
        	dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        ENDFOR
        dst[MAX:128] := 0
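
A minimal sketch of the single-to-half conversion, passing a compile-time rounding constant as "imm8". The choice of round-to-nearest with exceptions suppressed is an assumption of this example; the helper name ``float8_to_half8`` is hypothetical, and the ``target("avx,f16c")`` attribute assumes GCC or Clang on a CPU with F16C support.

```c
#include <immintrin.h>

/* Convert eight floats to half-precision bit patterns (hypothetical helper). */
__attribute__((target("avx,f16c")))
void float8_to_half8(const float in[8], unsigned short out[8])
{
    /* imm8 must be a compile-time constant. */
    __m128i h = _mm256_cvtps_ph(_mm256_loadu_ps(in),
                                _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    _mm_storeu_si128((__m128i *)out, h);
}
```

The outputs are raw 16-bit patterns. For example, 1.0 converts to 0x3C00 and -2.0 to 0xC000.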
        	

XMM
~~~
_mm_bcstnebf16_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    const __bf16* __A
:Param ETypes:
    BF16 __A

.. code-block:: C

    __m128 _mm_bcstnebf16_ps(const __bf16* __A);

.. admonition:: Intel Description

    Convert the scalar BF16 (16-bit) floating-point element stored at memory location "__A" to a single-precision (32-bit) floating-point value, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        b := Convert_BF16_To_FP32(MEM[__A+15:__A])
        FOR j := 0 to 3
        	m := j*32
        	dst[m+31:m] := b
        ENDFOR
        dst[MAX:128] := 0
        

_mm_bcstnesh_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    const _Float16* __A
:Param ETypes:
    FP16 __A

.. code-block:: C

    __m128 _mm_bcstnesh_ps(const _Float16* __A);

.. admonition:: Intel Description

    Convert the scalar half-precision (16-bit) floating-point element stored at memory location "__A" to a single-precision (32-bit) floating-point value, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        b := Convert_FP16_To_FP32(MEM[__A+15:__A])
        FOR j := 0 to 3
        	m := j*32
        	dst[m+31:m] := b
        ENDFOR
        dst[MAX:128] := 0
        

_mm_cvtneebf16_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    const __m128bh* __A
:Param ETypes:
    BF16 __A

.. code-block:: C

    __m128 _mm_cvtneebf16_ps(const __m128bh* __A);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point even-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Psudeo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	m := j*32
        	dst[m+31:m] := Convert_BF16_To_FP32(MEM[__A+m+15:__A+m])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_cvtneeph_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    const __m128h* __A
:Param ETypes:
    FP16 __A

.. code-block:: C

    __m128 _mm_cvtneeph_ps(const __m128h* __A);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point even-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	m := j*32
        	dst[m+31:m] := Convert_FP16_To_FP32(MEM[__A+m+15:__A+m])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_cvtneobf16_ps
^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    const __m128bh* __A
:Param ETypes:
    BF16 __A

.. code-block:: C

    __m128 _mm_cvtneobf16_ps(const __m128bh* __A);

.. admonition:: Intel Description

    Convert packed BF16 (16-bit) floating-point odd-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	m := j*32
        	dst[m+31:m] := Convert_BF16_To_FP32(MEM[__A+m+31:__A+m+16])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_cvtneoph_ps
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    const __m128h* __A
:Param ETypes:
    FP16 __A

.. code-block:: C

    __m128 _mm_cvtneoph_ps(const __m128h* __A);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point odd-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	m := j*32
        	dst[m+31:m] := Convert_FP16_To_FP32(MEM[__A+m+31:__A+m+16])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_cvtneps_avx_pbh
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128bh
:Param Types:
    __m128 __A
:Param ETypes:
    FP32 __A

.. code-block:: C

    __m128bh _mm_cvtneps_avx_pbh(__m128 __A);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "__A" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	dst.word[j] := Convert_FP32_To_BF16(__A.fp32[j])
        ENDFOR
        dst[MAX:128] := 0
        
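The pseudo-code leaves ``Convert_FP32_To_BF16`` undefined. The sketch below is one plausible scalar reading of it, assuming round-to-nearest-even on the discarded 16 bits (which the "ne" in the intrinsic name suggests). The names ``f32_to_bf16_rne`` and ``model_cvtneps_pbh`` are hypothetical, and subnormal handling is not modelled, so hardware results may differ for such inputs.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Assumed behaviour: round-to-nearest-even, NaN mapped to a quiet NaN.
       Subnormal inputs and outputs are NOT treated specially here. */
    static uint16_t f32_to_bf16_rne(float value)
    {
        uint32_t bits;
        memcpy(&bits, &value, sizeof bits);

        if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
            /* NaN: keep sign and top payload bits, force the quiet bit. */
            return (uint16_t)((bits >> 16) | 0x0040u);
        }
        /* Adding 0x7FFF plus the lowest kept bit makes ties round to even. */
        uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
        return (uint16_t)((bits + rounding_bias) >> 16);
    }

    /* Hypothetical scalar model of the four-element conversion loop. */
    static void model_cvtneps_pbh(const float src[4], uint16_t dst[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = f32_to_bf16_rne(src[j]);
        }
    }

In this model a value exactly halfway between two BF16 neighbours rounds to the one whose lowest mantissa bit is zero.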

_mm_cvtneps_pbh
^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128bh
:Param Types:
    __m128 __A
:Param ETypes:
    FP32 __A

.. code-block:: C

    __m128bh _mm_cvtneps_pbh(__m128 __A);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "__A" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	dst.word[j] := Convert_FP32_To_BF16(__A.fp32[j])
        ENDFOR
        dst[MAX:128] := 0
        

_mm_cvtph_ps
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128
:Param Types:
    __m128i a
:Param ETypes:
    FP16 a

.. code-block:: C

    __m128 _mm_cvtph_ps(__m128i a);

.. admonition:: Intel Description

    Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*32
        	m := j*16
        	dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
        ENDFOR
        dst[MAX:128] := 0
        	
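For readers without F16C hardware, the widening step can be sketched in scalar C. This is an illustrative model only: ``fp16_to_f32`` and ``model_cvtph_ps`` are hypothetical names, and the input is assumed to be the eight raw 16-bit lanes of "a", of which only the low four are converted.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Widen one IEEE-754 binary16 bit pattern to binary32. */
    static float fp16_to_f32(uint16_t h)
    {
        uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
        uint32_t exponent = (h >> 10) & 0x1Fu;
        uint32_t mantissa = h & 0x3FFu;
        uint32_t bits;

        if (exponent == 0x1Fu) {
            /* Infinity or NaN: widen the exponent, keep the payload. */
            bits = sign | 0x7F800000u | (mantissa << 13);
        } else if (exponent != 0) {
            /* Normal number: rebias the exponent from 15 to 127. */
            bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
        } else if (mantissa != 0) {
            /* Subnormal half: renormalise for the wider format. */
            uint32_t e = 113u;
            while ((mantissa & 0x400u) == 0) {
                mantissa <<= 1;
                e--;
            }
            bits = sign | (e << 23) | ((mantissa & 0x3FFu) << 13);
        } else {
            bits = sign;  /* signed zero */
        }

        float out;
        memcpy(&out, &bits, sizeof out);
        return out;
    }

    /* Hypothetical scalar model: only the low four 16-bit lanes are used. */
    static void model_cvtph_ps(const uint16_t a[8], float dst[4])
    {
        for (int j = 0; j < 4; j++) {
            dst[j] = fp16_to_f32(a[j]);
        }
    }

Every half-precision value is exactly representable in single precision, so this direction needs no rounding control.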

_mm_cvtps_ph
^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Convert
:Header: immintrin.h
:Searchable: AVX_ALL-Convert-XMM
:Register: XMM 128 bit
:Return Type: __m128i
:Param Types:
    __m128 a, 
    int imm8
:Param ETypes:
    FP32 a, 
    IMM imm8

.. code-block:: C

    __m128i _mm_cvtps_ph(__m128 a, int imm8);

.. admonition:: Intel Description

    Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
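    Rounding is done according to the "imm8" parameter, for example "_MM_FROUND_TO_NEAREST_INT", "_MM_FROUND_TO_ZERO", or "_MM_FROUND_CUR_DIRECTION" (use "MXCSR.RC").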

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := 16*j
        	l := 32*j
        	dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
        ENDFOR
        dst[MAX:64] := 0
        	

Miscellaneous
-------------
YMM
~~~
_mm256_movemask_pd
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX_ALL-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256d a
:Param ETypes:
    FP64 a

.. code-block:: C

    int _mm256_movemask_pd(__m256d a);

.. admonition:: Intel Description

    Set each bit of mask "dst" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 3
        	i := j*64
        	IF a[i+63]
        		dst[j] := 1
        	ELSE
        		dst[j] := 0
        	FI
        ENDFOR
        dst[MAX:4] := 0
        	
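The mask construction above can be checked with a small scalar model. The function name ``model_movemask_pd`` is hypothetical, and the four lanes of "a" are assumed to be passed as a plain ``double`` array.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: bit j of the result is the sign bit of a[j]. */
    static int model_movemask_pd(const double a[4])
    {
        int mask = 0;
        for (int j = 0; j < 4; j++) {
            uint64_t bits;
            memcpy(&bits, &a[j], sizeof bits);  /* inspect the raw encoding */
            mask |= (int)(bits >> 63) << j;
        }
        return mask;
    }

Because only the sign bit is inspected, ``-0.0`` sets its mask bit just as any negative value does. For the input ``{-1.0, 2.0, -0.0, 3.0}`` this model returns 5.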

_mm256_movemask_ps
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX_ALL-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256 a
:Param ETypes:
    FP32 a

.. code-block:: C

    int _mm256_movemask_ps(__m256 a);

.. admonition:: Intel Description

    Set each bit of mask "dst" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in "a".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 7
        	i := j*32
        	IF a[i+31]
        		dst[j] := 1
        	ELSE
        		dst[j] := 0
        	FI
        ENDFOR
        dst[MAX:8] := 0
        	

_mm256_alignr_epi8
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX_ALL-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_alignr_epi8(__m256i a, __m256i b,
                               const int imm8);

.. admonition:: Intel Description

    Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 1
        	i := j*128
        	tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8)
        	dst[i+127:i] := tmp[127:0]
        ENDFOR
        dst[MAX:256] := 0
        	
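The per-lane behaviour is easy to miss: the shift never crosses the 128-bit boundary. The scalar sketch below mirrors the pseudo-code byte by byte. ``model_alignr_epi8`` is a hypothetical name, and the operands are assumed to be plain 32-byte arrays.

.. code-block:: C

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical scalar model: each 128-bit lane is handled independently. */
    static void model_alignr_epi8(const uint8_t a[32], const uint8_t b[32],
                                  unsigned imm8, uint8_t dst[32])
    {
        imm8 &= 0xFFu;  /* the immediate is 8 bits wide */
        for (int lane = 0; lane < 2; lane++) {
            uint8_t tmp[32];
            memcpy(tmp, b + 16 * lane, 16);       /* low half comes from b */
            memcpy(tmp + 16, a + 16 * lane, 16);  /* high half comes from a */
            for (unsigned k = 0; k < 16; k++) {
                unsigned idx = k + imm8;          /* right shift by imm8 bytes */
                dst[16 * lane + k] = (idx < 32) ? tmp[idx] : 0;
            }
        }
    }

With ``imm8 = 4``, each output lane holds bytes 4..15 of the matching lane of "b" followed by bytes 0..3 of the matching lane of "a".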

_mm256_movemask_epi8
^^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX_ALL-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: int
:Param Types:
    __m256i a
:Param ETypes:
    UI8 a

.. code-block:: C

    int _mm256_movemask_epi8(__m256i a);

.. admonition:: Intel Description

    Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        FOR j := 0 to 31
        	i := j*8
        	dst[j] := a[i+7]
        ENDFOR
        	
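A scalar model of the loop above is shown below as a sketch. ``model_movemask_epi8`` is a hypothetical name, and "a" is assumed to be passed as 32 raw bytes.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical scalar model: bit j of the result is the top bit of a[j]. */
    static int model_movemask_epi8(const uint8_t a[32])
    {
        uint32_t mask = 0;
        for (int j = 0; j < 32; j++) {
            mask |= (uint32_t)(a[j] >> 7) << j;
        }
        /* All 32 bits are used, so the int result is negative when a[31]
           has its top bit set. */
        return (int)mask;
    }

Since the mask fills the whole ``int``, casting the result to ``uint32_t`` before shifting or comparing avoids sign-related surprises.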

_mm256_mpsadbw_epu8
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX_ALL-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b, 
    const int imm8
:Param ETypes:
    UI8 a, 
    UI8 b, 
    IMM imm8

.. code-block:: C

    __m256i _mm256_mpsadbw_epu8(__m256i a, __m256i b,
                                const int imm8);

.. admonition:: Intel Description

    Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst".

    Eight SADs are performed for each 128-bit lane using one quadruplet from "b" and eight quadruplets from "a". One quadruplet is selected from "b" starting at the offset specified in "imm8". Eight quadruplets are formed from sequential 8-bit integers selected from "a" starting at the offset specified in "imm8".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        DEFINE MPSADBW(a[127:0], b[127:0], imm8[2:0]) {
        	a_offset := imm8[2]*32
        	b_offset := imm8[1:0]*32
        	FOR j := 0 to 7
        		i := j*8
        		k := a_offset+i
        		l := b_offset
        		tmp[i*2+15:i*2] := ABS(Signed(a[k+7:k] - b[l+7:l])) + ABS(Signed(a[k+15:k+8] - b[l+15:l+8])) + \
        		                   ABS(Signed(a[k+23:k+16] - b[l+23:l+16])) + ABS(Signed(a[k+31:k+24] - b[l+31:l+24]))
        	ENDFOR
        	RETURN tmp[127:0]
        }
        dst[127:0] := MPSADBW(a[127:0], b[127:0], imm8[2:0])
        dst[255:128] := MPSADBW(a[255:128], b[255:128], imm8[5:3])
        dst[MAX:256] := 0
        	
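The ``MPSADBW`` helper above translates fairly directly into scalar C when offsets are expressed in bytes instead of bits. The following is a sketch only: ``model_mpsadbw_lane`` and ``model_mpsadbw_epu8`` are hypothetical names, and the operands are assumed to be plain byte arrays.

.. code-block:: C

    #include <stdint.h>
    #include <stdlib.h>

    /* One 128-bit lane: ctrl bit 2 selects the start of the sliding window
       in a, ctrl bits 1:0 select the fixed quadruplet in b. */
    static void model_mpsadbw_lane(const uint8_t a[16], const uint8_t b[16],
                                   unsigned ctrl, uint16_t dst[8])
    {
        unsigned a_off = ((ctrl >> 2) & 1u) * 4;  /* byte offset 0 or 4 */
        unsigned b_off = (ctrl & 3u) * 4;         /* byte offset 0, 4, 8, 12 */
        for (unsigned j = 0; j < 8; j++) {
            unsigned sum = 0;
            for (unsigned k = 0; k < 4; k++) {
                sum += (unsigned)abs((int)a[a_off + j + k] - (int)b[b_off + k]);
            }
            dst[j] = (uint16_t)sum;
        }
    }

    /* Hypothetical 256-bit model: imm8[2:0] drives the low lane and
       imm8[5:3] drives the high lane. */
    static void model_mpsadbw_epu8(const uint8_t a[32], const uint8_t b[32],
                                   unsigned imm8, uint16_t dst[16])
    {
        model_mpsadbw_lane(a, b, imm8 & 7u, dst);
        model_mpsadbw_lane(a + 16, b + 16, (imm8 >> 3) & 7u, dst + 8);
    }

The window in "a" advances one byte per result, so the last SAD of a lane reads up to byte 14 of that lane when the "a" offset is 4.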

_mm256_packs_epi16
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX_ALL-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_packs_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := Saturate8(a[15:0])
        dst[15:8] := Saturate8(a[31:16])
        dst[23:16] := Saturate8(a[47:32])
        dst[31:24] := Saturate8(a[63:48])
        dst[39:32] := Saturate8(a[79:64])
        dst[47:40] := Saturate8(a[95:80])
        dst[55:48] := Saturate8(a[111:96])
        dst[63:56] := Saturate8(a[127:112])
        dst[71:64] := Saturate8(b[15:0])
        dst[79:72] := Saturate8(b[31:16])
        dst[87:80] := Saturate8(b[47:32])
        dst[95:88] := Saturate8(b[63:48])
        dst[103:96] := Saturate8(b[79:64])
        dst[111:104] := Saturate8(b[95:80])
        dst[119:112] := Saturate8(b[111:96])
        dst[127:120] := Saturate8(b[127:112])
        dst[135:128] := Saturate8(a[143:128])
        dst[143:136] := Saturate8(a[159:144])
        dst[151:144] := Saturate8(a[175:160])
        dst[159:152] := Saturate8(a[191:176])
        dst[167:160] := Saturate8(a[207:192])
        dst[175:168] := Saturate8(a[223:208])
        dst[183:176] := Saturate8(a[239:224])
        dst[191:184] := Saturate8(a[255:240])
        dst[199:192] := Saturate8(b[143:128])
        dst[207:200] := Saturate8(b[159:144])
        dst[215:208] := Saturate8(b[175:160])
        dst[223:216] := Saturate8(b[191:176])
        dst[231:224] := Saturate8(b[207:192])
        dst[239:232] := Saturate8(b[223:208])
        dst[247:240] := Saturate8(b[239:224])
        dst[255:248] := Saturate8(b[255:240])
        dst[MAX:256] := 0
        	
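The unrolled assignments above follow a per-lane pattern that a short loop makes easier to see. This scalar sketch assumes "a" and "b" are given as arrays of sixteen ``int16_t`` values. ``saturate_i8`` and ``model_packs_epi16`` are hypothetical names.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a signed 16-bit value to the signed 8-bit range. */
    static int8_t saturate_i8(int16_t v)
    {
        if (v > 127) return 127;
        if (v < -128) return -128;
        return (int8_t)v;
    }

    /* Hypothetical scalar model: within each 128-bit lane, eight values
       from a are followed by eight values from b. */
    static void model_packs_epi16(const int16_t a[16], const int16_t b[16],
                                  int8_t dst[32])
    {
        for (int lane = 0; lane < 2; lane++) {
            for (int k = 0; k < 8; k++) {
                dst[16 * lane + k]     = saturate_i8(a[8 * lane + k]);
                dst[16 * lane + 8 + k] = saturate_i8(b[8 * lane + k]);
            }
        }
    }

Note that the output is not simply all of "a" followed by all of "b": the two sources interleave at 128-bit granularity.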

_mm256_packs_epi32
^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX_ALL-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_packs_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := Saturate16(a[31:0])
        dst[31:16] := Saturate16(a[63:32])
        dst[47:32] := Saturate16(a[95:64])
        dst[63:48] := Saturate16(a[127:96])
        dst[79:64] := Saturate16(b[31:0])
        dst[95:80] := Saturate16(b[63:32])
        dst[111:96] := Saturate16(b[95:64])
        dst[127:112] := Saturate16(b[127:96])
        dst[143:128] := Saturate16(a[159:128])
        dst[159:144] := Saturate16(a[191:160])
        dst[175:160] := Saturate16(a[223:192])
        dst[191:176] := Saturate16(a[255:224])
        dst[207:192] := Saturate16(b[159:128])
        dst[223:208] := Saturate16(b[191:160])
        dst[239:224] := Saturate16(b[223:192])
        dst[255:240] := Saturate16(b[255:224])
        dst[MAX:256] := 0
        	

_mm256_packus_epi16
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX_ALL-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI16 a, 
    SI16 b

.. code-block:: C

    __m256i _mm256_packus_epi16(__m256i a, __m256i b);

.. admonition:: Intel Description

    Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[7:0] := SaturateU8(a[15:0])
        dst[15:8] := SaturateU8(a[31:16])
        dst[23:16] := SaturateU8(a[47:32])
        dst[31:24] := SaturateU8(a[63:48])
        dst[39:32] := SaturateU8(a[79:64])
        dst[47:40] := SaturateU8(a[95:80])
        dst[55:48] := SaturateU8(a[111:96])
        dst[63:56] := SaturateU8(a[127:112])
        dst[71:64] := SaturateU8(b[15:0])
        dst[79:72] := SaturateU8(b[31:16])
        dst[87:80] := SaturateU8(b[47:32])
        dst[95:88] := SaturateU8(b[63:48])
        dst[103:96] := SaturateU8(b[79:64])
        dst[111:104] := SaturateU8(b[95:80])
        dst[119:112] := SaturateU8(b[111:96])
        dst[127:120] := SaturateU8(b[127:112])
        dst[135:128] := SaturateU8(a[143:128])
        dst[143:136] := SaturateU8(a[159:144])
        dst[151:144] := SaturateU8(a[175:160])
        dst[159:152] := SaturateU8(a[191:176])
        dst[167:160] := SaturateU8(a[207:192])
        dst[175:168] := SaturateU8(a[223:208])
        dst[183:176] := SaturateU8(a[239:224])
        dst[191:184] := SaturateU8(a[255:240])
        dst[199:192] := SaturateU8(b[143:128])
        dst[207:200] := SaturateU8(b[159:144])
        dst[215:208] := SaturateU8(b[175:160])
        dst[223:216] := SaturateU8(b[191:176])
        dst[231:224] := SaturateU8(b[207:192])
        dst[239:232] := SaturateU8(b[223:208])
        dst[247:240] := SaturateU8(b[239:224])
        dst[255:248] := SaturateU8(b[255:240])
        dst[MAX:256] := 0
        	
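``SaturateU8`` takes a signed 16-bit input and produces an unsigned 8-bit output. A minimal sketch of that clamp follows, with ``saturate_u8`` as a hypothetical helper name.

.. code-block:: C

    #include <stdint.h>

    /* Clamp a signed 16-bit value to the unsigned 8-bit range [0, 255]. */
    static uint8_t saturate_u8(int16_t v)
    {
        if (v < 0) return 0;      /* negative inputs clamp to zero */
        if (v > 255) return 255;  /* large inputs clamp to the maximum */
        return (uint8_t)v;
    }

Under this reading, an input of -5 becomes 0 and an input of 300 becomes 255, while values already in range pass through unchanged.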

_mm256_packus_epi32
^^^^^^^^^^^^^^^^^^^
:Tech: AVX_ALL
:Category: Miscellaneous
:Header: immintrin.h
:Searchable: AVX_ALL-Miscellaneous-YMM
:Register: YMM 256 bit
:Return Type: __m256i
:Param Types:
    __m256i a, 
    __m256i b
:Param ETypes:
    SI32 a, 
    SI32 b

.. code-block:: C

    __m256i _mm256_packus_epi32(__m256i a, __m256i b);

.. admonition:: Intel Description

    Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst".

.. admonition:: Intel Implementation Pseudo-Code

    .. code-block:: text

        
        dst[15:0] := SaturateU16(a[31:0])
        dst[31:16] := SaturateU16(a[63:32])
        dst[47:32] := SaturateU16(a[95:64])
        dst[63:48] := SaturateU16(a[127:96])
        dst[79:64] := SaturateU16(b[31:0])
        dst[95:80] := SaturateU16(b[63:32])
        dst[111:96] := SaturateU16(b[95:64])
        dst[127:112] := SaturateU16(b[127:96])
        dst[143:128] := SaturateU16(a[159:128])
        dst[159:144] := SaturateU16(a[191:160])
        dst[175:160] := SaturateU16(a[223:192])
        dst[191:176] := SaturateU16(a[255:224])
        dst[207:192] := SaturateU16(b[159:128])
        dst[223:208] := SaturateU16(b[191:160])
        dst[239:224] := SaturateU16(b[223:192])
        dst[255:240] := SaturateU16(b[255:224])
        dst[MAX:256] := 0