🔬 This is a nightly-only experimental API. (stdsimd #48556)
Available on x86 or x86-64 only.
Advanced Vector Extensions (AVX)
The references are:
- Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2: Instruction Set Reference, A-Z.
- AMD64 Architecture Programmer’s Manual, Volume 3: General-Purpose and System Instructions.
Wikipedia provides a quick overview of the instructions available.
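This is a minimal sketch of how the intrinsics listed below are typically called from Rust. It assumes an x86-64 target and a toolchain where `std::arch::x86_64` provides them. The AVX code lives in a `#[target_feature(enable = "avx")]` function, which is entered only after `is_x86_feature_detected!("avx")` succeeds. Otherwise a scalar fallback runs. The names `add4` and `add4_avx` are hypothetical.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_add_pd, _mm256_loadu_pd, _mm256_storeu_pd};

/// Hypothetical helper: lane-wise sum of two `[f64; 4]` arrays.
/// Uses AVX when the CPU supports it, otherwise a scalar loop.
pub fn add4(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: the runtime check above guarantees AVX is available.
            return unsafe { add4_avx(&a, &b) };
        }
    }
    // Portable fallback for CPUs (or targets) without AVX.
    std::array::from_fn(|i| a[i] + b[i])
}

/// Hypothetical AVX path; must only be called when AVX is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add4_avx(a: &[f64; 4], b: &[f64; 4]) -> [f64; 4] {
    let mut out = [0.0f64; 4];
    unsafe {
        // A plain [f64; 4] is only 8-byte aligned, so use the unaligned load/store forms.
        let va = _mm256_loadu_pd(a.as_ptr());
        let vb = _mm256_loadu_pd(b.as_ptr());
        _mm256_storeu_pd(out.as_mut_ptr(), _mm256_add_pd(va, vb));
    }
    out
}
```

Both paths should return `[11.0, 22.0, 33.0, 44.0]` for `add4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])`. Keeping the `unsafe` surface inside one small function makes it easy to audit.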
Constants
- Equal (ordered, non-signaling)
- Equal (ordered, signaling)
- Equal (unordered, non-signaling)
- Equal (unordered, signaling)
- False (ordered, non-signaling)
- False (ordered, signaling)
- Greater-than-or-equal (ordered, non-signaling)
- Greater-than-or-equal (ordered, signaling)
- Greater-than (ordered, non-signaling)
- Greater-than (ordered, signaling)
- Less-than-or-equal (ordered, non-signaling)
- Less-than-or-equal (ordered, signaling)
- Less-than (ordered, non-signaling)
- Less-than (ordered, signaling)
- Not-equal (ordered, non-signaling)
- Not-equal (ordered, signaling)
- Not-equal (unordered, non-signaling)
- Not-equal (unordered, signaling)
- Not-greater-than-or-equal (unordered, non-signaling)
- Not-greater-than-or-equal (unordered, signaling)
- Not-greater-than (unordered, non-signaling)
- Not-greater-than (unordered, signaling)
- Not-less-than-or-equal (unordered, non-signaling)
- Not-less-than-or-equal (unordered, signaling)
- Not-less-than (unordered, non-signaling)
- Not-less-than (unordered, signaling)
- Ordered (non-signaling)
- Ordered (signaling)
- True (unordered, non-signaling)
- True (unordered, signaling)
- Unordered (non-signaling)
- Unordered (signaling)
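The predicates above are the values passed as `IMM5` to `_mm256_cmp_pd`, `_mm256_cmp_ps`, and the `_mm_cmp_*` intrinsics listed below. This sketch assumes the conventional constant name `_CMP_LT_OQ` for "Less-than (ordered, non-signaling)". Because the predicate is ordered, any lane with a NaN operand compares false. The helper `lt_mask` is hypothetical; it packs the per-lane results into an integer with `_mm256_movemask_pd`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_cmp_pd, _mm256_loadu_pd, _mm256_movemask_pd, _CMP_LT_OQ};

/// Hypothetical helper: bit i of the result is set when a[i] < b[i].
/// NaN lanes compare false because _CMP_LT_OQ is an ordered predicate.
pub fn lt_mask(a: [f64; 4], b: [f64; 4]) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX availability was checked at run time.
            return unsafe { lt_mask_avx(&a, &b) };
        }
    }
    // Scalar fallback: `<` on f64 is also false when either side is NaN.
    (0..4usize)
        .filter(|&i| a[i] < b[i])
        .fold(0i32, |m, i| m | (1i32 << i))
}

/// Hypothetical AVX path; must only be called when AVX is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn lt_mask_avx(a: &[f64; 4], b: &[f64; 4]) -> i32 {
    unsafe {
        let va = _mm256_loadu_pd(a.as_ptr());
        let vb = _mm256_loadu_pd(b.as_ptr());
        // Each lane becomes all ones (true) or all zeros (false).
        let cmp = _mm256_cmp_pd::<_CMP_LT_OQ>(va, vb);
        // Gather the sign bit of every lane into the low 4 bits.
        _mm256_movemask_pd(cmp)
    }
}
```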
Functions
- `_mm256_add_pd`⚠ (avx): Adds packed double-precision (64-bit) floating-point elements in `a` and `b`.
- `_mm256_add_ps`⚠ (avx): Adds packed single-precision (32-bit) floating-point elements in `a` and `b`.
- `_mm256_addsub_pd`⚠ (avx): Alternately subtracts and adds packed double-precision (64-bit) floating-point elements: even-indexed results are `a - b` and odd-indexed results are `a + b`.
- `_mm256_addsub_ps`⚠ (avx): Alternately subtracts and adds packed single-precision (32-bit) floating-point elements: even-indexed results are `a - b` and odd-indexed results are `a + b`.
- `_mm256_and_pd`⚠ (avx): Computes the bitwise AND of packed double-precision (64-bit) floating-point elements in `a` and `b`.
- `_mm256_and_ps`⚠ (avx): Computes the bitwise AND of packed single-precision (32-bit) floating-point elements in `a` and `b`.
- `_mm256_andnot_pd`⚠ (avx): Computes the bitwise NOT of packed double-precision (64-bit) floating-point elements in `a`, and then ANDs the result with `b`.
- `_mm256_andnot_ps`⚠ (avx): Computes the bitwise NOT of packed single-precision (32-bit) floating-point elements in `a`, and then ANDs the result with `b`.
- `_mm256_blend_pd`⚠ (avx): Blends packed double-precision (64-bit) floating-point elements from `a` and `b` using control mask `imm8`.
- `_mm256_blend_ps`⚠ (avx): Blends packed single-precision (32-bit) floating-point elements from `a` and `b` using control mask `imm8`.
- `_mm256_blendv_pd`⚠ (avx): Blends packed double-precision (64-bit) floating-point elements from `a` and `b` using `c` as a mask.
- `_mm256_blendv_ps`⚠ (avx): Blends packed single-precision (32-bit) floating-point elements from `a` and `b` using `c` as a mask.
- Broadcasts 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to both 128-bit lanes of the returned vector.
- Broadcasts 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to both 128-bit lanes of the returned vector.
- Broadcasts a double-precision (64-bit) floating-point element from memory to all elements of the returned vector.
- Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
- Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined.
- Casts vector of type __m256d to type __m128d.
- `_mm256_castpd_ps`⚠ (avx): Casts vector of type __m256d to type __m256.
- Casts vector of type __m256d to type __m256i.
- Casts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined.
- Casts vector of type __m256 to type __m128.
- `_mm256_castps_pd`⚠ (avx): Casts vector of type __m256 to type __m256d.
- Casts vector of type __m256 to type __m256i.
- Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined.
- Casts vector of type __m256i to type __m256d.
- Casts vector of type __m256i to type __m256.
- Casts vector of type __m256i to type __m128i.
- `_mm256_ceil_pd`⚠ (avx): Rounds packed double-precision (64-bit) floating-point elements in `a` toward positive infinity.
- `_mm256_ceil_ps`⚠ (avx): Rounds packed single-precision (32-bit) floating-point elements in `a` toward positive infinity.
- `_mm256_cmp_pd`⚠ (avx): Compares packed double-precision (64-bit) floating-point elements in `a` and `b` based on the comparison operand specified by `IMM5`.
- `_mm256_cmp_ps`⚠ (avx): Compares packed single-precision (32-bit) floating-point elements in `a` and `b` based on the comparison operand specified by `IMM5`.
- Converts packed 32-bit integers in `a` to packed double-precision (64-bit) floating-point elements.
- Converts packed 32-bit integers in `a` to packed single-precision (32-bit) floating-point elements.
- Converts packed double-precision (64-bit) floating-point elements in `a` to packed 32-bit integers.
- `_mm256_cvtpd_ps`⚠ (avx): Converts packed double-precision (64-bit) floating-point elements in `a` to packed single-precision (32-bit) floating-point elements.
- Converts packed single-precision (32-bit) floating-point elements in `a` to packed 32-bit integers.
- `_mm256_cvtps_pd`⚠ (avx): Converts packed single-precision (32-bit) floating-point elements in `a` to packed double-precision (64-bit) floating-point elements.
- `_mm256_cvtss_f32`⚠ (avx): Returns the first element of the input vector of [8 x float].
- Converts packed double-precision (64-bit) floating-point elements in `a` to packed 32-bit integers with truncation.
- Converts packed single-precision (32-bit) floating-point elements in `a` to packed 32-bit integers with truncation.
- `_mm256_div_pd`⚠ (avx): Divides each of the 4 packed double-precision (64-bit) floating-point elements in `a` by the corresponding packed element in `b`.
- `_mm256_div_ps`⚠ (avx): Divides each of the 8 packed single-precision (32-bit) floating-point elements in `a` by the corresponding packed element in `b`.
- `_mm256_dp_ps`⚠ (avx): Within each 128-bit lane, conditionally multiplies the packed single-precision (32-bit) floating-point elements in `a` and `b` using the high 4 bits of `imm8`, sums the four products, and conditionally returns the sum using the low 4 bits of `imm8`.
- Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from `a`, selected with `imm8`.
- Extracts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from `a`, selected with `imm8`.
- Extracts 128 bits (composed of integer data) from `a`, selected with `imm8`.
- `_mm256_floor_pd`⚠ (avx): Rounds packed double-precision (64-bit) floating-point elements in `a` toward negative infinity.
- `_mm256_floor_ps`⚠ (avx): Rounds packed single-precision (32-bit) floating-point elements in `a` toward negative infinity.
- `_mm256_hadd_pd`⚠ (avx): Horizontally adds adjacent pairs of elements in `a` and `b`, two packed vectors of 4 double-precision (64-bit) floating-point values. In the result, sums of elements from `a` are returned in even locations, while sums of elements from `b` are returned in odd locations.
- `_mm256_hadd_ps`⚠ (avx): Horizontally adds adjacent pairs of elements in `a` and `b`, two packed vectors of 8 single-precision (32-bit) floating-point values. In the result, sums of elements from `a` are returned in locations 0, 1, 4, and 5, while sums of elements from `b` are returned in locations 2, 3, 6, and 7.
- `_mm256_hsub_pd`⚠ (avx): Horizontally subtracts adjacent pairs of elements in `a` and `b`, two packed vectors of 4 double-precision (64-bit) floating-point values. In the result, differences of elements from `a` are returned in even locations, while differences of elements from `b` are returned in odd locations.
- `_mm256_hsub_ps`⚠ (avx): Horizontally subtracts adjacent pairs of elements in `a` and `b`, two packed vectors of 8 single-precision (32-bit) floating-point values. In the result, differences of elements from `a` are returned in locations 0, 1, 4, and 5, while differences of elements from `b` are returned in locations 2, 3, 6, and 7.
- Copies `a` to the result, and inserts the 8-bit integer `i` into the result at the location specified by `index`.
- Copies `a` to the result, and inserts the 16-bit integer `i` into the result at the location specified by `index`.
- Copies `a` to the result, and inserts the 32-bit integer `i` into the result at the location specified by `index`.
- Copies `a` to the result, then inserts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from `b` into the result at the location specified by `imm8`.
- Copies `a` to the result, then inserts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from `b` into the result at the location specified by `imm8`.
- Copies `a` to the result, then inserts 128 bits from `b` into the result at the location specified by `imm8`.
- Loads 256 bits of integer data from unaligned memory into the result. This intrinsic may perform better than `_mm256_loadu_si256` when the data crosses a cache line boundary.
- `_mm256_load_pd`⚠ (avx): Loads 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into the result. `mem_addr` must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- `_mm256_load_ps`⚠ (avx): Loads 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into the result. `mem_addr` must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- Loads 256 bits of integer data from memory into the result. `mem_addr` must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- `_mm256_loadu2_m128`⚠ (avx, sse): Loads two 128-bit values (each composed of 4 packed single-precision (32-bit) floating-point elements) from memory and combines them into a 256-bit value. `hiaddr` and `loaddr` do not need to be aligned on any particular boundary.
- `_mm256_loadu2_m128d`⚠ (avx, sse2): Loads two 128-bit values (each composed of 2 packed double-precision (64-bit) floating-point elements) from memory and combines them into a 256-bit value. `hiaddr` and `loaddr` do not need to be aligned on any particular boundary.
- `_mm256_loadu2_m128i`⚠ (avx, sse2): Loads two 128-bit values (each composed of integer data) from memory and combines them into a 256-bit value. `hiaddr` and `loaddr` do not need to be aligned on any particular boundary.
- `_mm256_loadu_pd`⚠ (avx): Loads 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into the result. `mem_addr` does not need to be aligned on any particular boundary.
- `_mm256_loadu_ps`⚠ (avx): Loads 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into the result. `mem_addr` does not need to be aligned on any particular boundary.
- Loads 256 bits of integer data from memory into the result. `mem_addr` does not need to be aligned on any particular boundary.
- Loads packed double-precision (64-bit) floating-point elements from memory into the result using `mask` (elements are zeroed out when the high bit of the corresponding mask element is not set).
- Loads packed single-precision (32-bit) floating-point elements from memory into the result using `mask` (elements are zeroed out when the high bit of the corresponding mask element is not set).
- Stores packed double-precision (64-bit) floating-point elements from `a` into memory using `mask` (only elements whose corresponding mask element has its high bit set are written).
- Stores packed single-precision (32-bit) floating-point elements from `a` into memory using `mask` (only elements whose corresponding mask element has its high bit set are written).
- `_mm256_max_pd`⚠ (avx): Compares packed double-precision (64-bit) floating-point elements in `a` and `b`, and returns packed maximum values.
- `_mm256_max_ps`⚠ (avx): Compares packed single-precision (32-bit) floating-point elements in `a` and `b`, and returns packed maximum values.
- `_mm256_min_pd`⚠ (avx): Compares packed double-precision (64-bit) floating-point elements in `a` and `b`, and returns packed minimum values.
- `_mm256_min_ps`⚠ (avx): Compares packed single-precision (32-bit) floating-point elements in `a` and `b`, and returns packed minimum values.
- Duplicates even-indexed double-precision (64-bit) floating-point elements from `a`, and returns the results.
- Duplicates odd-indexed single-precision (32-bit) floating-point elements from `a`, and returns the results.
- Duplicates even-indexed single-precision (32-bit) floating-point elements from `a`, and returns the results.
- Sets each bit of the returned mask based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in `a`.
- Sets each bit of the returned mask based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in `a`.
- `_mm256_mul_pd`⚠ (avx): Multiplies packed double-precision (64-bit) floating-point elements in `a` and `b`.
- `_mm256_mul_ps`⚠ (avx): Multiplies packed single-precision (32-bit) floating-point elements in `a` and `b`.
- `_mm256_or_pd`⚠ (avx): Computes the bitwise OR of packed double-precision (64-bit) floating-point elements in `a` and `b`.
- `_mm256_or_ps`⚠ (avx): Computes the bitwise OR of packed single-precision (32-bit) floating-point elements in `a` and `b`.
- Shuffles 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from `a` and `b`, selecting whole 128-bit lanes with `imm8`.
- Shuffles 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from `a` and `b`, selecting whole 128-bit lanes with `imm8`.
- Shuffles 256 bits of integer data from `a` and `b`, selecting whole 128-bit lanes with `imm8`.
- Shuffles double-precision (64-bit) floating-point elements in `a` within 128-bit lanes using the control in `imm8`.
- Shuffles single-precision (32-bit) floating-point elements in `a` within 128-bit lanes using the control in `imm8`.
- Shuffles double-precision (64-bit) floating-point elements in `a` within 128-bit lanes using the control in `b`.
- Shuffles single-precision (32-bit) floating-point elements in `a` within 128-bit lanes using the control in `b`.
- `_mm256_rcp_ps`⚠ (avx): Computes the approximate reciprocal of packed single-precision (32-bit) floating-point elements in `a`, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
- `_mm256_round_pd`⚠ (avx): Rounds packed double-precision (64-bit) floating-point elements in `a` according to the flag `ROUNDING`. The value of `ROUNDING` may be as follows: 0x00 rounds to the nearest whole number, 0x01 rounds down toward negative infinity, 0x02 rounds up toward positive infinity, and 0x03 truncates.
- `_mm256_round_ps`⚠ (avx): Rounds packed single-precision (32-bit) floating-point elements in `a` according to the flag `ROUNDING`. The value of `ROUNDING` may be as follows: 0x00 rounds to the nearest whole number, 0x01 rounds down toward negative infinity, 0x02 rounds up toward positive infinity, and 0x03 truncates.
- `_mm256_rsqrt_ps`⚠ (avx): Computes the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in `a`, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
- `_mm256_set1_epi8`⚠ (avx): Broadcasts 8-bit integer `a` to all elements of the returned vector. This intrinsic may generate the `vpbroadcastb` instruction.
- Broadcasts 16-bit integer `a` to all elements of the returned vector. This intrinsic may generate the `vpbroadcastw` instruction.
- Broadcasts 32-bit integer `a` to all elements of the returned vector. This intrinsic may generate the `vpbroadcastd` instruction.
- Broadcasts 64-bit integer `a` to all elements of the returned vector. This intrinsic may generate the `vpbroadcastq` instruction.
- `_mm256_set1_pd`⚠ (avx): Broadcasts double-precision (64-bit) floating-point value `a` to all elements of the returned vector.
- `_mm256_set1_ps`⚠ (avx): Broadcasts single-precision (32-bit) floating-point value `a` to all elements of the returned vector.
- `_mm256_set_epi8`⚠ (avx): Sets packed 8-bit integers in the returned vector with the supplied values.
- `_mm256_set_epi16`⚠ (avx): Sets packed 16-bit integers in the returned vector with the supplied values.
- `_mm256_set_epi32`⚠ (avx): Sets packed 32-bit integers in the returned vector with the supplied values.
- Sets packed 64-bit integers in the returned vector with the supplied values.
- `_mm256_set_m128`⚠ (avx): Sets the returned __m256 vector from the supplied 128-bit halves `hi` and `lo`.
- `_mm256_set_m128d`⚠ (avx): Sets the returned __m256d vector from the supplied 128-bit halves `hi` and `lo`.
- `_mm256_set_m128i`⚠ (avx): Sets the returned __m256i vector from the supplied 128-bit halves `hi` and `lo`.
- `_mm256_set_pd`⚠ (avx): Sets packed double-precision (64-bit) floating-point elements in the returned vector with the supplied values.
- `_mm256_set_ps`⚠ (avx): Sets packed single-precision (32-bit) floating-point elements in the returned vector with the supplied values.
- `_mm256_setr_epi8`⚠ (avx): Sets packed 8-bit integers in the returned vector with the supplied values in reverse order.
- Sets packed 16-bit integers in the returned vector with the supplied values in reverse order.
- Sets packed 32-bit integers in the returned vector with the supplied values in reverse order.
- Sets packed 64-bit integers in the returned vector with the supplied values in reverse order.
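- `_mm256_setr_m128`⚠ (avx): Sets the returned __m256 vector from the supplied 128-bit halves, taking `lo` first and `hi` second.
- Sets the returned __m256d vector from the supplied 128-bit halves, taking `lo` first and `hi` second.
- Sets the returned __m256i vector from the supplied 128-bit halves, taking `lo` first and `hi` second.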
- `_mm256_setr_pd`⚠ (avx): Sets packed double-precision (64-bit) floating-point elements in the returned vector with the supplied values in reverse order.
- `_mm256_setr_ps`⚠ (avx): Sets packed single-precision (32-bit) floating-point elements in the returned vector with the supplied values in reverse order.
- Returns vector of type __m256d with all elements set to zero.
- Returns vector of type __m256 with all elements set to zero.
- Returns vector of type __m256i with all elements set to zero.
- Shuffles double-precision (64-bit) floating-point elements from `a` and `b` within 128-bit lanes using the control in `imm8`.
- Shuffles single-precision (32-bit) floating-point elements from `a` and `b` within 128-bit lanes using the control in `imm8`.
- `_mm256_sqrt_pd`⚠ (avx): Returns the square root of packed double-precision (64-bit) floating-point elements in `a`.
- `_mm256_sqrt_ps`⚠ (avx): Returns the square root of packed single-precision (32-bit) floating-point elements in `a`.
- `_mm256_store_pd`⚠ (avx): Stores 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from `a` into memory. `mem_addr` must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- `_mm256_store_ps`⚠ (avx): Stores 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from `a` into memory. `mem_addr` must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- Stores 256 bits of integer data from `a` into memory. `mem_addr` must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- `_mm256_storeu2_m128`⚠ (avx, sse): Stores the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from `a` into two different 128-bit memory locations. `hiaddr` and `loaddr` do not need to be aligned on any particular boundary.
- `_mm256_storeu2_m128d`⚠ (avx, sse2): Stores the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from `a` into two different 128-bit memory locations. `hiaddr` and `loaddr` do not need to be aligned on any particular boundary.
- `_mm256_storeu2_m128i`⚠ (avx, sse2): Stores the high and low 128-bit halves (each composed of integer data) from `a` into two different 128-bit memory locations. `hiaddr` and `loaddr` do not need to be aligned on any particular boundary.
- `_mm256_storeu_pd`⚠ (avx): Stores 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from `a` into memory. `mem_addr` does not need to be aligned on any particular boundary.
- `_mm256_storeu_ps`⚠ (avx): Stores 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from `a` into memory. `mem_addr` does not need to be aligned on any particular boundary.
- Stores 256 bits of integer data from `a` into memory. `mem_addr` does not need to be aligned on any particular boundary.
- `_mm256_stream_pd`⚠ (avx): Moves double-precision values from a 256-bit vector of [4 x double] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- `_mm256_stream_ps`⚠ (avx): Moves single-precision floating-point values from a 256-bit vector of [8 x float] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- Moves integer data from a 256-bit integer vector to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- `_mm256_sub_pd`⚠ (avx): Subtracts packed double-precision (64-bit) floating-point elements in `b` from packed elements in `a`.
- `_mm256_sub_ps`⚠ (avx): Subtracts packed single-precision (32-bit) floating-point elements in `b` from packed elements in `a`.
- `_mm256_testc_pd`⚠ (avx): Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the CF value.
- `_mm256_testc_ps`⚠ (avx): Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the CF value.
- Computes the bitwise AND of 256 bits (representing integer data) in `a` and `b`, and sets ZF to 1 if the result is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, and sets CF to 1 if the result is zero, otherwise sets CF to 0. Returns the CF value.
- Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
- Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
- Computes the bitwise AND of 256 bits (representing integer data) in `a` and `b`, and sets ZF to 1 if the result is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, and sets CF to 1 if the result is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
- `_mm256_testz_pd`⚠ (avx): Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the ZF value.
- `_mm256_testz_ps`⚠ (avx): Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the ZF value.
- Computes the bitwise AND of 256 bits (representing integer data) in `a` and `b`, and sets ZF to 1 if the result is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, and sets CF to 1 if the result is zero, otherwise sets CF to 0. Returns the ZF value.
- Returns vector of type __m256d with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to `mem::MaybeUninit`. In practice, this is equivalent to `mem::zeroed`.
- Returns vector of type __m256 with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to `mem::MaybeUninit`. In practice, this is equivalent to `mem::zeroed`.
- Returns vector of type __m256i with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to `mem::MaybeUninit`. In practice, this is equivalent to `mem::zeroed`.
- Unpacks and interleaves double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in `a` and `b`.
- Unpacks and interleaves single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in `a` and `b`.
- Unpacks and interleaves double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in `a` and `b`.
- Unpacks and interleaves single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in `a` and `b`.
- `_mm256_xor_pd`⚠ (avx): Computes the bitwise XOR of packed double-precision (64-bit) floating-point elements in `a` and `b`.
- `_mm256_xor_ps`⚠ (avx): Computes the bitwise XOR of packed single-precision (32-bit) floating-point elements in `a` and `b`.
- `_mm256_zeroall`⚠ (avx): Zeroes the contents of all XMM or YMM registers.
- `_mm256_zeroupper`⚠ (avx): Zeroes the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified.
- `_mm256_zextpd128_pd256`⚠ (avx, sse2): Constructs a 256-bit floating-point vector of [4 x double] from a 128-bit floating-point vector of [2 x double]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
- `_mm256_zextps128_ps256`⚠ (avx, sse): Constructs a 256-bit floating-point vector of [8 x float] from a 128-bit floating-point vector of [4 x float]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
- `_mm256_zextsi128_si256`⚠ (avx, sse2): Constructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
- `_mm_broadcast_ss`⚠ (avx): Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
- `_mm_cmp_pd`⚠ (avx, sse2): Compares packed double-precision (64-bit) floating-point elements in `a` and `b` based on the comparison operand specified by `IMM5`.
- `_mm_cmp_ps`⚠ (avx, sse): Compares packed single-precision (32-bit) floating-point elements in `a` and `b` based on the comparison operand specified by `IMM5`.
- `_mm_cmp_sd`⚠ (avx, sse2): Compares the lower double-precision (64-bit) floating-point element in `a` and `b` based on the comparison operand specified by `IMM5`, stores the result in the lower element of the returned vector, and copies the upper element from `a` to the upper element of the returned vector.
- `_mm_cmp_ss`⚠ (avx, sse): Compares the lower single-precision (32-bit) floating-point element in `a` and `b` based on the comparison operand specified by `IMM5`, stores the result in the lower element of the returned vector, and copies the upper 3 packed elements from `a` to the upper elements of the returned vector.
- `_mm_maskload_pd`⚠ (avx): Loads packed double-precision (64-bit) floating-point elements from memory into the result using `mask` (elements are zeroed out when the high bit of the corresponding mask element is not set).
- `_mm_maskload_ps`⚠ (avx): Loads packed single-precision (32-bit) floating-point elements from memory into the result using `mask` (elements are zeroed out when the high bit of the corresponding mask element is not set).
- `_mm_maskstore_pd`⚠ (avx): Stores packed double-precision (64-bit) floating-point elements from `a` into memory using `mask` (only elements whose corresponding mask element has its high bit set are written).
- `_mm_maskstore_ps`⚠ (avx): Stores packed single-precision (32-bit) floating-point elements from `a` into memory using `mask` (only elements whose corresponding mask element has its high bit set are written).
- `_mm_permute_pd`⚠ (avx, sse2): Shuffles double-precision (64-bit) floating-point elements in `a` using the control in `imm8`.
- `_mm_permute_ps`⚠ (avx, sse): Shuffles single-precision (32-bit) floating-point elements in `a` using the control in `imm8`.
- Shuffles double-precision (64-bit) floating-point elements in `a` using the control in `b`.
- Shuffles single-precision (32-bit) floating-point elements in `a` using the control in `b`.
- `_mm_testc_pd`⚠ (avx): Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the CF value.
- `_mm_testc_ps`⚠ (avx): Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the CF value.
- `_mm_testnzc_pd`⚠ (avx): Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
- `_mm_testnzc_ps`⚠ (avx): Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
- `_mm_testz_pd`⚠ (avx): Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the ZF value.
- `_mm_testz_ps`⚠ (avx): Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in `a` and `b`, producing an intermediate 128-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of `a` and then ANDs it with `b`, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the ZF value.
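The `_mm256_set_*` and `_mm256_setr_*` entries differ only in argument order. `set` takes the highest-indexed element first, while `setr` takes the elements in memory order. This sketch uses the hypothetical helper `set_vs_setr` to store both results so the difference is visible. When AVX is unavailable it returns `None` rather than guessing.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_set_pd, _mm256_setr_pd, _mm256_storeu_pd};

/// Hypothetical helper: builds a vector from (1.0, 2.0, 3.0, 4.0) with both
/// `_mm256_set_pd` and `_mm256_setr_pd`, and returns each in memory order.
pub fn set_vs_setr() -> Option<([f64; 4], [f64; 4])> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX availability was checked at run time.
            return Some(unsafe { set_vs_setr_avx() });
        }
    }
    None
}

/// Hypothetical AVX path; must only be called when AVX is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn set_vs_setr_avx() -> ([f64; 4], [f64; 4]) {
    let mut from_set = [0.0f64; 4];
    let mut from_setr = [0.0f64; 4];
    unsafe {
        // set: the LAST argument lands in element 0.
        _mm256_storeu_pd(from_set.as_mut_ptr(), _mm256_set_pd(1.0, 2.0, 3.0, 4.0));
        // setr: the FIRST argument lands in element 0.
        _mm256_storeu_pd(from_setr.as_mut_ptr(), _mm256_setr_pd(1.0, 2.0, 3.0, 4.0));
    }
    (from_set, from_setr)
}
```

On an AVX machine the first array should read `[4.0, 3.0, 2.0, 1.0]` and the second `[1.0, 2.0, 3.0, 4.0]`.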
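`_mm256_hadd_pd` only adds neighbours inside each 128-bit lane, so reducing a whole vector to a single sum needs one more step across lanes. One common sketch pairs it with `_mm256_castpd256_pd128`, `_mm256_extractf128_pd`, and the SSE2 intrinsics `_mm_add_pd` and `_mm_cvtsd_f64`. This is an assumption, not the only reduction order. `hsum4` is a hypothetical name.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{
    _mm256_castpd256_pd128, _mm256_extractf128_pd, _mm256_hadd_pd, _mm256_loadu_pd,
    _mm_add_pd, _mm_cvtsd_f64,
};

/// Hypothetical helper: returns v[0] + v[1] + v[2] + v[3].
pub fn hsum4(v: [f64; 4]) -> f64 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX availability was checked at run time.
            return unsafe { hsum4_avx(&v) };
        }
    }
    // Same pairing as the AVX path: (v0 + v1) + (v2 + v3).
    (v[0] + v[1]) + (v[2] + v[3])
}

/// Hypothetical AVX path; must only be called when AVX is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn hsum4_avx(v: &[f64; 4]) -> f64 {
    unsafe {
        let x = _mm256_loadu_pd(v.as_ptr());
        // hadd(x, x) = [x0+x1, x0+x1, x2+x3, x2+x3]; sums stay inside each 128-bit lane.
        let pairs = _mm256_hadd_pd(x, x);
        let lo = _mm256_castpd256_pd128(pairs); // [x0+x1, x0+x1]
        let hi = _mm256_extractf128_pd::<1>(pairs); // [x2+x3, x2+x3]
        // Add across the two lanes and read back element 0.
        _mm_cvtsd_f64(_mm_add_pd(lo, hi))
    }
}
```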
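Comparison results combine naturally with `_mm256_blendv_pd`. It takes each element from `b` where the sign bit of the corresponding element of `c` is set, and from `a` otherwise. In this sketch, a vector is compared with itself under the "Unordered (non-signaling)" predicate, assumed here to be named `_CMP_UNORD_Q`. That flags exactly the NaN lanes, which can then be replaced. `replace_nan` is a hypothetical helper.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{
    _mm256_blendv_pd, _mm256_cmp_pd, _mm256_loadu_pd, _mm256_set1_pd, _mm256_storeu_pd,
    _CMP_UNORD_Q,
};

/// Hypothetical helper: replaces every NaN in `v` with `fill`.
pub fn replace_nan(v: [f64; 4], fill: f64) -> [f64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX availability was checked at run time.
            return unsafe { replace_nan_avx(&v, fill) };
        }
    }
    v.map(|x| if x.is_nan() { fill } else { x })
}

/// Hypothetical AVX path; must only be called when AVX is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn replace_nan_avx(v: &[f64; 4], fill: f64) -> [f64; 4] {
    let mut out = [0.0f64; 4];
    unsafe {
        let x = _mm256_loadu_pd(v.as_ptr());
        // A value is "unordered" with itself only if it is NaN: all ones there, zeros elsewhere.
        let is_nan = _mm256_cmp_pd::<_CMP_UNORD_Q>(x, x);
        // Take `fill` where the mask's sign bit is set, keep `x` elsewhere.
        let blended = _mm256_blendv_pd(x, _mm256_set1_pd(fill), is_nan);
        _mm256_storeu_pd(out.as_mut_ptr(), blended);
    }
    out
}
```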
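`_mm256_maskload_pd` reads only the elements whose mask element has its high bit set, and returns zero for the others. This sketch assumes the mask is built with `_mm256_setr_epi64x`, where -1 selects a lane and 0 clears it. `load_first_n` is a hypothetical helper that keeps the first `n` elements of a four-element array.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_maskload_pd, _mm256_setr_epi64x, _mm256_storeu_pd};

/// Hypothetical helper: copies the first `n` elements of `src` (n <= 4) and zeroes the rest.
pub fn load_first_n(src: &[f64; 4], n: usize) -> [f64; 4] {
    assert!(n <= 4, "n must be at most 4");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX availability was checked, and `src` points to 4 valid f64 values.
            return unsafe { load_first_n_avx(src, n) };
        }
    }
    std::array::from_fn(|i| if i < n { src[i] } else { 0.0 })
}

/// Hypothetical AVX path; must only be called when AVX is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn load_first_n_avx(src: &[f64; 4], n: usize) -> [f64; 4] {
    // -1 has every bit set (high bit 1), which keeps a lane; 0 zeroes it.
    let lane = |i: usize| if i < n { -1i64 } else { 0 };
    let mut out = [0.0f64; 4];
    unsafe {
        let mask = _mm256_setr_epi64x(lane(0), lane(1), lane(2), lane(3));
        let v = _mm256_maskload_pd(src.as_ptr(), mask);
        _mm256_storeu_pd(out.as_mut_ptr(), v);
    }
    out
}
```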
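`_mm256_testz_si256(a, b)` returns 1 exactly when `a AND b` is all zeros. Passing the same vector twice therefore gives a cheap "is this 256-bit block zero?" check. This sketch assumes a 32-byte input read with the unaligned `_mm256_loadu_si256`. `is_all_zero` is a hypothetical name.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_testz_si256};

/// Hypothetical helper: true when all 32 bytes are zero.
pub fn is_all_zero(bytes: &[u8; 32]) -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX availability was checked at run time.
            return unsafe { is_all_zero_avx(bytes) };
        }
    }
    bytes.iter().all(|&b| b == 0)
}

/// Hypothetical AVX path; must only be called when AVX is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn is_all_zero_avx(bytes: &[u8; 32]) -> bool {
    unsafe {
        // Unaligned 256-bit load: a [u8; 32] has no 32-byte alignment guarantee.
        let v = _mm256_loadu_si256(bytes.as_ptr() as *const __m256i);
        // ZF is set (and 1 returned) when v AND v, i.e. v itself, is zero.
        _mm256_testz_si256(v, v) == 1
    }
}
```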
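`_mm256_dp_ps` operates on each 128-bit lane separately. With `imm8 = 0xF1`, all four products in a lane are summed (high nibble 0xF). The sum is written only to that lane's first element (low nibble 0x1), so an 8-element dot product still needs elements 0 and 4 added together. `dot8` is a hypothetical helper. Its scalar fallback sums in a different order, so in general it may round slightly differently from the AVX path.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_dp_ps, _mm256_loadu_ps, _mm256_storeu_ps};

/// Hypothetical helper: dot product of two 8-element f32 arrays.
pub fn dot8(a: &[f32; 8], b: &[f32; 8]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX availability was checked at run time.
            return unsafe { dot8_avx(a, b) };
        }
    }
    // Scalar fallback (summation order differs from the AVX path).
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical AVX path; must only be called when AVX is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn dot8_avx(a: &[f32; 8], b: &[f32; 8]) -> f32 {
    let mut lanes = [0.0f32; 8];
    unsafe {
        let va = _mm256_loadu_ps(a.as_ptr());
        let vb = _mm256_loadu_ps(b.as_ptr());
        // 0xF1: multiply all four pairs in each 128-bit lane (high nibble 0xF)
        // and write that lane's sum to its element 0 only (low nibble 0x1).
        _mm256_storeu_ps(lanes.as_mut_ptr(), _mm256_dp_ps::<0xF1>(va, vb));
    }
    // Element 0 holds the lower lane's sum, element 4 the upper lane's sum.
    lanes[0] + lanes[4]
}
```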
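Converting doubles to 32-bit integers can either round or truncate. This sketch compares `_mm256_cvtpd_epi32` and `_mm256_cvttpd_epi32` side by side. It assumes the default round-to-nearest-even MXCSR mode and writes the results out with the SSE2 store `_mm_storeu_si128`. `convert_both` is a hypothetical helper that returns `None` when AVX is unavailable.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{
    __m128i, _mm256_cvtpd_epi32, _mm256_cvttpd_epi32, _mm256_loadu_pd, _mm_storeu_si128,
};

/// Hypothetical helper: returns (rounded, truncated) i32 conversions of `v`.
pub fn convert_both(v: [f64; 4]) -> Option<([i32; 4], [i32; 4])> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX availability was checked at run time.
            return Some(unsafe { convert_both_avx(&v) });
        }
    }
    None
}

/// Hypothetical AVX path; must only be called when AVX is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn convert_both_avx(v: &[f64; 4]) -> ([i32; 4], [i32; 4]) {
    let mut rounded = [0i32; 4];
    let mut truncated = [0i32; 4];
    unsafe {
        let x = _mm256_loadu_pd(v.as_ptr());
        // Rounds using the current MXCSR mode (nearest-even unless changed).
        _mm_storeu_si128(rounded.as_mut_ptr() as *mut __m128i, _mm256_cvtpd_epi32(x));
        // Always truncates toward zero.
        _mm_storeu_si128(truncated.as_mut_ptr() as *mut __m128i, _mm256_cvttpd_epi32(x));
    }
    (rounded, truncated)
}
```

With the default rounding mode, `[1.7, -1.7, 2.5, 3.5]` should round to `[2, -2, 2, 4]` (ties go to even) and truncate to `[1, -1, 2, 3]`.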
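The aligned stores and the streaming (non-temporal) store `_mm256_stream_pd` both require a 32-byte-aligned address. This sketch assumes a `#[repr(align(32))]` wrapper to guarantee that alignment. Streaming stores are weakly ordered with respect to ordinary memory accesses, so the sketch also issues the SSE fence `_mm_sfence` after them. `Aligned` and `fill_stream` are hypothetical names.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_set1_pd, _mm256_stream_pd, _mm_sfence};

/// Hypothetical wrapper that forces 32-byte alignment of its contents.
#[repr(align(32))]
pub struct Aligned(pub [f64; 8]);

/// Hypothetical helper: writes `value` into all 8 slots of `buf`.
pub fn fill_stream(buf: &mut Aligned, value: f64) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX availability was checked, and `Aligned` is 32-byte aligned.
            unsafe { fill_stream_avx(buf, value) };
            return;
        }
    }
    buf.0 = [value; 8];
}

/// Hypothetical AVX path; must only be called when AVX is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn fill_stream_avx(buf: &mut Aligned, value: f64) {
    unsafe {
        let v = _mm256_set1_pd(value);
        let p = buf.0.as_mut_ptr();
        // Two non-temporal 256-bit stores. Both addresses are 32-byte aligned because
        // the array starts on a 32-byte boundary and 4 f64 values occupy 32 bytes.
        _mm256_stream_pd(p, v);
        _mm256_stream_pd(p.add(4), v);
        // Order the weakly-ordered streaming stores before later memory accesses.
        _mm_sfence();
    }
}
```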