Module core::core_arch::x86::avx

🔬This is a nightly-only experimental API. (stdsimd #48556)

Available on x86 or x86-64 only.

Advanced Vector Extensions (AVX)

The references are:

- Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2: Instruction Set Reference, A-Z.
- AMD64 Architecture Programmer’s Manual, Volume 3: General-Purpose and System Instructions.

Wikipedia provides a quick overview of the instructions available.
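
The intrinsics listed below are unsafe to call unless the CPU supports AVX, so a common pattern is to detect the feature at runtime and keep a scalar fallback. The following is a minimal sketch of that pattern, assuming a stable toolchain where these items are reachable through `std::arch::x86_64`; the helper names `add4` and `add4_avx` are hypothetical and not part of this module.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// AVX path: adds four f64 lanes with a single `_mm256_add_pd`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add4_avx(a: &[f64; 4], b: &[f64; 4]) -> [f64; 4] {
    // Unaligned loads: plain arrays are not guaranteed to be 32-byte aligned.
    let va = _mm256_loadu_pd(a.as_ptr());
    let vb = _mm256_loadu_pd(b.as_ptr());
    let mut out = [0.0f64; 4];
    _mm256_storeu_pd(out.as_mut_ptr(), _mm256_add_pd(va, vb));
    out
}

/// Hypothetical safe wrapper: uses AVX when available, scalar code otherwise.
pub fn add4(a: &[f64; 4], b: &[f64; 4]) -> [f64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was verified at runtime just above.
            return unsafe { add4_avx(a, b) };
        }
    }
    // Scalar fallback for CPUs or targets without AVX.
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Calling `add4(&[1.0, 2.0, 3.0, 4.0], &[10.0, 20.0, 30.0, 40.0])` yields the lane-wise sums on either path. The unaligned load and store variants are used here because the aligned forms require a 32-byte aligned address.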

Constants

_CMP_EQ_OQ
Equal (ordered, non-signaling)
_CMP_EQ_OS
Equal (ordered, signaling)
_CMP_EQ_UQ
Equal (unordered, non-signaling)
_CMP_EQ_US
Equal (unordered, signaling)
_CMP_FALSE_OQ
False (ordered, non-signaling)
_CMP_FALSE_OS
False (ordered, signaling)
_CMP_GE_OQ
Greater-than-or-equal (ordered, non-signaling)
_CMP_GE_OS
Greater-than-or-equal (ordered, signaling)
_CMP_GT_OQ
Greater-than (ordered, non-signaling)
_CMP_GT_OS
Greater-than (ordered, signaling)
_CMP_LE_OQ
Less-than-or-equal (ordered, non-signaling)
_CMP_LE_OS
Less-than-or-equal (ordered, signaling)
_CMP_LT_OQ
Less-than (ordered, non-signaling)
_CMP_LT_OS
Less-than (ordered, signaling)
_CMP_NEQ_OQ
Not-equal (ordered, non-signaling)
_CMP_NEQ_OS
Not-equal (ordered, signaling)
_CMP_NEQ_UQ
Not-equal (unordered, non-signaling)
_CMP_NEQ_US
Not-equal (unordered, signaling)
_CMP_NGE_UQ
Not-greater-than-or-equal (unordered, non-signaling)
_CMP_NGE_US
Not-greater-than-or-equal (unordered, signaling)
_CMP_NGT_UQ
Not-greater-than (unordered, non-signaling)
_CMP_NGT_US
Not-greater-than (unordered, signaling)
_CMP_NLE_UQ
Not-less-than-or-equal (unordered, non-signaling)
_CMP_NLE_US
Not-less-than-or-equal (unordered, signaling)
_CMP_NLT_UQ
Not-less-than (unordered, non-signaling)
_CMP_NLT_US
Not-less-than (unordered, signaling)
_CMP_ORD_Q
Ordered (non-signaling)
_CMP_ORD_S
Ordered (signaling)
_CMP_TRUE_UQ
True (unordered, non-signaling)
_CMP_TRUE_US
True (unordered, signaling)
_CMP_UNORD_Q
Unordered (non-signaling)
_CMP_UNORD_S
Unordered (signaling)
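
The ordered/unordered distinction in the predicates above decides what a comparison yields when either operand is NaN: ordered predicates are false, unordered predicates are true. The sketch below illustrates this with `_CMP_LT_OQ` and `_CMP_NGE_UQ`, passed as the const generic argument of `_mm256_cmp_pd` and summarized with `_mm256_movemask_pd`. It assumes the items are imported from `std::arch::x86_64`; `cmp_masks` and `cmp_masks_avx` are hypothetical helper names, and the scalar fallback is only an approximation of the same semantics for machines without AVX.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Returns (mask for a < b ordered, mask for !(a >= b) unordered),
/// one bit per lane, lane 0 in bit 0.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn cmp_masks_avx(a: &[f64; 4], b: &[f64; 4]) -> (i32, i32) {
    let va = _mm256_loadu_pd(a.as_ptr());
    let vb = _mm256_loadu_pd(b.as_ptr());
    // Each lane of the result is all ones (true) or all zeros (false).
    let lt_oq = _mm256_cmp_pd::<_CMP_LT_OQ>(va, vb);
    let nge_uq = _mm256_cmp_pd::<_CMP_NGE_UQ>(va, vb);
    // movemask collects the most significant bit of every lane.
    (_mm256_movemask_pd(lt_oq), _mm256_movemask_pd(nge_uq))
}

/// Hypothetical safe wrapper with a scalar fallback.
pub fn cmp_masks(a: &[f64; 4], b: &[f64; 4]) -> (i32, i32) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was verified at runtime just above.
            return unsafe { cmp_masks_avx(a, b) };
        }
    }
    let (mut lt, mut nge) = (0, 0);
    for i in 0..4 {
        // `<` is false when either side is NaN (ordered behaviour).
        if a[i] < b[i] {
            lt |= 1 << i;
        }
        // `!(a >= b)` is true when either side is NaN (unordered behaviour).
        if !(a[i] >= b[i]) {
            nge |= 1 << i;
        }
    }
    (lt, nge)
}
```

For `a = [1.0, NaN, 3.0, 5.0]` and `b = [2.0, 2.0, 3.0, 4.0]`, only lane 0 satisfies the ordered less-than, while the unordered predicate also reports the NaN lane.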

Functions

addsubpd256 🔒 ^⚠Experimental
addsubps256 🔒 ^⚠Experimental
maskloadpd 🔒 ^⚠Experimental
maskloadpd256 🔒 ^⚠Experimental
maskloadps 🔒 ^⚠Experimental
maskloadps256 🔒 ^⚠Experimental
maskstorepd 🔒 ^⚠Experimental
maskstorepd256 🔒 ^⚠Experimental
maskstoreps 🔒 ^⚠Experimental
maskstoreps256 🔒 ^⚠Experimental
movmskpd256 🔒 ^⚠Experimental
movmskps256 🔒 ^⚠Experimental
ptestc256 🔒 ^⚠Experimental
ptestnzc256 🔒 ^⚠Experimental
ptestz256 🔒 ^⚠Experimental
roundpd256 🔒 ^⚠Experimental
roundps256 🔒 ^⚠Experimental
sqrtps256 🔒 ^⚠Experimental
storeudq256 🔒 ^⚠Experimental
storeupd256 🔒 ^⚠Experimental
storeups256 🔒 ^⚠Experimental
vblendvpd 🔒 ^⚠Experimental
vblendvps 🔒 ^⚠Experimental
vbroadcastf128pd256 🔒 ^⚠Experimental
vbroadcastf128ps256 🔒 ^⚠Experimental
vcmppd 🔒 ^⚠Experimental
vcmppd256 🔒 ^⚠Experimental
vcmpps 🔒 ^⚠Experimental
vcmpps256 🔒 ^⚠Experimental
vcmpsd 🔒 ^⚠Experimental
vcmpss 🔒 ^⚠Experimental
vcvtdq2ps 🔒 ^⚠Experimental
vcvtpd2dq 🔒 ^⚠Experimental
vcvtpd2ps 🔒 ^⚠Experimental
vcvtps2dq 🔒 ^⚠Experimental
vcvttpd2dq 🔒 ^⚠Experimental
vcvttps2dq 🔒 ^⚠Experimental
vdpps 🔒 ^⚠Experimental
vhaddpd 🔒 ^⚠Experimental
vhaddps 🔒 ^⚠Experimental
vhsubpd 🔒 ^⚠Experimental
vhsubps 🔒 ^⚠Experimental
vlddqu 🔒 ^⚠Experimental
vmaxpd 🔒 ^⚠Experimental
vmaxps 🔒 ^⚠Experimental
vminpd 🔒 ^⚠Experimental
vminps 🔒 ^⚠Experimental
vperm2f128pd256 🔒 ^⚠Experimental
vperm2f128ps256 🔒 ^⚠Experimental
vperm2f128si256 🔒 ^⚠Experimental
vpermilpd 🔒 ^⚠Experimental
vpermilpd256 🔒 ^⚠Experimental
vpermilps 🔒 ^⚠Experimental
vpermilps256 🔒 ^⚠Experimental
vrcpps 🔒 ^⚠Experimental
vrsqrtps 🔒 ^⚠Experimental
vtestcpd 🔒 ^⚠Experimental
vtestcpd256 🔒 ^⚠Experimental
vtestcps 🔒 ^⚠Experimental
vtestcps256 🔒 ^⚠Experimental
vtestnzcpd 🔒 ^⚠Experimental
vtestnzcpd256 🔒 ^⚠Experimental
vtestnzcps 🔒 ^⚠Experimental
vtestnzcps256 🔒 ^⚠Experimental
vtestzpd 🔒 ^⚠Experimental
vtestzpd256 🔒 ^⚠Experimental
vtestzps 🔒 ^⚠Experimental
vtestzps256 🔒 ^⚠Experimental
vzeroall 🔒 ^⚠Experimental
vzeroupper 🔒 ^⚠Experimental
_mm256_add_pd^⚠avx
Adds packed double-precision (64-bit) floating-point elements in a and b.
_mm256_add_ps^⚠avx
Adds packed single-precision (32-bit) floating-point elements in a and b.
_mm256_addsub_pd^⚠avx
Alternately adds and subtracts packed double-precision (64-bit) floating-point elements in a to/from packed elements in b.
_mm256_addsub_ps^⚠avx
Alternately adds and subtracts packed single-precision (32-bit) floating-point elements in a to/from packed elements in b.
_mm256_and_pd^⚠avx
Computes the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b.
_mm256_and_ps^⚠avx
Computes the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b.
_mm256_andnot_pd^⚠avx
Computes the bitwise NOT of packed double-precision (64-bit) floating-point elements in a, and then AND with b.
_mm256_andnot_ps^⚠avx
Computes the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b.
_mm256_blend_pd^⚠avx
Blends packed double-precision (64-bit) floating-point elements from a and b using control mask imm8.
_mm256_blend_ps^⚠avx
Blends packed single-precision (32-bit) floating-point elements from a and b using control mask imm8.
_mm256_blendv_pd^⚠avx
Blends packed double-precision (64-bit) floating-point elements from a and b using c as a mask.
_mm256_blendv_ps^⚠avx
Blends packed single-precision (32-bit) floating-point elements from a and b using c as a mask.
_mm256_broadcast_pd^⚠avx
Broadcasts 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector.
_mm256_broadcast_ps^⚠avx
Broadcasts 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector.
_mm256_broadcast_sd^⚠avx
Broadcasts a double-precision (64-bit) floating-point element from memory to all elements of the returned vector.
_mm256_broadcast_ss^⚠avx
Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
_mm256_castpd128_pd256^⚠avx
Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined.
_mm256_castpd256_pd128^⚠avx
Casts vector of type __m256d to type __m128d.
_mm256_castpd_ps^⚠avx
Casts vector of type __m256d to type __m256.
_mm256_castpd_si256^⚠avx
Casts vector of type __m256d to type __m256i.
_mm256_castps128_ps256^⚠avx
Casts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined.
_mm256_castps256_ps128^⚠avx
Casts vector of type __m256 to type __m128.
_mm256_castps_pd^⚠avx
Casts vector of type __m256 to type __m256d.
_mm256_castps_si256^⚠avx
Casts vector of type __m256 to type __m256i.
_mm256_castsi128_si256^⚠avx
Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined.
_mm256_castsi256_pd^⚠avx
Casts vector of type __m256i to type __m256d.
_mm256_castsi256_ps^⚠avx
Casts vector of type __m256i to type __m256.
_mm256_castsi256_si128^⚠avx
Casts vector of type __m256i to type __m128i.
_mm256_ceil_pd^⚠avx
Rounds packed double-precision (64-bit) floating point elements in a toward positive infinity.
_mm256_ceil_ps^⚠avx
Rounds packed single-precision (32-bit) floating point elements in a toward positive infinity.
_mm256_cmp_pd^⚠avx
Compares packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
_mm256_cmp_ps^⚠avx
Compares packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
_mm256_cvtepi32_pd^⚠avx
Converts packed 32-bit integers in a to packed double-precision (64-bit) floating-point elements.
_mm256_cvtepi32_ps^⚠avx
Converts packed 32-bit integers in a to packed single-precision (32-bit) floating-point elements.
_mm256_cvtpd_epi32^⚠avx
Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers.
_mm256_cvtpd_ps^⚠avx
Converts packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements.
_mm256_cvtps_epi32^⚠avx
Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers.
_mm256_cvtps_pd^⚠avx
Converts packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements.
_mm256_cvtss_f32^⚠avx
Returns the first element of the input vector of [8 x float].
_mm256_cvttpd_epi32^⚠avx
Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation.
_mm256_cvttps_epi32^⚠avx
Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation.
_mm256_div_pd^⚠avx
Computes the division of each of the 4 packed 64-bit floating-point elements in a by the corresponding packed elements in b.
_mm256_div_ps^⚠avx
Computes the division of each of the 8 packed 32-bit floating-point elements in a by the corresponding packed elements in b.
_mm256_dp_ps^⚠avx
Conditionally multiplies the packed single-precision (32-bit) floating-point elements in a and b using the high 4 bits in imm8, sums the four products, and conditionally returns the sum using the low 4 bits of imm8.
_mm256_extractf128_pd^⚠avx
Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm8.
_mm256_extractf128_ps^⚠avx
Extracts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8.
_mm256_extractf128_si256^⚠avx
Extracts 128 bits (composed of integer data) from a, selected with imm8.
_mm256_floor_pd^⚠avx
Rounds packed double-precision (64-bit) floating point elements in a toward negative infinity.
_mm256_floor_ps^⚠avx
Rounds packed single-precision (32-bit) floating point elements in a toward negative infinity.
_mm256_hadd_pd^⚠avx
Horizontal addition of adjacent pairs in the two packed vectors of 4 64-bit floating points a and b. In the result, sums of elements from a are returned in even locations, while sums of elements from b are returned in odd locations.
_mm256_hadd_ps^⚠avx
Horizontal addition of adjacent pairs in the two packed vectors of 8 32-bit floating points a and b. In the result, sums of elements from a are returned in locations of indices 0, 1, 4, 5; while sums of elements from b are locations 2, 3, 6, 7.
_mm256_hsub_pd^⚠avx
Horizontal subtraction of adjacent pairs in the two packed vectors of 4 64-bit floating points a and b. In the result, differences of elements from a are returned in even locations, while differences of elements from b are returned in odd locations.
_mm256_hsub_ps^⚠avx
Horizontal subtraction of adjacent pairs in the two packed vectors of 8 32-bit floating points a and b. In the result, differences of elements from a are returned in locations of indices 0, 1, 4, 5; while differences of elements from b are returned in locations 2, 3, 6, 7.
_mm256_insert_epi8^⚠avx
Copies a to result, and inserts the 8-bit integer i into result at the location specified by index.
_mm256_insert_epi16^⚠avx
Copies a to result, and inserts the 16-bit integer i into result at the location specified by index.
_mm256_insert_epi32^⚠avx
Copies a to result, and inserts the 32-bit integer i into result at the location specified by index.
_mm256_insertf128_pd^⚠avx
Copies a to result, then inserts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into result at the location specified by imm8.
_mm256_insertf128_ps^⚠avx
Copies a to result, then inserts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into result at the location specified by imm8.
_mm256_insertf128_si256^⚠avx
Copies a to result, then inserts 128 bits from b into result at the location specified by imm8.
_mm256_lddqu_si256^⚠avx
Loads 256-bits of integer data from unaligned memory into result. This intrinsic may perform better than _mm256_loadu_si256 when the data crosses a cache line boundary.
_mm256_load_pd^⚠avx
Loads 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_load_ps^⚠avx
Loads 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_load_si256^⚠avx
Loads 256-bits of integer data from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_loadu2_m128^⚠avx,sse
Loads two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combines them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
_mm256_loadu2_m128d^⚠avx,sse2
Loads two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combines them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
_mm256_loadu2_m128i^⚠avx,sse2
Loads two 128-bit values (composed of integer data) from memory, and combines them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
_mm256_loadu_pd^⚠avx
Loads 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
_mm256_loadu_ps^⚠avx
Loads 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
_mm256_loadu_si256^⚠avx
Loads 256-bits of integer data from memory into result. mem_addr does not need to be aligned on any particular boundary.
_mm256_maskload_pd^⚠avx
Loads packed double-precision (64-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
_mm256_maskload_ps^⚠avx
Loads packed single-precision (32-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
_mm256_maskstore_pd^⚠avx
Stores packed double-precision (64-bit) floating-point elements from a into memory using mask.
_mm256_maskstore_ps^⚠avx
Stores packed single-precision (32-bit) floating-point elements from a into memory using mask.
_mm256_max_pd^⚠avx
Compares packed double-precision (64-bit) floating-point elements in a and b, and returns packed maximum values.
_mm256_max_ps^⚠avx
Compares packed single-precision (32-bit) floating-point elements in a and b, and returns packed maximum values.
_mm256_min_pd^⚠avx
Compares packed double-precision (64-bit) floating-point elements in a and b, and returns packed minimum values.
_mm256_min_ps^⚠avx
Compares packed single-precision (32-bit) floating-point elements in a and b, and returns packed minimum values.
_mm256_movedup_pd^⚠avx
Duplicates even-indexed double-precision (64-bit) floating-point elements from a, and returns the results.
_mm256_movehdup_ps^⚠avx
Duplicates odd-indexed single-precision (32-bit) floating-point elements from a, and returns the results.
_mm256_moveldup_ps^⚠avx
Duplicates even-indexed single-precision (32-bit) floating-point elements from a, and returns the results.
_mm256_movemask_pd^⚠avx
Sets each bit of the returned mask based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in a.
_mm256_movemask_ps^⚠avx
Sets each bit of the returned mask based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in a.
_mm256_mul_pd^⚠avx
Multiplies packed double-precision (64-bit) floating-point elements in a and b.
_mm256_mul_ps^⚠avx
Multiplies packed single-precision (32-bit) floating-point elements in a and b.
_mm256_or_pd^⚠avx
Computes the bitwise OR of packed double-precision (64-bit) floating-point elements in a and b.
_mm256_or_ps^⚠avx
Computes the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b.
_mm256_permute2f128_pd^⚠avx
Shuffles 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) selected by imm8 from a and b.
_mm256_permute2f128_ps^⚠avx
Shuffles 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) selected by imm8 from a and b.
_mm256_permute2f128_si256^⚠avx
Shuffles 128-bits (composed of integer data) selected by imm8 from a and b.
_mm256_permute_pd^⚠avx
Shuffles double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
_mm256_permute_ps^⚠avx
Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
_mm256_permutevar_pd^⚠avx
Shuffles double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b.
_mm256_permutevar_ps^⚠avx
Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b.
_mm256_rcp_ps^⚠avx
Computes the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_round_pd^⚠avx
Rounds packed double-precision (64-bit) floating point elements in a according to the flag ROUNDING. The value of ROUNDING may be as follows:
_mm256_round_ps^⚠avx
Rounds packed single-precision (32-bit) floating point elements in a according to the flag ROUNDING. The value of ROUNDING may be as follows:
_mm256_rsqrt_ps^⚠avx
Computes the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
_mm256_set1_epi8^⚠avx
Broadcasts 8-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastb instruction.
_mm256_set1_epi16^⚠avx
Broadcasts 16-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastw instruction.
_mm256_set1_epi32^⚠avx
Broadcasts 32-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastd instruction.
_mm256_set1_epi64x^⚠avx
Broadcasts 64-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastq instruction.
_mm256_set1_pd^⚠avx
Broadcasts double-precision (64-bit) floating-point value a to all elements of returned vector.
_mm256_set1_ps^⚠avx
Broadcasts single-precision (32-bit) floating-point value a to all elements of returned vector.
_mm256_set_epi8^⚠avx
Sets packed 8-bit integers in returned vector with the supplied values.
_mm256_set_epi16^⚠avx
Sets packed 16-bit integers in returned vector with the supplied values.
_mm256_set_epi32^⚠avx
Sets packed 32-bit integers in returned vector with the supplied values.
_mm256_set_epi64x^⚠avx
Sets packed 64-bit integers in returned vector with the supplied values.
_mm256_set_m128^⚠avx
Sets packed __m256 returned vector with the supplied values.
_mm256_set_m128d^⚠avx
Sets packed __m256d returned vector with the supplied values.
_mm256_set_m128i^⚠avx
Sets packed __m256i returned vector with the supplied values.
_mm256_set_pd^⚠avx
Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values.
_mm256_set_ps^⚠avx
Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values.
_mm256_setr_epi8^⚠avx
Sets packed 8-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_epi16^⚠avx
Sets packed 16-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_epi32^⚠avx
Sets packed 32-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_epi64x^⚠avx
Sets packed 64-bit integers in returned vector with the supplied values in reverse order.
_mm256_setr_m128^⚠avx
Sets packed __m256 returned vector with the supplied values.
_mm256_setr_m128d^⚠avx
Sets packed __m256d returned vector with the supplied values.
_mm256_setr_m128i^⚠avx
Sets packed __m256i returned vector with the supplied values.
_mm256_setr_pd^⚠avx
Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order.
_mm256_setr_ps^⚠avx
Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order.
_mm256_setzero_pd^⚠avx
Returns vector of type __m256d with all elements set to zero.
_mm256_setzero_ps^⚠avx
Returns vector of type __m256 with all elements set to zero.
_mm256_setzero_si256^⚠avx
Returns vector of type __m256i with all elements set to zero.
_mm256_shuffle_pd^⚠avx
Shuffles double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8.
_mm256_shuffle_ps^⚠avx
Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
_mm256_sqrt_pd^⚠avx
Returns the square root of packed double-precision (64-bit) floating point elements in a.
_mm256_sqrt_ps^⚠avx
Returns the square root of packed single-precision (32-bit) floating point elements in a.
_mm256_store_pd^⚠avx
Stores 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_store_ps^⚠avx
Stores 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_store_si256^⚠avx
Stores 256-bits of integer data from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_storeu2_m128^⚠avx,sse
Stores the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
_mm256_storeu2_m128d^⚠avx,sse2
Stores the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
_mm256_storeu2_m128i^⚠avx,sse2
Stores the high and low 128-bit halves (each composed of integer data) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
_mm256_storeu_pd^⚠avx
Stores 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
_mm256_storeu_ps^⚠avx
Stores 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
_mm256_storeu_si256^⚠avx
Stores 256-bits of integer data from a into memory. mem_addr does not need to be aligned on any particular boundary.
_mm256_stream_pd^⚠avx
Moves double-precision values from a 256-bit vector of [4 x double] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
_mm256_stream_ps^⚠avx
Moves single-precision floating point values from a 256-bit vector of [8 x float] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
_mm256_stream_si256^⚠avx
Moves integer data from a 256-bit integer vector to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
_mm256_sub_pd^⚠avx
Subtracts packed double-precision (64-bit) floating-point elements in b from packed elements in a.
_mm256_sub_ps^⚠avx
Subtracts packed single-precision (32-bit) floating-point elements in b from packed elements in a.
_mm256_testc_pd^⚠avx
Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
_mm256_testc_ps^⚠avx
Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
_mm256_testc_si256^⚠avx
Computes the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Computes the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return the CF value.
_mm256_testnzc_pd^⚠avx
Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
_mm256_testnzc_ps^⚠avx
Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
_mm256_testnzc_si256^⚠avx
Computes the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Computes the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
_mm256_testz_pd^⚠avx
Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
_mm256_testz_ps^⚠avx
Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
_mm256_testz_si256^⚠avx
Computes the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Computes the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return the ZF value.
_mm256_undefined_pd^⚠avx
Returns vector of type __m256d with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to mem::MaybeUninit. In practice, this is equivalent to mem::zeroed.
_mm256_undefined_ps^⚠avx
Returns vector of type __m256 with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to mem::MaybeUninit. In practice, this is equivalent to mem::zeroed.
_mm256_undefined_si256^⚠avx
Returns vector of type __m256i with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to mem::MaybeUninit. In practice, this is equivalent to mem::zeroed.
_mm256_unpackhi_pd^⚠avx
Unpacks and interleaves double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b.
_mm256_unpackhi_ps^⚠avx
Unpacks and interleaves single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b.
_mm256_unpacklo_pd^⚠avx
Unpacks and interleaves double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b.
_mm256_unpacklo_ps^⚠avx
Unpacks and interleaves single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b.
_mm256_xor_pd^⚠avx
Computes the bitwise XOR of packed double-precision (64-bit) floating-point elements in a and b.
_mm256_xor_ps^⚠avx
Computes the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b.
_mm256_zeroall^⚠avx
Zeroes the contents of all XMM or YMM registers.
_mm256_zeroupper^⚠avx
Zeroes the upper 128 bits of all YMM registers; the lower 128-bits of the registers are unmodified.
_mm256_zextpd128_pd256^⚠avx,sse2
Constructs a 256-bit floating-point vector of [4 x double] from a 128-bit floating-point vector of [2 x double]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
_mm256_zextps128_ps256^⚠avx,sse
Constructs a 256-bit floating-point vector of [8 x float] from a 128-bit floating-point vector of [4 x float]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
_mm256_zextsi128_si256^⚠avx,sse2
Constructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
_mm_broadcast_ss^⚠avx
Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
_mm_cmp_pd^⚠avx,sse2
Compares packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
_mm_cmp_ps^⚠avx,sse
Compares packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
_mm_cmp_sd^⚠avx,sse2
Compares the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by IMM5, stores the result in the lower element of returned vector, and copies the upper element from a to the upper element of returned vector.
_mm_cmp_ss^⚠avx,sse
Compares the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by IMM5, stores the result in the lower element of returned vector, and copies the upper 3 packed elements from a to the upper elements of returned vector.
_mm_maskload_pd^⚠avx
Loads packed double-precision (64-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
_mm_maskload_ps^⚠avx
Loads packed single-precision (32-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
_mm_maskstore_pd^⚠avx
Stores packed double-precision (64-bit) floating-point elements from a into memory using mask.
_mm_maskstore_ps^⚠avx
Stores packed single-precision (32-bit) floating-point elements from a into memory using mask.
_mm_permute_pd^⚠avx,sse2
Shuffles double-precision (64-bit) floating-point elements in a using the control in imm8.
_mm_permute_ps^⚠avx,sse
Shuffles single-precision (32-bit) floating-point elements in a using the control in imm8.
_mm_permutevar_pd^⚠avx
Shuffles double-precision (64-bit) floating-point elements in a using the control in b.
_mm_permutevar_ps^⚠avx
Shuffles single-precision (32-bit) floating-point elements in a using the control in b.
_mm_testc_pd^⚠avx
Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
_mm_testc_ps^⚠avx
Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the CF value.
_mm_testnzc_pd^⚠avx
Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
_mm_testnzc_ps^⚠avx
Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
_mm_testz_pd^⚠avx
Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
_mm_testz_ps^⚠avx
Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return the ZF value.
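
The floating-point `test*` intrinsics above look only at the sign bit of each element: ZF reports whether `a AND b` has no sign bit set in any lane, and CF reports the same for `(NOT a) AND b`. The sketch below shows both flags for the 256-bit double-precision variants, assuming the items are imported from `std::arch::x86_64`; `sign_tests` and `sign_tests_avx` are hypothetical helper names, and the scalar fallback restates the same rule with `f64::is_sign_negative` for machines without AVX.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Returns (ZF, CF) as computed by `_mm256_testz_pd` and `_mm256_testc_pd`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn sign_tests_avx(a: &[f64; 4], b: &[f64; 4]) -> (i32, i32) {
    let va = _mm256_loadu_pd(a.as_ptr());
    let vb = _mm256_loadu_pd(b.as_ptr());
    (_mm256_testz_pd(va, vb), _mm256_testc_pd(va, vb))
}

/// Hypothetical safe wrapper with a scalar fallback.
pub fn sign_tests(a: &[f64; 4], b: &[f64; 4]) -> (i32, i32) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was verified at runtime just above.
            return unsafe { sign_tests_avx(a, b) };
        }
    }
    // ZF: no lane has the sign bit set in both a and b.
    let zf = (0..4).all(|i| !(a[i].is_sign_negative() && b[i].is_sign_negative()));
    // CF: no lane has the sign bit clear in a but set in b.
    let cf = (0..4).all(|i| !(!a[i].is_sign_negative() && b[i].is_sign_negative()));
    (zf as i32, cf as i32)
}
```

With `a = [-1.0, 2.0, -3.0, 4.0]` and `b = [1.0, -2.0, 3.0, -4.0]` no lane is negative in both inputs, so ZF is 1, while lanes 1 and 3 are negative only in `b`, so CF is 0. The `testnzc` variants return 1 only when both flags are 0.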