Module core::core_arch::x86::avx

🔬This is a nightly-only experimental API. (stdsimd #48556)
Available on x86 or x86-64 only.

Constants

Functions

  • addsubpd256 🔒 Experimental
  • addsubps256 🔒 Experimental
  • maskloadpd 🔒 Experimental
  • maskloadpd256 🔒 Experimental
  • maskloadps 🔒 Experimental
  • maskloadps256 🔒 Experimental
  • maskstorepd 🔒 Experimental
  • maskstorepd256 🔒 Experimental
  • maskstoreps 🔒 Experimental
  • maskstoreps256 🔒 Experimental
  • movmskpd256 🔒 Experimental
  • movmskps256 🔒 Experimental
  • ptestc256 🔒 Experimental
  • ptestnzc256 🔒 Experimental
  • ptestz256 🔒 Experimental
  • roundpd256 🔒 Experimental
  • roundps256 🔒 Experimental
  • sqrtps256 🔒 Experimental
  • storeudq256 🔒 Experimental
  • storeupd256 🔒 Experimental
  • storeups256 🔒 Experimental
  • vblendvpd 🔒 Experimental
  • vblendvps 🔒 Experimental
  • vbroadcastf128pd256 🔒 Experimental
  • vbroadcastf128ps256 🔒 Experimental
  • vcmppd 🔒 Experimental
  • vcmppd256 🔒 Experimental
  • vcmpps 🔒 Experimental
  • vcmpps256 🔒 Experimental
  • vcmpsd 🔒 Experimental
  • vcmpss 🔒 Experimental
  • vcvtdq2ps 🔒 Experimental
  • vcvtpd2dq 🔒 Experimental
  • vcvtpd2ps 🔒 Experimental
  • vcvtps2dq 🔒 Experimental
  • vcvttpd2dq 🔒 Experimental
  • vcvttps2dq 🔒 Experimental
  • vdpps 🔒 Experimental
  • vhaddpd 🔒 Experimental
  • vhaddps 🔒 Experimental
  • vhsubpd 🔒 Experimental
  • vhsubps 🔒 Experimental
  • vlddqu 🔒 Experimental
  • vmaxpd 🔒 Experimental
  • vmaxps 🔒 Experimental
  • vminpd 🔒 Experimental
  • vminps 🔒 Experimental
  • vperm2f128pd256 🔒 Experimental
  • vperm2f128ps256 🔒 Experimental
  • vperm2f128si256 🔒 Experimental
  • vpermilpd 🔒 Experimental
  • vpermilpd256 🔒 Experimental
  • vpermilps 🔒 Experimental
  • vpermilps256 🔒 Experimental
  • vrcpps 🔒 Experimental
  • vrsqrtps 🔒 Experimental
  • vtestcpd 🔒 Experimental
  • vtestcpd256 🔒 Experimental
  • vtestcps 🔒 Experimental
  • vtestcps256 🔒 Experimental
  • vtestnzcpd 🔒 Experimental
  • vtestnzcpd256 🔒 Experimental
  • vtestnzcps 🔒 Experimental
  • vtestnzcps256 🔒 Experimental
  • vtestzpd 🔒 Experimental
  • vtestzpd256 🔒 Experimental
  • vtestzps 🔒 Experimental
  • vtestzps256 🔒 Experimental
  • vzeroall 🔒 Experimental
  • vzeroupper 🔒 Experimental
  • Adds packed double-precision (64-bit) floating-point elements in a and b.
  • Adds packed single-precision (32-bit) floating-point elements in a and b.
  • Alternatively adds and subtracts packed double-precision (64-bit) floating-point elements in a to/from packed elements in b.
  • Alternatively adds and subtracts packed single-precision (32-bit) floating-point elements in a to/from packed elements in b.
  • Computes the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b.
  • Computes the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b.
  • Computes the bitwise NOT of packed double-precision (64-bit) floating-point elements in a, and then AND with b.
  • Computes the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b.
  • Blends packed double-precision (64-bit) floating-point elements from a and b using control mask imm8.
  • Blends packed single-precision (32-bit) floating-point elements from a and b using control mask imm8.
  • Blends packed double-precision (64-bit) floating-point elements from a and b using c as a mask.
  • Blends packed single-precision (32-bit) floating-point elements from a and b using c as a mask.
  • Broadcasts 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector.
  • Broadcasts 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector.
  • Broadcasts a double-precision (64-bit) floating-point element from memory to all elements of the returned vector.
  • Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
  • Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined.
  • Casts vector of type __m256d to type __m128d.
  • Casts vector of type __m256d to type __m256.
  • Casts vector of type __m256d to type __m256i.
  • Casts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined.
  • Casts vector of type __m256 to type __m128.
  • Casts vector of type __m256 to type __m256d.
  • Casts vector of type __m256 to type __m256i.
  • Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined.
  • Casts vector of type __m256i to type __m256d.
  • Casts vector of type __m256i to type __m256.
  • Casts vector of type __m256i to type __m128i.
  • Rounds packed double-precision (64-bit) floating-point elements in a toward positive infinity.
  • Rounds packed single-precision (32-bit) floating-point elements in a toward positive infinity.
  • Compares packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
  • Compares packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
  • Converts packed 32-bit integers in a to packed double-precision (64-bit) floating-point elements.
  • Converts packed 32-bit integers in a to packed single-precision (32-bit) floating-point elements.
  • Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers.
  • Converts packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements.
  • Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers.
  • Converts packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements.
  • Returns the first element of the input vector of [8 x float].
  • Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation.
  • Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation.
  • Computes the division of each of the 4 packed 64-bit floating-point elements in a by the corresponding packed elements in b.
  • Computes the division of each of the 8 packed 32-bit floating-point elements in a by the corresponding packed elements in b.
  • Conditionally multiplies the packed single-precision (32-bit) floating-point elements in a and b using the high 4 bits in imm8, sums the four products, and conditionally returns the sum using the low 4 bits of imm8.
  • Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm8.
  • Extracts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8.
  • Extracts 128 bits (composed of integer data) from a, selected with imm8.
  • Rounds packed double-precision (64-bit) floating-point elements in a toward negative infinity.
  • Rounds packed single-precision (32-bit) floating-point elements in a toward negative infinity.
  • Horizontal addition of adjacent pairs in the two packed vectors of 4 64-bit floating points a and b. In the result, sums of elements from a are returned in even locations, while sums of elements from b are returned in odd locations.
  • Horizontal addition of adjacent pairs in the two packed vectors of 8 32-bit floating points a and b. In the result, sums of elements from a are returned in locations of indices 0, 1, 4, 5, while sums of elements from b are returned in locations 2, 3, 6, 7.
  • Horizontal subtraction of adjacent pairs in the two packed vectors of 4 64-bit floating points a and b. In the result, differences of elements from a are returned in even locations, while differences of elements from b are returned in odd locations.
  • Horizontal subtraction of adjacent pairs in the two packed vectors of 8 32-bit floating points a and b. In the result, differences of elements from a are returned in locations of indices 0, 1, 4, 5, while differences of elements from b are returned in locations 2, 3, 6, 7.
  • Copies a to result, and inserts the 8-bit integer i into result at the location specified by index.
  • Copies a to result, and inserts the 16-bit integer i into result at the location specified by index.
  • Copies a to result, and inserts the 32-bit integer i into result at the location specified by index.
  • Copies a to result, then inserts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into result at the location specified by imm8.
  • Copies a to result, then inserts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into result at the location specified by imm8.
  • Copies a to result, then inserts 128 bits from b into result at the location specified by imm8.
  • Loads 256-bits of integer data from unaligned memory into result. This intrinsic may perform better than _mm256_loadu_si256 when the data crosses a cache line boundary.
  • Loads 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
  • Loads 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
  • Loads 256-bits of integer data from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
  • Loads two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combines them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
  • Loads two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combines them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
  • Loads two 128-bit values (composed of integer data) from memory, and combines them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
  • Loads 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
  • Loads 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
  • Loads 256-bits of integer data from memory into result. mem_addr does not need to be aligned on any particular boundary.
  • Loads packed double-precision (64-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
  • Loads packed single-precision (32-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
  • Stores packed double-precision (64-bit) floating-point elements from a into memory using mask.
  • Stores packed single-precision (32-bit) floating-point elements from a into memory using mask.
  • Compares packed double-precision (64-bit) floating-point elements in a and b, and returns packed maximum values.
  • Compares packed single-precision (32-bit) floating-point elements in a and b, and returns packed maximum values.
  • Compares packed double-precision (64-bit) floating-point elements in a and b, and returns packed minimum values.
  • Compares packed single-precision (32-bit) floating-point elements in a and b, and returns packed minimum values.
  • Duplicates even-indexed double-precision (64-bit) floating-point elements from a, and returns the results.
  • Duplicates odd-indexed single-precision (32-bit) floating-point elements from a, and returns the results.
  • Duplicates even-indexed single-precision (32-bit) floating-point elements from a, and returns the results.
  • Sets each bit of the returned mask based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in a.
  • Sets each bit of the returned mask based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in a.
  • Multiplies packed double-precision (64-bit) floating-point elements in a and b.
  • Multiplies packed single-precision (32-bit) floating-point elements in a and b.
  • Computes the bitwise OR of packed double-precision (64-bit) floating-point elements in a and b.
  • Computes the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b.
  • Shuffles 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) selected by imm8 from a and b.
  • Shuffles 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) selected by imm8 from a and b.
  • Shuffles 128-bits (composed of integer data) selected by imm8 from a and b.
  • Shuffles double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
  • Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
  • Shuffles double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b.
  • Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b.
  • Computes the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
  • Rounds packed double-precision (64-bit) floating-point elements in a according to the flag ROUNDING. The value of ROUNDING may be as follows:
  • Rounds packed single-precision (32-bit) floating-point elements in a according to the flag ROUNDING. The value of ROUNDING may be as follows:
  • Computes the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
  • Broadcasts 8-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastb instruction.
  • Broadcasts 16-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastw instruction.
  • Broadcasts 32-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastd instruction.
  • Broadcasts 64-bit integer a to all elements of returned vector. This intrinsic may generate the vpbroadcastq instruction.
  • Broadcasts double-precision (64-bit) floating-point value a to all elements of returned vector.
  • Broadcasts single-precision (32-bit) floating-point value a to all elements of returned vector.
  • Sets packed 8-bit integers in returned vector with the supplied values.
  • Sets packed 16-bit integers in returned vector with the supplied values.
  • Sets packed 32-bit integers in returned vector with the supplied values.
  • Sets packed 64-bit integers in returned vector with the supplied values.
  • Sets packed __m256 returned vector with the supplied values.
  • Sets packed __m256d returned vector with the supplied values.
  • Sets packed __m256i returned vector with the supplied values.
  • Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values.
  • Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values.
  • Sets packed 8-bit integers in returned vector with the supplied values in reverse order.
  • Sets packed 16-bit integers in returned vector with the supplied values in reverse order.
  • Sets packed 32-bit integers in returned vector with the supplied values in reverse order.
  • Sets packed 64-bit integers in returned vector with the supplied values in reverse order.
  • Sets packed __m256 returned vector with the supplied values.
  • Sets packed __m256d returned vector with the supplied values.
  • Sets packed __m256i returned vector with the supplied values.
  • Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order.
  • Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order.
  • Returns vector of type __m256d with all elements set to zero.
  • Returns vector of type __m256 with all elements set to zero.
  • Returns vector of type __m256i with all elements set to zero.
  • Shuffles double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8.
  • Shuffles single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8.
  • Returns the square root of packed double-precision (64-bit) floating-point elements in a.
  • Returns the square root of packed single-precision (32-bit) floating-point elements in a.
  • Stores 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
  • Stores 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
  • Stores 256-bits of integer data from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
  • Stores the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
  • Stores the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
  • Stores the high and low 128-bit halves (each composed of integer data) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.
  • Stores 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
  • Stores 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
  • Stores 256-bits of integer data from a into memory. mem_addr does not need to be aligned on any particular boundary.
  • Moves double-precision values from a 256-bit vector of [4 x double] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
  • Moves single-precision floating-point values from a 256-bit vector of [8 x float] to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
  • Moves integer data from a 256-bit integer vector to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
  • Subtracts packed double-precision (64-bit) floating-point elements in b from packed elements in a.
  • Subtracts packed single-precision (32-bit) floating-point elements in b from packed elements in a.
  • Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the CF value.
  • Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the CF value.
  • Computes the bitwise AND of 256 bits (representing integer data) in a and b, and sets ZF to 1 if the result is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, and sets CF to 1 if the result is zero, otherwise sets CF to 0. Returns the CF value.
  • Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
  • Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
  • Computes the bitwise AND of 256 bits (representing integer data) in a and b, and sets ZF to 1 if the result is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, and sets CF to 1 if the result is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
  • Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the ZF value.
  • Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the ZF value.
  • Computes the bitwise AND of 256 bits (representing integer data) in a and b, and sets ZF to 1 if the result is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, and sets CF to 1 if the result is zero, otherwise sets CF to 0. Returns the ZF value.
  • Returns vector of type __m256d with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to mem::MaybeUninit. In practice, this is equivalent to mem::zeroed.
  • Returns vector of type __m256 with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to mem::MaybeUninit. In practice, this is equivalent to mem::zeroed.
  • Returns vector of type __m256i with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to mem::MaybeUninit. In practice, this is equivalent to mem::zeroed.
  • Unpacks and interleaves double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b.
  • Unpacks and interleaves single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b.
  • Unpacks and interleaves double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b.
  • Unpacks and interleaves single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b.
  • Computes the bitwise XOR of packed double-precision (64-bit) floating-point elements in a and b.
  • Computes the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b.
  • Zeroes the contents of all XMM or YMM registers.
  • Zeroes the upper 128 bits of all YMM registers; the lower 128-bits of the registers are unmodified.
  • Constructs a 256-bit floating-point vector of [4 x double] from a 128-bit floating-point vector of [2 x double]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
  • Constructs a 256-bit floating-point vector of [8 x float] from a 128-bit floating-point vector of [4 x float]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
  • Constructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
  • Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
  • _mm_cmp_pd (avx, sse2)
    Compares packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
  • _mm_cmp_ps (avx, sse)
    Compares packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
  • _mm_cmp_sd (avx, sse2)
    Compares the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by IMM5, stores the result in the lower element of returned vector, and copies the upper element from a to the upper element of returned vector.
  • _mm_cmp_ss (avx, sse)
    Compares the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by IMM5, stores the result in the lower element of returned vector, and copies the upper 3 packed elements from a to the upper elements of returned vector.
  • Loads packed double-precision (64-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
  • Loads packed single-precision (32-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
  • Stores packed double-precision (64-bit) floating-point elements from a into memory using mask.
  • Stores packed single-precision (32-bit) floating-point elements from a into memory using mask.
  • _mm_permute_pd (avx, sse2)
    Shuffles double-precision (64-bit) floating-point elements in a using the control in imm8.
  • _mm_permute_ps (avx, sse)
    Shuffles single-precision (32-bit) floating-point elements in a using the control in imm8.
  • Shuffles double-precision (64-bit) floating-point elements in a using the control in b.
  • Shuffles single-precision (32-bit) floating-point elements in a using the control in b.
  • Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the CF value.
  • Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the CF value.
  • Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
  • Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns 1 if both the ZF and CF values are zero, otherwise returns 0.
  • Computes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and sets ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the ZF value.
  • Computes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and sets ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets ZF to 0. Computes the bitwise NOT of a and then AND with b, producing an intermediate value, and sets CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise sets CF to 0. Returns the ZF value.
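
Example: unaligned load, add, store

The load and store summaries in this list distinguish variants that require a 32-byte aligned mem_addr from variants that accept any address. The following is a minimal sketch of the unaligned path, assuming the intrinsics are reached through the public std::arch::x86_64 (or std::arch::x86) re-exports under the names _mm256_loadu_ps, _mm256_add_ps and _mm256_storeu_ps. The helper names add8 and add8_avx are hypothetical, and the scalar fallback exists only so the sketch also builds and runs on targets without AVX.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: adds eight f32 lanes with AVX.
/// The caller must have verified that the CPU supports AVX.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn add8_avx(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    unsafe {
        // Unaligned loads: a plain [f32; 8] is only guaranteed 4-byte alignment.
        let va = _mm256_loadu_ps(a.as_ptr());
        let vb = _mm256_loadu_ps(b.as_ptr());
        let sum = _mm256_add_ps(va, vb);
        let mut out = [0.0f32; 8];
        // Unaligned store back into an ordinary array.
        _mm256_storeu_ps(out.as_mut_ptr(), sum);
        out
    }
}

/// Hypothetical safe wrapper: uses AVX when detected at run time,
/// otherwise falls back to a scalar loop.
fn add8(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // Safe to call: the feature check above succeeded.
            return unsafe { add8_avx(a, b) };
        }
    }
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a[i] + b[i];
    }
    out
}
```

Calling add8(&[1.0; 8], &[2.0; 8]) yields eight lanes of 3.0 on either path. The aligned variants would additionally need the arrays placed in a 32-byte aligned wrapper type, which this sketch deliberately avoids.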
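
Example: lane layout of horizontal addition

The horizontal addition summary for 4 packed 64-bit elements states that sums from a land in even locations and sums from b in odd locations. The sketch below makes that layout concrete. It assumes the intrinsic is exposed as _mm256_hadd_pd through std::arch; hadd4 and hadd4_avx are hypothetical helper names, and the scalar branch simply spells out the same layout for targets without AVX.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: horizontal add of two [f64; 4] vectors with AVX.
/// The caller must have verified that the CPU supports AVX.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn hadd4_avx(a: &[f64; 4], b: &[f64; 4]) -> [f64; 4] {
    unsafe {
        let va = _mm256_loadu_pd(a.as_ptr());
        let vb = _mm256_loadu_pd(b.as_ptr());
        let r = _mm256_hadd_pd(va, vb);
        let mut out = [0.0f64; 4];
        _mm256_storeu_pd(out.as_mut_ptr(), r);
        out
    }
}

/// Hypothetical safe wrapper with a scalar fallback that mirrors the layout:
/// even locations hold pair sums from a, odd locations pair sums from b.
fn hadd4(a: &[f64; 4], b: &[f64; 4]) -> [f64; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            return unsafe { hadd4_avx(a, b) };
        }
    }
    [a[0] + a[1], b[0] + b[1], a[2] + a[3], b[2] + b[3]]
}
```

With a = [1, 2, 3, 4] and b = [10, 20, 30, 40], the result is [3, 30, 7, 70]: the pairs are summed within each 128-bit half, so the operation does not produce a single total across the whole vector.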
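
Example: ZF and CF results of the integer test operations

The three integer test summaries in this list differ only in which flag they return. The sketch below evaluates all three on the same inputs, assuming the intrinsics are exposed as _mm256_testz_si256, _mm256_testc_si256 and _mm256_testnzc_si256 through std::arch. The names test_flags and test_flags_avx are hypothetical, and the scalar branch restates the documented rules: ZF is 1 when a AND b is all zero, CF is 1 when (NOT a) AND b is all zero.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: returns (ZF, CF, "both flags are zero") via AVX.
/// The caller must have verified that the CPU supports AVX.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn test_flags_avx(a: &[i32; 8], b: &[i32; 8]) -> (i32, i32, i32) {
    unsafe {
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        (
            _mm256_testz_si256(va, vb),
            _mm256_testc_si256(va, vb),
            _mm256_testnzc_si256(va, vb),
        )
    }
}

/// Hypothetical safe wrapper with a scalar restatement of the same rules.
fn test_flags(a: &[i32; 8], b: &[i32; 8]) -> (i32, i32, i32) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            return unsafe { test_flags_avx(a, b) };
        }
    }
    // ZF: 1 if (a AND b) is zero across all 256 bits.
    let zf = a.iter().zip(b.iter()).all(|(&x, &y)| (x & y) == 0) as i32;
    // CF: 1 if ((NOT a) AND b) is zero across all 256 bits.
    let cf = a.iter().zip(b.iter()).all(|(&x, &y)| (!x & y) == 0) as i32;
    // Third value: 1 only when both ZF and CF are zero.
    let nzc = (zf == 0 && cf == 0) as i32;
    (zf, cf, nzc)
}
```

For example, disjoint bit patterns such as a = [0x0F; 8] and b = [0xF0; 8] give (1, 0, 0), while a = [0xFF; 8] and b = [0x0F; 8] give (0, 1, 0) because every bit set in b is also set in a.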