Module core::core_arch::x86::avx2

🔬This is a nightly-only experimental API. (stdsimd #48556)
Available on x86 or x86-64 only.
Advanced Vector Extensions 2 (AVX2)

AVX2 extends most integer SIMD instructions to 256-bit wide vector registers. FMA (fused multiply-add) was introduced alongside it but is a separate extension.

The references are:

Wikipedia’s AVX and FMA pages provide a quick overview of the instructions available.
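As a rough illustration of how these intrinsics are usually reached from user code, the sketch below adds eight 32-bit integers with `_mm256_add_epi32`, imported through the public `std::arch::x86_64` path rather than this internal module. The helper names `add_i32x8` and `add_i32x8_avx2` are hypothetical, and the scalar fallback is an assumption added so the function also works on CPUs without AVX2.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// AVX2 path (hypothetical helper). Must only be called when AVX2 is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_i32x8_avx2(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    // Unaligned loads: plain arrays are not guaranteed to be 32-byte aligned.
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
    // Adds packed 32-bit integers; overflow wraps around.
    let sum = _mm256_add_epi32(va, vb);
    let mut out = [0i32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
    out
}

/// Element-wise wrapping add of eight i32 values (hypothetical helper).
pub fn add_i32x8(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { add_i32x8_avx2(a, b) };
        }
    }
    // Portable fallback with the same wrapping semantics.
    let mut out = [0i32; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}
```

Checking the feature at runtime keeps the binary usable on older x86 CPUs; compiling the whole crate with AVX2 enabled would be an alternative when the deployment target is known.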

Functions

  • maskloadd 🔒 Experimental
  • maskloadd256 🔒 Experimental
  • maskloadq 🔒 Experimental
  • maskloadq256 🔒 Experimental
  • maskstored 🔒 Experimental
  • maskstored256 🔒 Experimental
  • maskstoreq 🔒 Experimental
  • maskstoreq256 🔒 Experimental
  • mpsadbw 🔒 Experimental
  • pabsb 🔒 Experimental
  • pabsd 🔒 Experimental
  • pabsw 🔒 Experimental
  • packssdw 🔒 Experimental
  • packsswb 🔒 Experimental
  • packusdw 🔒 Experimental
  • packuswb 🔒 Experimental
  • pavgb 🔒 Experimental
  • pavgw 🔒 Experimental
  • pblendvb 🔒 Experimental
  • permd 🔒 Experimental
  • permps 🔒 Experimental
  • pgatherdd 🔒 Experimental
  • pgatherdpd 🔒 Experimental
  • pgatherdps 🔒 Experimental
  • pgatherdq 🔒 Experimental
  • pgatherqd 🔒 Experimental
  • pgatherqpd 🔒 Experimental
  • pgatherqps 🔒 Experimental
  • pgatherqq 🔒 Experimental
  • phaddd 🔒 Experimental
  • phaddsw 🔒 Experimental
  • phaddw 🔒 Experimental
  • phsubd 🔒 Experimental
  • phsubsw 🔒 Experimental
  • phsubw 🔒 Experimental
  • pmaddubsw 🔒 Experimental
  • pmaddwd 🔒 Experimental
  • pmuldq 🔒 Experimental
  • pmulhrsw 🔒 Experimental
  • pmulhuw 🔒 Experimental
  • pmulhw 🔒 Experimental
  • pmuludq 🔒 Experimental
  • psadbw 🔒 Experimental
  • pshufb 🔒 Experimental
  • psignb 🔒 Experimental
  • psignd 🔒 Experimental
  • psignw 🔒 Experimental
  • pslld 🔒 Experimental
  • psllid 🔒 Experimental
  • pslliq 🔒 Experimental
  • pslliw 🔒 Experimental
  • psllq 🔒 Experimental
  • psllvd 🔒 Experimental
  • psllvd256 🔒 Experimental
  • psllvq 🔒 Experimental
  • psllvq256 🔒 Experimental
  • psllw 🔒 Experimental
  • psrad 🔒 Experimental
  • psraid 🔒 Experimental
  • psraiw 🔒 Experimental
  • psravd 🔒 Experimental
  • psravd256 🔒 Experimental
  • psraw 🔒 Experimental
  • psrld 🔒 Experimental
  • psrlid 🔒 Experimental
  • psrliq 🔒 Experimental
  • psrliw 🔒 Experimental
  • psrlq 🔒 Experimental
  • psrlvd 🔒 Experimental
  • psrlvd256 🔒 Experimental
  • psrlvq 🔒 Experimental
  • psrlvq256 🔒 Experimental
  • psrlw 🔒 Experimental
  • vperm2i128 🔒 Experimental
  • vpgatherdd 🔒 Experimental
  • vpgatherdpd 🔒 Experimental
  • vpgatherdps 🔒 Experimental
  • vpgatherdq 🔒 Experimental
  • vpgatherqd 🔒 Experimental
  • vpgatherqpd 🔒 Experimental
  • vpgatherqps 🔒 Experimental
  • vpgatherqq 🔒 Experimental
  • vpslldq 🔒 Experimental
  • vpsrldq 🔒 Experimental
  • Computes the absolute values of packed 8-bit integers in a.
  • Computes the absolute values of packed 16-bit integers in a.
  • Computes the absolute values of packed 32-bit integers in a.
  • Adds packed 8-bit integers in a and b.
  • Adds packed 16-bit integers in a and b.
  • Adds packed 32-bit integers in a and b.
  • Adds packed 64-bit integers in a and b.
  • Adds packed 8-bit integers in a and b using saturation.
  • Adds packed 16-bit integers in a and b using saturation.
  • Adds packed unsigned 8-bit integers in a and b using saturation.
  • Adds packed unsigned 16-bit integers in a and b using saturation.
  • Concatenates pairs of 16-byte blocks in a and b into a 32-byte temporary result, shifts the result right by n bytes, and returns the low 16 bytes.
  • Computes the bitwise AND of 256 bits (representing integer data) in a and b.
  • Computes the bitwise NOT of 256 bits (representing integer data) in a and then AND with b.
  • Averages packed unsigned 8-bit integers in a and b.
  • Averages packed unsigned 16-bit integers in a and b.
  • Blends packed 16-bit integers from a and b using control mask IMM8.
  • Blends packed 32-bit integers from a and b using control mask IMM8.
  • Blends packed 8-bit integers from a and b using mask.
  • Broadcasts the low packed 8-bit integer from a to all elements of the 256-bit returned value.
  • Broadcasts the low packed 32-bit integer from a to all elements of the 256-bit returned value.
  • Broadcasts the low packed 64-bit integer from a to all elements of the 256-bit returned value.
  • Broadcasts the low double-precision (64-bit) floating-point element from a to all elements of the 256-bit returned value.
  • Broadcasts 128 bits of integer data from a to all 128-bit lanes in the 256-bit returned value.
  • Broadcasts the low single-precision (32-bit) floating-point element from a to all elements of the 256-bit returned value.
  • Broadcasts the low packed 16-bit integer from a to all elements of the 256-bit returned value.
  • Shifts 128-bit lanes in a left by imm8 bytes while shifting in zeros.
  • Shifts 128-bit lanes in a right by imm8 bytes while shifting in zeros.
  • Compares packed 8-bit integers in a and b for equality.
  • Compares packed 16-bit integers in a and b for equality.
  • Compares packed 32-bit integers in a and b for equality.
  • Compares packed 64-bit integers in a and b for equality.
  • Compares packed 8-bit integers in a and b for greater-than.
  • Compares packed 16-bit integers in a and b for greater-than.
  • Compares packed 32-bit integers in a and b for greater-than.
  • Compares packed 64-bit integers in a and b for greater-than.
  • Sign-extends 8-bit integers to 16-bit integers.
  • Sign-extends 8-bit integers to 32-bit integers.
  • Sign-extends 8-bit integers to 64-bit integers.
  • Sign-extends 16-bit integers to 32-bit integers.
  • Sign-extends 16-bit integers to 64-bit integers.
  • Sign-extends 32-bit integers to 64-bit integers.
  • Zero-extends unsigned 8-bit integers in a to 16-bit integers.
  • Zero-extends the lower eight unsigned 8-bit integers in a to 32-bit integers. The upper eight elements of a are unused.
  • Zero-extends the lower four unsigned 8-bit integers in a to 64-bit integers. The upper twelve elements of a are unused.
  • Zero-extends packed unsigned 16-bit integers in a to packed 32-bit integers, and stores the results in dst.
  • Zero-extends the lower four unsigned 16-bit integers in a to 64-bit integers. The upper four elements of a are unused.
  • Zero-extends unsigned 32-bit integers in a to 64-bit integers.
  • Returns the first element of the input vector of [4 x double].
  • Returns the first element of the input vector of [8 x i32].
  • Extracts an 8-bit integer from a, selected with INDEX. Returns a 32-bit integer containing the zero-extended integer data.
  • Extracts a 16-bit integer from a, selected with INDEX. Returns a 32-bit integer containing the zero-extended integer data.
  • Extracts a 32-bit integer from a, selected with INDEX.
  • Extracts 128 bits (of integer data) from a selected with IMM1.
  • Horizontally adds adjacent pairs of 16-bit integers in a and b.
  • Horizontally adds adjacent pairs of 32-bit integers in a and b.
  • Horizontally adds adjacent pairs of 16-bit integers in a and b using saturation.
  • Horizontally subtracts adjacent pairs of 16-bit integers in a and b.
  • Horizontally subtracts adjacent pairs of 32-bit integers in a and b.
  • Horizontally subtracts adjacent pairs of 16-bit integers in a and b using saturation.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Copies a to dst, then inserts 128 bits (of integer data) from b at the location specified by IMM1.
  • Multiplies packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers, then horizontally adds adjacent pairs of intermediate 32-bit integers.
  • Vertically multiplies each unsigned 8-bit integer from a with the corresponding signed 8-bit integer from b, producing intermediate signed 16-bit integers, then horizontally adds adjacent pairs of intermediate signed 16-bit integers.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Loads packed 32-bit integers from memory pointed to by mem_addr using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
  • Loads packed 64-bit integers from memory pointed to by mem_addr using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
  • Stores packed 32-bit integers from a into memory pointed to by mem_addr using mask (elements are not stored when the highest bit is not set in the corresponding element).
  • Stores packed 64-bit integers from a into memory pointed to by mem_addr using mask (elements are not stored when the highest bit is not set in the corresponding element).
  • Compares packed 8-bit integers in a and b, and returns the packed maximum values.
  • Compares packed 16-bit integers in a and b, and returns the packed maximum values.
  • Compares packed 32-bit integers in a and b, and returns the packed maximum values.
  • Compares packed unsigned 8-bit integers in a and b, and returns the packed maximum values.
  • Compares packed unsigned 16-bit integers in a and b, and returns the packed maximum values.
  • Compares packed unsigned 32-bit integers in a and b, and returns the packed maximum values.
  • Compares packed 8-bit integers in a and b, and returns the packed minimum values.
  • Compares packed 16-bit integers in a and b, and returns the packed minimum values.
  • Compares packed 32-bit integers in a and b, and returns the packed minimum values.
  • Compares packed unsigned 8-bit integers in a and b, and returns the packed minimum values.
  • Compares packed unsigned 16-bit integers in a and b, and returns the packed minimum values.
  • Compares packed unsigned 32-bit integers in a and b, and returns the packed minimum values.
  • Creates a mask from the most significant bit of each 8-bit element in a and returns the result.
  • Computes the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and stores the 16-bit results in dst. Eight SADs are performed for each 128-bit lane using one quadruplet from b and eight quadruplets from a. One quadruplet is selected from b starting at the offset specified in imm8. Eight quadruplets are formed from sequential 8-bit integers selected from a starting at the offset specified in imm8.
  • Multiplies the low 32-bit integers from each packed 64-bit element in a and b.
  • Multiplies the low unsigned 32-bit integers from each packed 64-bit element in a and b.
  • Multiplies the packed 16-bit integers in a and b, producing intermediate 32-bit integers and returning the high 16 bits of the intermediate integers.
  • Multiplies the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers and returning the high 16 bits of the intermediate integers.
  • Multiplies packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and return bits [16:1].
  • Multiplies the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and returns the low 16 bits of the intermediate integers.
  • Multiplies the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and returns the low 32 bits of the intermediate integers.
  • Computes the bitwise OR of 256 bits (representing integer data) in a and b.
  • Converts packed 16-bit integers from a and b to packed 8-bit integers using signed saturation.
  • Converts packed 32-bit integers from a and b to packed 16-bit integers using signed saturation.
  • Converts packed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation.
  • Converts packed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation.
  • Shuffles 128 bits of integer data selected by imm8 from a and b.
  • Permutes 64-bit integers from a using control mask imm8.
  • Shuffles 64-bit floating-point elements in a across lanes using the control in imm8.
  • Permutes packed 32-bit integers from a according to the content of b.
  • Shuffles eight 32-bit floating-point elements in a across lanes using the corresponding 32-bit integer index in idx.
  • Computes the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sums each consecutive 8 differences to produce four unsigned 16-bit integers, and packs these unsigned 16-bit integers in the low 16 bits of each 64-bit element of the return value.
  • Shuffles bytes from a according to the content of b.
  • Shuffles 32-bit integers in 128-bit lanes of a using the control in imm8.
  • Shuffles 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. The low 64 bits of 128-bit lanes of a are copied to the output.
  • Shuffles 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. The high 64 bits of 128-bit lanes of a are copied to the output.
  • Negates packed 8-bit integers in a when the corresponding signed 8-bit integer in b is negative, and returns the results. Results are zeroed out when the corresponding element in b is zero.
  • Negates packed 16-bit integers in a when the corresponding signed 16-bit integer in b is negative, and returns the results. Results are zeroed out when the corresponding element in b is zero.
  • Negates packed 32-bit integers in a when the corresponding signed 32-bit integer in b is negative, and returns the results. Results are zeroed out when the corresponding element in b is zero.
  • Shifts packed 16-bit integers in a left by count while shifting in zeros, and returns the result.
  • Shifts packed 32-bit integers in a left by count while shifting in zeros, and returns the result.
  • Shifts packed 64-bit integers in a left by count while shifting in zeros, and returns the result.
  • Shifts packed 16-bit integers in a left by IMM8 while shifting in zeros, and returns the results.
  • Shifts packed 32-bit integers in a left by IMM8 while shifting in zeros, and returns the results.
  • Shifts packed 64-bit integers in a left by IMM8 while shifting in zeros, and returns the results.
  • Shifts 128-bit lanes in a left by imm8 bytes while shifting in zeros.
  • Shifts packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and returns the result.
  • Shifts packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and returns the result.
  • Shifts packed 16-bit integers in a right by count while shifting in sign bits.
  • Shifts packed 32-bit integers in a right by count while shifting in sign bits.
  • Shifts packed 16-bit integers in a right by IMM8 while shifting in sign bits.
  • Shifts packed 32-bit integers in a right by IMM8 while shifting in sign bits.
  • Shifts packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits.
  • Shifts packed 16-bit integers in a right by count while shifting in zeros.
  • Shifts packed 32-bit integers in a right by count while shifting in zeros.
  • Shifts packed 64-bit integers in a right by count while shifting in zeros.
  • Shifts packed 16-bit integers in a right by IMM8 while shifting in zeros.
  • Shifts packed 32-bit integers in a right by IMM8 while shifting in zeros.
  • Shifts packed 64-bit integers in a right by IMM8 while shifting in zeros.
  • Shifts 128-bit lanes in a right by imm8 bytes while shifting in zeros.
  • Shifts packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros.
  • Shifts packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros.
  • Subtracts packed 8-bit integers in b from packed 8-bit integers in a.
  • Subtracts packed 16-bit integers in b from packed 16-bit integers in a.
  • Subtracts packed 32-bit integers in b from packed 32-bit integers in a.
  • Subtracts packed 64-bit integers in b from packed 64-bit integers in a.
  • Subtracts packed 8-bit integers in b from packed 8-bit integers in a using saturation.
  • Subtracts packed 16-bit integers in b from packed 16-bit integers in a using saturation.
  • Subtracts packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation.
  • Subtracts packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation.
  • Unpacks and interleaves 8-bit integers from the high half of each 128-bit lane of a and b.
  • Unpacks and interleaves 16-bit integers from the high half of each 128-bit lane of a and b.
  • Unpacks and interleaves 32-bit integers from the high half of each 128-bit lane of a and b.
  • Unpacks and interleaves 64-bit integers from the high half of each 128-bit lane of a and b.
  • Unpacks and interleaves 8-bit integers from the low half of each 128-bit lane of a and b.
  • Unpacks and interleaves 16-bit integers from the low half of each 128-bit lane of a and b.
  • Unpacks and interleaves 32-bit integers from the low half of each 128-bit lane of a and b.
  • Unpacks and interleaves 64-bit integers from the low half of each 128-bit lane of a and b.
  • Computes the bitwise XOR of 256 bits (representing integer data) in a and b.
  • Blends packed 32-bit integers from a and b using control mask IMM4.
  • Broadcasts the low packed 8-bit integer from a to all elements of the 128-bit returned value.
  • Broadcasts the low packed 32-bit integer from a to all elements of the 128-bit returned value.
  • Broadcasts the low packed 64-bit integer from a to all elements of the 128-bit returned value.
  • Broadcasts the low double-precision (64-bit) floating-point element from a to all elements of the 128-bit returned value.
  • Broadcasts the low single-precision (32-bit) floating-point element from a to all elements of the 128-bit returned value.
  • Broadcasts the low packed 16-bit integer from a to all elements of the 128-bit returned value.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. Where the highest bit of the corresponding mask element is not set, the value from src is used in that position instead.
  • Loads packed 32-bit integers from memory pointed to by mem_addr using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
  • Loads packed 64-bit integers from memory pointed to by mem_addr using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
  • Stores packed 32-bit integers from a into memory pointed to by mem_addr using mask (elements are not stored when the highest bit is not set in the corresponding element).
  • Stores packed 64-bit integers from a into memory pointed to by mem_addr using mask (elements are not stored when the highest bit is not set in the corresponding element).
  • Shifts packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and returns the result.
  • Shifts packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and returns the result.
  • Shifts packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits.
  • Shifts packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros.
  • Shifts packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros.
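
Several entries in this list are typically combined. As one possible sketch, the byte-equality comparison and the most-significant-bit mask can be chained to locate a byte in a 32-byte block. The code assumes the public names `_mm256_cmpeq_epi8` and `_mm256_movemask_epi8` from `std::arch::x86_64`; the helpers `find_byte` and `find_byte_avx2` are hypothetical, and the scalar fallback is an addition for CPUs without AVX2.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// AVX2 path (hypothetical helper). Must only be called when AVX2 is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn find_byte_avx2(block: &[u8; 32], needle: u8) -> Option<usize> {
    let v = _mm256_loadu_si256(block.as_ptr() as *const __m256i);
    // Broadcast the needle to all 32 byte positions.
    let n = _mm256_set1_epi8(needle as i8);
    // Each matching byte becomes 0xFF, every other byte becomes 0x00.
    let eq = _mm256_cmpeq_epi8(v, n);
    // Collect the top bit of each byte into a 32-bit mask (bit i <-> byte i).
    let mask = _mm256_movemask_epi8(eq) as u32;
    if mask == 0 {
        None
    } else {
        Some(mask.trailing_zeros() as usize)
    }
}

/// Index of the first occurrence of `needle` in `block` (hypothetical helper).
pub fn find_byte(block: &[u8; 32], needle: u8) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { find_byte_avx2(block, needle) };
        }
    }
    // Portable fallback producing the same result.
    block.iter().position(|&b| b == needle)
}
```

Because bit i of the mask corresponds to byte i of the vector, counting trailing zeros yields the index of the first match.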