Module core::core_arch::x86::sse

🔬This is a nightly-only experimental API. (stdsimd #48556)
Available on x86 or x86-64 only.

Streaming SIMD Extensions (SSE)

Constants

Functions

  • _MM_SHUFFLE Experimental
    A utility function for creating masks to use with Intel shuffle and permute intrinsics.
  • addss 🔒 Experimental
  • cmpps 🔒 Experimental
  • cmpss 🔒 Experimental
  • comieq_ss 🔒 Experimental
  • comige_ss 🔒 Experimental
  • comigt_ss 🔒 Experimental
  • comile_ss 🔒 Experimental
  • comilt_ss 🔒 Experimental
  • comineq_ss 🔒 Experimental
  • cvtsi2ss 🔒 Experimental
  • cvtss2si 🔒 Experimental
  • cvttss2si 🔒 Experimental
  • divss 🔒 Experimental
  • ldmxcsr 🔒 Experimental
  • maxps 🔒 Experimental
  • maxss 🔒 Experimental
  • minps 🔒 Experimental
  • minss 🔒 Experimental
  • movmskps 🔒 Experimental
  • mulss 🔒 Experimental
  • prefetch 🔒 Experimental
  • rcpps 🔒 Experimental
  • rcpss 🔒 Experimental
  • rsqrtps 🔒 Experimental
  • rsqrtss 🔒 Experimental
  • sfence 🔒 Experimental
  • sqrtps 🔒 Experimental
  • sqrtss 🔒 Experimental
  • stmxcsr 🔒 Experimental
  • subss 🔒 Experimental
  • ucomieq_ss 🔒 Experimental
  • ucomige_ss 🔒 Experimental
  • ucomigt_ss 🔒 Experimental
  • ucomile_ss 🔒 Experimental
  • ucomilt_ss 🔒 Experimental
  • ucomineq_ss 🔒 Experimental
  • Transpose the 4x4 matrix formed by 4 rows of __m128 in place.
  • Adds __m128 vectors.
  • Adds the first components of a and b; the other components are copied from a.
  • Bitwise AND of packed single-precision (32-bit) floating-point elements.
  • Bitwise AND-NOT of packed single-precision (32-bit) floating-point elements.
  • Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input elements were equal, or 0 otherwise.
  • Compares the lowest f32 of both inputs for equality. The lowest 32 bits of the result will be 0xffffffff if the two inputs are equal, or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
  • Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is greater than or equal to the corresponding element in b, or 0 otherwise.
  • Compares the lowest f32 of both inputs for greater than or equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is greater than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
  • Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is greater than the corresponding element in b, or 0 otherwise.
  • Compares the lowest f32 of both inputs for greater than. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is greater than b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
  • Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is less than or equal to the corresponding element in b, or 0 otherwise.
  • Compares the lowest f32 of both inputs for less than or equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is less than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
  • Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is less than the corresponding element in b, or 0 otherwise.
  • Compares the lowest f32 of both inputs for less than. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is less than b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
  • Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input elements are not equal, or 0 otherwise.
  • Compares the lowest f32 of both inputs for inequality. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
  • Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is not greater than or equal to the corresponding element in b, or 0 otherwise.
  • Compares the lowest f32 of both inputs for not-greater-than-or-equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not greater than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
  • Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is not greater than the corresponding element in b, or 0 otherwise.
  • Compares the lowest f32 of both inputs for not-greater-than. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not greater than b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
  • Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is not less than or equal to the corresponding element in b, or 0 otherwise.
  • Compares the lowest f32 of both inputs for not-less-than-or-equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not less than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
  • Compares each of the four floats in a to the corresponding element in b. The result in the output vector will be 0xffffffff if the input element in a is not less than the corresponding element in b, or 0 otherwise.
  • Compares the lowest f32 of both inputs for not-less-than. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is not less than b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
  • Compares each of the four floats in a to the corresponding element in b. Returns four floats that have one of two possible bit patterns. The element in the output vector will be 0xffffffff if the input elements in a and b are ordered (i.e., neither of them is a NaN), or 0 otherwise.
  • Checks if the lowest f32 of both inputs are ordered. The lowest 32 bits of the result will be 0xffffffff if neither of a.extract(0) or b.extract(0) is a NaN, or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
  • Compares each of the four floats in a to the corresponding element in b. Returns four floats that have one of two possible bit patterns. The element in the output vector will be 0xffffffff if the input elements in a and b are unordered (i.e., at least one of them is a NaN), or 0 otherwise.
  • Checks if the lowest f32 of both inputs are unordered. The lowest 32 bits of the result will be 0xffffffff if either a.extract(0) or b.extract(0) is a NaN, or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
  • Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if they are equal, or 0 otherwise.
  • Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is greater than or equal to the one from b, or 0 otherwise.
  • Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is greater than the one from b, or 0 otherwise.
  • Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is less than or equal to the one from b, or 0 otherwise.
  • Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is less than the one from b, or 0 otherwise.
  • Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if they are not equal, or 0 otherwise.
  • Alias for _mm_cvtsi32_ss.
  • Alias for _mm_cvtss_si32.
  • Converts a 32 bit integer to a 32 bit float. The result vector is the input vector a with the lowest 32 bit float replaced by the converted integer.
  • Extracts the lowest 32 bit float from the input vector.
  • Converts the lowest 32 bit float in the input vector to a 32 bit integer.
  • Alias for _mm_cvttss_si32.
  • Converts the lowest 32 bit float in the input vector to a 32 bit integer with truncation.
  • Divides __m128 vectors.
  • Divides the first component of a by the first component of b; the other components are copied from a.
  • Gets the unsigned 32-bit value of the MXCSR control and status register.
  • Construct a __m128 by duplicating the value read from p into all elements.
  • Loads four f32 values from aligned memory into a __m128. If the pointer is not aligned to a 128-bit boundary (16 bytes) a general protection fault will be triggered (fatal program crash).
  • Alias for _mm_load1_ps.
  • Construct a __m128 with the lowest element read from p and the other elements set to zero.
  • Loads four f32 values from aligned memory into a __m128 in reverse order.
  • Loads four f32 values from memory into a __m128. There are no restrictions on memory alignment. For aligned memory _mm_load_ps may be faster.
  • Loads 64 bits of integer data from unaligned memory into a new vector.
  • Compares packed single-precision (32-bit) floating-point elements in a and b, and returns the corresponding maximum values.
  • Compares the first single-precision (32-bit) floating-point element of a and b, and returns the maximum value in the first element of the return value; the other elements are copied from a.
  • Compares packed single-precision (32-bit) floating-point elements in a and b, and returns the corresponding minimum values.
  • Compares the first single-precision (32-bit) floating-point element of a and b, and returns the minimum value in the first element of the return value; the other elements are copied from a.
  • Returns a __m128 with the first component from b and the remaining components from a.
  • Combine higher half of a and b. The higher half of b occupies the lower half of result.
  • Combine lower half of a and b. The lower half of b occupies the higher half of result.
  • Returns a mask of the most significant bit of each element in a.
  • Multiplies __m128 vectors.
  • Multiplies the first components of a and b; the other components are copied from a.
  • _mm_or_ps
    Bitwise OR of packed single-precision (32-bit) floating-point elements.
  • Fetch the cache line that contains address p using the given STRATEGY.
  • Returns the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a.
  • Returns the approximate reciprocal of the first single-precision (32-bit) floating-point element in a, the other elements are unchanged.
  • Returns the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a.
  • Returns the approximate reciprocal square root of the first single-precision (32-bit) floating-point element in a, the other elements are unchanged.
  • Construct a __m128 with all elements set to a.
  • Construct a __m128 from four floating point values highest to lowest.
  • Alias for _mm_set1_ps.
  • Construct a __m128 with the lowest element set to a and the rest set to zero.
  • Sets the MXCSR register with the 32-bit unsigned integer value.
  • Construct a __m128 from four floating point values lowest to highest.
  • Construct a __m128 with all elements initialized to zero.
  • Performs a serializing operation on all store-to-memory instructions that were issued prior to this instruction.
  • Shuffles packed single-precision (32-bit) floating-point elements in a and b using MASK.
  • Returns the square root of packed single-precision (32-bit) floating-point elements in a.
  • Returns the square root of the first single-precision (32-bit) floating-point element in a, the other elements are unchanged.
  • Stores the lowest 32 bit float of a repeated four times into aligned memory.
  • Stores four 32-bit floats into aligned memory.
  • Alias for _mm_store1_ps.
  • Stores the lowest 32 bit float of a into memory.
  • Stores four 32-bit floats into aligned memory in reverse order.
  • Stores four 32-bit floats into memory. There are no restrictions on memory alignment. For aligned memory _mm_store_ps may be faster.
  • Stores a into the memory at mem_addr using a non-temporal memory hint.
  • Subtracts __m128 vectors.
  • Subtracts the first component of b from the first component of a; the other components are copied from a.
  • Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if they are equal, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
  • Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is greater than or equal to the one from b, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
  • Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is greater than the one from b, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
  • Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is less than or equal to the one from b, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
  • Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if the value from a is less than the one from b, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
  • Compares two 32-bit floats from the low-order bits of a and b. Returns 1 if they are not equal, or 0 otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
  • Returns vector of type __m128 with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to mem::MaybeUninit. In practice, this is equivalent to mem::zeroed.
  • Unpacks and interleaves single-precision (32-bit) floating-point elements from the higher half of a and b.
  • Unpacks and interleaves single-precision (32-bit) floating-point elements from the lower half of a and b.
  • Bitwise exclusive OR of packed single-precision (32-bit) floating-point elements.
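Several of the descriptions above (element order for construction, comparison masks, shuffles, and the scalar "first component" operations) are easier to follow with concrete values. The following is a minimal sketch, not part of the module itself. It assumes an x86-64 target, where SSE is part of the baseline feature set, and uses the stable re-exports in std::arch::x86_64. The helper names (to_array, set_orders, add4, less_than_mask, reverse4, div_low) are hypothetical and exist only for illustration.

```rust
// Assumption: compiled for x86-64, where SSE is always available.
use std::arch::x86_64::*;

/// Copies the four lanes of `v` into an array, lowest lane first.
fn to_array(v: __m128) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    // _mm_storeu_ps has no alignment requirement on the destination.
    unsafe { _mm_storeu_ps(out.as_mut_ptr(), v) };
    out
}

/// Builds the same vector twice: _mm_set_ps takes arguments highest to
/// lowest, _mm_setr_ps takes them lowest to highest.
fn set_orders() -> ([f32; 4], [f32; 4]) {
    unsafe {
        (
            to_array(_mm_set_ps(4.0, 3.0, 2.0, 1.0)),
            to_array(_mm_setr_ps(1.0, 2.0, 3.0, 4.0)),
        )
    }
}

/// Lane-wise addition of two groups of four floats.
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        to_array(_mm_add_ps(va, vb))
    }
}

/// Returns a 4-bit mask with bit i set where a[i] < b[i].
/// _mm_cmplt_ps yields 0xffffffff or 0 per lane; _mm_movemask_ps
/// collects the most significant bit of each lane.
fn less_than_mask(a: [f32; 4], b: [f32; 4]) -> i32 {
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        _mm_movemask_ps(_mm_cmplt_ps(va, vb))
    }
}

/// Reverses the lane order with a shuffle.
/// Each 2-bit field of MASK selects a source lane: the two low fields
/// pick from the first operand, the two high fields from the second.
fn reverse4(a: [f32; 4]) -> [f32; 4] {
    unsafe {
        let v = _mm_loadu_ps(a.as_ptr());
        // Fields, high to low: 0, 1, 2, 3.
        to_array(_mm_shuffle_ps::<0b00_01_10_11>(v, v))
    }
}

/// Scalar division: only lane 0 is computed (a[0] / b[0]);
/// lanes 1 to 3 are copied from a.
fn div_low(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        to_array(_mm_div_ss(va, vb))
    }
}
```

The sketch uses the unaligned load and store intrinsics throughout, so plain arrays can be passed without arranging 16-byte alignment; the aligned variants would need an aligned buffer. The shuffle mask is written as a binary literal rather than through _MM_SHUFFLE, since that helper is marked experimental in the listing above.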