Module core::core_arch::x86::fma

🔬This is a nightly-only experimental API. (stdsimd #48556)

Available on x86 or x86-64 only.

Fused Multiply-Add instruction set (FMA)

The FMA instruction set extends the 128- and 256-bit SSE instructions in the x86 microprocessor instruction set with fused multiply–add (FMA) operations, which compute a * b + c with a single rounding step.

The references are:

Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2: Instruction Set Reference, A-Z.
AMD64 Architecture Programmer’s Manual, Volume 3: General-Purpose and System Instructions.

Wikipedia’s FMA page provides a quick overview of the instructions available.
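
As an illustration of what "fused" means in practice, the following is a minimal sketch rather than a definitive usage pattern. It assumes an x86-64 target and reaches the intrinsics through the stable `std::arch::x86_64` re-exports instead of the `core::core_arch` path shown on this page. The helper names `fmadd_pd` and `fmadd_pd_hw` are hypothetical; only `_mm_fmadd_pd`, `_mm_loadu_pd`, `_mm_storeu_pd`, and `is_x86_feature_detected!` are real. On CPUs without FMA, or on other architectures, it falls back to `f64::mul_add`, which is also fused.

```rust
// Hypothetical helper: calls the real `_mm_fmadd_pd` intrinsic.
// Must only be called when the CPU supports the `fma` feature.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "fma")]
unsafe fn fmadd_pd_hw(a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    use std::arch::x86_64::{_mm_fmadd_pd, _mm_loadu_pd, _mm_storeu_pd};
    unsafe {
        // Load the three operands, compute (a * b) + c per element,
        // and store the two results back into a plain array.
        let r = _mm_fmadd_pd(
            _mm_loadu_pd(a.as_ptr()),
            _mm_loadu_pd(b.as_ptr()),
            _mm_loadu_pd(c.as_ptr()),
        );
        let mut out = [0.0f64; 2];
        _mm_storeu_pd(out.as_mut_ptr(), r);
        out
    }
}

// Hypothetical safe wrapper: uses the intrinsic when FMA is detected at
// run time, otherwise the scalar (still fused) `f64::mul_add`.
pub fn fmadd_pd(a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("fma") {
            // SAFETY: the `fma` feature was detected just above.
            return unsafe { fmadd_pd_hw(a, b, c) };
        }
    }
    [a[0].mul_add(b[0], c[0]), a[1].mul_add(b[1], c[1])]
}
```

With `eps = 2^-30`, calling `fmadd_pd([1.0 + eps, 2.0], [1.0 - eps, 3.0], [-1.0, 4.0])` keeps the exact product `1 - 2^-60` before adding `-1.0`, so the first element is `-2^-60`. The unfused expression `(1.0 + eps) * (1.0 - eps) - 1.0` rounds the product to `1.0` first and yields `0.0`.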

Functions

vfmaddsd 🔒 ^⚠Experimental
vfmaddss 🔒 ^⚠Experimental
vfmaddsubpd 🔒 ^⚠Experimental
vfmaddsubpd256 🔒 ^⚠Experimental
vfmaddsubps 🔒 ^⚠Experimental
vfmaddsubps256 🔒 ^⚠Experimental
vfmsubaddpd 🔒 ^⚠Experimental
vfmsubaddpd256 🔒 ^⚠Experimental
vfmsubaddps 🔒 ^⚠Experimental
vfmsubaddps256 🔒 ^⚠Experimental
vfmsubpd 🔒 ^⚠Experimental
vfmsubpd256 🔒 ^⚠Experimental
vfmsubps 🔒 ^⚠Experimental
vfmsubps256 🔒 ^⚠Experimental
vfmsubsd 🔒 ^⚠Experimental
vfmsubss 🔒 ^⚠Experimental
vfnmaddpd 🔒 ^⚠Experimental
vfnmaddpd256 🔒 ^⚠Experimental
vfnmaddps 🔒 ^⚠Experimental
vfnmaddps256 🔒 ^⚠Experimental
vfnmaddsd 🔒 ^⚠Experimental
vfnmaddss 🔒 ^⚠Experimental
vfnmsubpd 🔒 ^⚠Experimental
vfnmsubpd256 🔒 ^⚠Experimental
vfnmsubps 🔒 ^⚠Experimental
vfnmsubps256 🔒 ^⚠Experimental
vfnmsubsd 🔒 ^⚠Experimental
vfnmsubss 🔒 ^⚠Experimental
_mm256_fmadd_pd^⚠fma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and adds the intermediate result to packed elements in c.
_mm256_fmadd_ps^⚠fma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and adds the intermediate result to packed elements in c.
_mm256_fmaddsub_pd^⚠fma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternately adds and subtracts packed elements in c to/from the intermediate result: c is subtracted in even-indexed elements and added in odd-indexed elements.
_mm256_fmaddsub_ps^⚠fma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternately adds and subtracts packed elements in c to/from the intermediate result: c is subtracted in even-indexed elements and added in odd-indexed elements.
_mm256_fmsub_pd^⚠fma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtracts packed elements in c from the intermediate result.
_mm256_fmsub_ps^⚠fma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtracts packed elements in c from the intermediate result.
_mm256_fmsubadd_pd^⚠fma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternately subtracts and adds packed elements in c from/to the intermediate result: c is added in even-indexed elements and subtracted in odd-indexed elements.
_mm256_fmsubadd_ps^⚠fma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternately subtracts and adds packed elements in c from/to the intermediate result: c is added in even-indexed elements and subtracted in odd-indexed elements.
_mm256_fnmadd_pd^⚠fma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and adds the negated intermediate result to packed elements in c.
_mm256_fnmadd_ps^⚠fma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and adds the negated intermediate result to packed elements in c.
_mm256_fnmsub_pd^⚠fma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtracts packed elements in c from the negated intermediate result.
_mm256_fnmsub_ps^⚠fma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtracts packed elements in c from the negated intermediate result.
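
The six 256-bit double-precision forms listed above differ only in the signs applied to the product and to c. The sketch below compares them side by side under stated assumptions: an x86-64 target, the stable `std::arch::x86_64` re-exports, and hypothetical helper names (`reference`, `packed`, `fma_forms`). The scalar `reference` function doubles as a fallback and as a lane-by-lane restatement of each description.

```rust
// Scalar restatement of the six packed forms, one lane at a time.
// `f64::mul_add` is also fused, so results match the hardware path.
fn reference(a: [f64; 4], b: [f64; 4], c: [f64; 4]) -> [[f64; 4]; 6] {
    let mut out = [[0.0f64; 4]; 6];
    for i in 0..4 {
        let (x, y, z) = (a[i], b[i], c[i]);
        out[0][i] = x.mul_add(y, z); // fmadd:   (a * b) + c
        out[1][i] = x.mul_add(y, -z); // fmsub:   (a * b) - c
        out[2][i] = (-x).mul_add(y, z); // fnmadd: -(a * b) + c
        out[3][i] = (-x).mul_add(y, -z); // fnmsub: -(a * b) - c
        // fmaddsub: subtract c in even lanes, add c in odd lanes.
        out[4][i] = if i % 2 == 0 { out[1][i] } else { out[0][i] };
        // fmsubadd: add c in even lanes, subtract c in odd lanes.
        out[5][i] = if i % 2 == 0 { out[0][i] } else { out[1][i] };
    }
    out
}

// Hypothetical helper calling the real 256-bit intrinsics.
// Must only be called when the CPU supports `avx` and `fma`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,fma")]
unsafe fn packed(a: [f64; 4], b: [f64; 4], c: [f64; 4]) -> [[f64; 4]; 6] {
    use std::arch::x86_64::*;
    unsafe {
        let va = _mm256_loadu_pd(a.as_ptr());
        let vb = _mm256_loadu_pd(b.as_ptr());
        let vc = _mm256_loadu_pd(c.as_ptr());
        let results = [
            _mm256_fmadd_pd(va, vb, vc),
            _mm256_fmsub_pd(va, vb, vc),
            _mm256_fnmadd_pd(va, vb, vc),
            _mm256_fnmsub_pd(va, vb, vc),
            _mm256_fmaddsub_pd(va, vb, vc),
            _mm256_fmsubadd_pd(va, vb, vc),
        ];
        let mut out = [[0.0f64; 4]; 6];
        for (dst, v) in out.iter_mut().zip(results.iter()) {
            _mm256_storeu_pd(dst.as_mut_ptr(), *v);
        }
        out
    }
}

// Hypothetical safe entry point with run-time feature detection.
pub fn fma_forms(a: [f64; 4], b: [f64; 4], c: [f64; 4]) -> [[f64; 4]; 6] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
            // SAFETY: both required features were detected just above.
            return unsafe { packed(a, b, c) };
        }
    }
    reference(a, b, c)
}
```

For `a = [1, 2, 3, 4]`, `b = [5, 6, 7, 8]`, and `c = [10, 20, 30, 40]`, the products are `[5, 12, 21, 32]`, so the fmadd row is `[15, 32, 51, 72]` and the fmaddsub row is `[-5, 32, -9, 72]`.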
_mm_fmadd_pd^⚠fma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and adds the intermediate result to packed elements in c.
_mm_fmadd_ps^⚠fma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and adds the intermediate result to packed elements in c.
_mm_fmadd_sd^⚠fma
Multiplies the lower double-precision (64-bit) floating-point elements in a and b, and adds the intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copies the upper element from a to the upper element of the result.
_mm_fmadd_ss^⚠fma
Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and adds the intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
_mm_fmaddsub_pd^⚠fma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternately adds and subtracts packed elements in c to/from the intermediate result: c is subtracted in even-indexed elements and added in odd-indexed elements.
_mm_fmaddsub_ps^⚠fma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternately adds and subtracts packed elements in c to/from the intermediate result: c is subtracted in even-indexed elements and added in odd-indexed elements.
_mm_fmsub_pd^⚠fma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtracts packed elements in c from the intermediate result.
_mm_fmsub_ps^⚠fma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtracts packed elements in c from the intermediate result.
_mm_fmsub_sd^⚠fma
Multiplies the lower double-precision (64-bit) floating-point elements in a and b, and subtracts the lower element in c from the intermediate result. Stores the result in the lower element of the returned value, and copies the upper element from a to the upper element of the result.
_mm_fmsub_ss^⚠fma
Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and subtracts the lower element in c from the intermediate result. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
_mm_fmsubadd_pd^⚠fma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternately subtracts and adds packed elements in c from/to the intermediate result: c is added in even-indexed elements and subtracted in odd-indexed elements.
_mm_fmsubadd_ps^⚠fma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternately subtracts and adds packed elements in c from/to the intermediate result: c is added in even-indexed elements and subtracted in odd-indexed elements.
_mm_fnmadd_pd^⚠fma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and adds the negated intermediate result to packed elements in c.
_mm_fnmadd_ps^⚠fma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and adds the negated intermediate result to packed elements in c.
_mm_fnmadd_sd^⚠fma
Multiplies the lower double-precision (64-bit) floating-point elements in a and b, and adds the negated intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copies the upper element from a to the upper element of the result.
_mm_fnmadd_ss^⚠fma
Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and adds the negated intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
_mm_fnmsub_pd^⚠fma
Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtracts packed elements in c from the negated intermediate result.
_mm_fnmsub_ps^⚠fma
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtracts packed elements in c from the negated intermediate result.
_mm_fnmsub_sd^⚠fma
Multiplies the lower double-precision (64-bit) floating-point elements in a and b, and subtracts the lower element in c from the negated intermediate result. Stores the result in the lower element of the returned value, and copies the upper element from a to the upper element of the result.
_mm_fnmsub_ss^⚠fma
Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and subtracts the lower element in c from the negated intermediate result. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
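
The scalar `_sd` and `_ss` forms operate only on the lowest element and pass the upper elements of a through unchanged. A minimal sketch using `_mm_fmadd_sd` may make that easier to see. It assumes an x86-64 target and the stable `std::arch::x86_64` re-exports. The helper names `fmadd_sd` and `fmadd_sd_hw` are hypothetical, and the fallback path simply restates the documented behavior in scalar code.

```rust
// Hypothetical helper calling the real `_mm_fmadd_sd` intrinsic.
// Must only be called when the CPU supports the `fma` feature.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "fma")]
unsafe fn fmadd_sd_hw(a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    use std::arch::x86_64::{_mm_fmadd_sd, _mm_loadu_pd, _mm_storeu_pd};
    unsafe {
        let r = _mm_fmadd_sd(
            _mm_loadu_pd(a.as_ptr()),
            _mm_loadu_pd(b.as_ptr()),
            _mm_loadu_pd(c.as_ptr()),
        );
        let mut out = [0.0f64; 2];
        _mm_storeu_pd(out.as_mut_ptr(), r);
        out
    }
}

// Hypothetical safe wrapper. Index 0 is the lower element, index 1 the upper.
pub fn fmadd_sd(a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("fma") {
            // SAFETY: the `fma` feature was detected just above.
            return unsafe { fmadd_sd_hw(a, b, c) };
        }
    }
    // Fallback: fused multiply-add on the lower element only,
    // upper element copied from `a`.
    [a[0].mul_add(b[0], c[0]), a[1]]
}
```

Calling `fmadd_sd([2.0, 100.0], [3.0, 200.0], [4.0, 300.0])` computes `2 * 3 + 4` in the lower element and returns `100.0` from a in the upper element. The upper elements of b and c are ignored.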