🔬 This is a nightly-only experimental API. (stdsimd #48556)
Available on x86 or x86-64 only.
Fused Multiply-Add instruction set (FMA)
The FMA instruction set is an extension to the 128- and 256-bit SSE instructions in the x86 microprocessor instruction set that performs fused multiply–add (FMA) operations, computing a * b + c with a single rounding step.
The references are:
- Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2: Instruction Set Reference, A-Z.
- AMD64 Architecture Programmer’s Manual, Volume 3: General-Purpose and System Instructions.
Wikipedia’s FMA page provides a quick overview of the instructions available.
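To illustrate what "fused" means in practice, the following sketch contrasts an unfused multiply-then-add with a fused one, and then applies `_mm256_fmadd_pd` to four lanes. It assumes the intrinsics are reachable through `std::arch::x86_64` on your toolchain; the helper names (`unfused`, `fused`, `fmadd4`, `fmadd4_hw`) are hypothetical and not part of this module. On CPUs or targets without FMA, `fmadd4` falls back to `f64::mul_add`.

```rust
/// Unfused: the product is rounded to f64 before the addition.
pub fn unfused(a: f64, b: f64, c: f64) -> f64 {
    a * b + c
}

/// Fused: `f64::mul_add` computes a * b + c with a single rounding.
pub fn fused(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c)
}

/// Hypothetical helper: four lanes at once via `_mm256_fmadd_pd`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,fma")]
unsafe fn fmadd4_hw(a: [f64; 4], b: [f64; 4], c: [f64; 4]) -> [f64; 4] {
    use std::arch::x86_64::{_mm256_fmadd_pd, _mm256_loadu_pd, _mm256_storeu_pd};
    let mut out = [0.0f64; 4];
    // SAFETY: each pointer refers to a live array of four f64 values, and
    // the unaligned load/store intrinsics have no alignment requirement.
    unsafe {
        let r = _mm256_fmadd_pd(
            _mm256_loadu_pd(a.as_ptr()),
            _mm256_loadu_pd(b.as_ptr()),
            _mm256_loadu_pd(c.as_ptr()),
        );
        _mm256_storeu_pd(out.as_mut_ptr(), r);
    }
    out
}

/// Safe wrapper: uses the FMA instruction when the CPU reports support,
/// otherwise computes each lane with `f64::mul_add`.
pub fn fmadd4(a: [f64; 4], b: [f64; 4], c: [f64; 4]) -> [f64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
            // SAFETY: the required CPU features were detected at runtime.
            return unsafe { fmadd4_hw(a, b, c) };
        }
    }
    let mut out = [0.0f64; 4];
    for i in 0..4 {
        out[i] = fused(a[i], b[i], c[i]);
    }
    out
}
```

With `e = f64::EPSILON`, the exact product `(1.0 + e) * (1.0 - e)` is `1 - e*e`, which rounds to `1.0` in the unfused version, so adding `-1.0` yields `0.0`. The fused versions keep the low-order term and return `-(e * e)`.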
Functions
- _mm256_fmadd_pd⚠ (fma): Multiplies packed double-precision (64-bit) floating-point elements in a and b, and adds the intermediate result to packed elements in c.
- _mm256_fmadd_ps⚠ (fma): Multiplies packed single-precision (32-bit) floating-point elements in a and b, and adds the intermediate result to packed elements in c.
- _mm256_fmaddsub_pd⚠ (fma): Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternately adds and subtracts packed elements in c to/from the intermediate result.
- _mm256_fmaddsub_ps⚠ (fma): Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternately adds and subtracts packed elements in c to/from the intermediate result.
- _mm256_fmsub_pd⚠ (fma): Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtracts packed elements in c from the intermediate result.
- _mm256_fmsub_ps⚠ (fma): Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtracts packed elements in c from the intermediate result.
- _mm256_fmsubadd_pd⚠ (fma): Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternately subtracts and adds packed elements in c from/to the intermediate result.
- _mm256_fmsubadd_ps⚠ (fma): Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternately subtracts and adds packed elements in c from/to the intermediate result.
- _mm256_fnmadd_pd⚠ (fma): Multiplies packed double-precision (64-bit) floating-point elements in a and b, and adds the negated intermediate result to packed elements in c.
- _mm256_fnmadd_ps⚠ (fma): Multiplies packed single-precision (32-bit) floating-point elements in a and b, and adds the negated intermediate result to packed elements in c.
- _mm256_fnmsub_pd⚠ (fma): Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtracts packed elements in c from the negated intermediate result.
- _mm256_fnmsub_ps⚠ (fma): Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtracts packed elements in c from the negated intermediate result.
- _mm_fmadd_pd⚠ (fma): Multiplies packed double-precision (64-bit) floating-point elements in a and b, and adds the intermediate result to packed elements in c.
- _mm_fmadd_ps⚠ (fma): Multiplies packed single-precision (32-bit) floating-point elements in a and b, and adds the intermediate result to packed elements in c.
- _mm_fmadd_sd⚠ (fma): Multiplies the lower double-precision (64-bit) floating-point elements in a and b, and adds the intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copies the upper element from a to the upper element of the result.
- _mm_fmadd_ss⚠ (fma): Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and adds the intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
- _mm_fmaddsub_pd⚠ (fma): Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternately adds and subtracts packed elements in c to/from the intermediate result.
- _mm_fmaddsub_ps⚠ (fma): Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternately adds and subtracts packed elements in c to/from the intermediate result.
- _mm_fmsub_pd⚠ (fma): Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtracts packed elements in c from the intermediate result.
- _mm_fmsub_ps⚠ (fma): Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtracts packed elements in c from the intermediate result.
- _mm_fmsub_sd⚠ (fma): Multiplies the lower double-precision (64-bit) floating-point elements in a and b, and subtracts the lower element in c from the intermediate result. Stores the result in the lower element of the returned value, and copies the upper element from a to the upper element of the result.
- _mm_fmsub_ss⚠ (fma): Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and subtracts the lower element in c from the intermediate result. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
- _mm_fmsubadd_pd⚠ (fma): Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternately subtracts and adds packed elements in c from/to the intermediate result.
- _mm_fmsubadd_ps⚠ (fma): Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternately subtracts and adds packed elements in c from/to the intermediate result.
- _mm_fnmadd_pd⚠ (fma): Multiplies packed double-precision (64-bit) floating-point elements in a and b, and adds the negated intermediate result to packed elements in c.
- _mm_fnmadd_ps⚠ (fma): Multiplies packed single-precision (32-bit) floating-point elements in a and b, and adds the negated intermediate result to packed elements in c.
- _mm_fnmadd_sd⚠ (fma): Multiplies the lower double-precision (64-bit) floating-point elements in a and b, and adds the negated intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copies the upper element from a to the upper element of the result.
- _mm_fnmadd_ss⚠ (fma): Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and adds the negated intermediate result to the lower element in c. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
- _mm_fnmsub_pd⚠ (fma): Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtracts packed elements in c from the negated intermediate result.
- _mm_fnmsub_ps⚠ (fma): Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtracts packed elements in c from the negated intermediate result.
- _mm_fnmsub_sd⚠ (fma): Multiplies the lower double-precision (64-bit) floating-point elements in a and b, and subtracts the lower element in c from the negated intermediate result. Stores the result in the lower element of the returned value, and copies the upper element from a to the upper element of the result.
- _mm_fnmsub_ss⚠ (fma): Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and subtracts the lower element in c from the negated intermediate result. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
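
The naming scheme in the list above can be summarised with scalar reference models. The sketch below is not this module's implementation: the model functions (`fmadd`, `fmsub`, `fnmadd`, `fnmsub`, `fmaddsub`, `fmsubadd`, `fmadd_sd`) are hypothetical helpers built on `f64::mul_add`, and `fmaddsub_pd` is a hypothetical safe wrapper that calls `_mm_fmaddsub_pd` when FMA is detected at runtime, assuming the intrinsic is reachable through `std::arch::x86_64`. The lane order for the alternating variants follows Intel's description: for fmaddsub, even-indexed lanes subtract c and odd-indexed lanes add c; fmsubadd is the reverse.

```rust
// One-lane models; each performs a single rounding via `f64::mul_add`.
pub fn fmadd(a: f64, b: f64, c: f64) -> f64 { a.mul_add(b, c) }      //  (a * b) + c
pub fn fmsub(a: f64, b: f64, c: f64) -> f64 { a.mul_add(b, -c) }     //  (a * b) - c
pub fn fnmadd(a: f64, b: f64, c: f64) -> f64 { (-a).mul_add(b, c) }  // -(a * b) + c
pub fn fnmsub(a: f64, b: f64, c: f64) -> f64 { (-a).mul_add(b, -c) } // -(a * b) - c

/// fmaddsub model: even-indexed lanes subtract c, odd-indexed lanes add c.
pub fn fmaddsub<const N: usize>(a: [f64; N], b: [f64; N], c: [f64; N]) -> [f64; N] {
    let mut out = [0.0f64; N];
    for i in 0..N {
        out[i] = if i % 2 == 0 { fmsub(a[i], b[i], c[i]) } else { fmadd(a[i], b[i], c[i]) };
    }
    out
}

/// fmsubadd model: even-indexed lanes add c, odd-indexed lanes subtract c.
pub fn fmsubadd<const N: usize>(a: [f64; N], b: [f64; N], c: [f64; N]) -> [f64; N] {
    let mut out = [0.0f64; N];
    for i in 0..N {
        out[i] = if i % 2 == 0 { fmadd(a[i], b[i], c[i]) } else { fmsub(a[i], b[i], c[i]) };
    }
    out
}

/// `_sd` model: only lane 0 is computed; the upper lane is copied from a.
pub fn fmadd_sd(a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    [fmadd(a[0], b[0], c[0]), a[1]]
}

/// Hypothetical helper that runs the real `_mm_fmaddsub_pd` instruction.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "fma")]
unsafe fn fmaddsub_pd_hw(a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    use std::arch::x86_64::{_mm_fmaddsub_pd, _mm_loadu_pd, _mm_storeu_pd};
    let mut out = [0.0f64; 2];
    // SAFETY: each pointer refers to a live array of two f64 values, and
    // the unaligned load/store intrinsics have no alignment requirement.
    unsafe {
        let r = _mm_fmaddsub_pd(
            _mm_loadu_pd(a.as_ptr()),
            _mm_loadu_pd(b.as_ptr()),
            _mm_loadu_pd(c.as_ptr()),
        );
        _mm_storeu_pd(out.as_mut_ptr(), r);
    }
    out
}

/// Safe wrapper: hardware path when FMA is detected, model otherwise.
pub fn fmaddsub_pd(a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("fma") {
            // SAFETY: the required CPU feature was detected at runtime.
            return unsafe { fmaddsub_pd_hw(a, b, c) };
        }
    }
    fmaddsub(a, b, c)
}
```

For example, with a = [2.0, 3.0], b = [4.0, 5.0] and c = [1.0, 1.0], the fmaddsub model gives [7.0, 16.0] and the fmsubadd model gives [9.0, 14.0].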