Module core::core_arch::x86::avx512bf16

🔬This is a nightly-only experimental API. (stdsimd #48556)

Available on x86 or x86-64 only.

AVX512BF16 intrinsics.
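
BF16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of an f32, so a conversion amounts to rounding away the low 16 bits. The following is a minimal scalar sketch of that idea in stable Rust, not the intrinsics themselves. The helper names (`f32_to_bf16_bits`, `bf16_bits_to_f32`, `cvtne2ps_model`) are hypothetical and exist only for illustration. The sketch assumes round-to-nearest-even and ignores the hardware's treatment of denormal inputs. The two-source model uses 4-element inputs for brevity and places the elements of b in the low half of the result, with a in the high half.

```rust
/// Hypothetical scalar model of an f32 -> BF16 conversion with
/// round-to-nearest-even. Returns the raw 16-bit pattern.
/// Assumption: denormal handling of the real instruction is not modeled.
fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep the upper 16 bits and set the quiet bit so the result stays NaN.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Adding 0x7FFF plus the lowest kept bit rounds ties to even.
    let lsb = (bits >> 16) & 1;
    let rounded = bits.wrapping_add(0x7FFF + lsb);
    (rounded >> 16) as u16
}

/// Widen a BF16 bit pattern back to f32 by appending 16 zero bits.
fn bf16_bits_to_f32(h: u16) -> f32 {
    f32::from_bits((h as u32) << 16)
}

/// Hypothetical model of the two-source conversion on 4-element inputs:
/// the converted elements of `b` fill the low half of the result and
/// the converted elements of `a` fill the high half.
fn cvtne2ps_model(a: [f32; 4], b: [f32; 4]) -> [u16; 8] {
    let mut dst = [0u16; 8];
    for i in 0..4 {
        dst[i] = f32_to_bf16_bits(b[i]);
        dst[i + 4] = f32_to_bf16_bits(a[i]);
    }
    dst
}
```

In this model, `f32_to_bf16_bits(1.0)` yields the pattern `0x3F80`, and an input exactly halfway between two BF16 values rounds to the one whose lowest mantissa bit is zero.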

Functions

_mm256_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in two 256-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 256-bit wide vector. Intel’s documentation
_mm256_cvtneps_pbh ⚠ Experimental avx512bf16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst. Intel’s documentation
_mm256_dpbf16_ps ⚠ Experimental avx512bf16,avx512vl
Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst. Intel’s documentation
_mm256_mask_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
_mm256_mask_cvtneps_pbh ⚠ Experimental avx512bf16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
_mm256_mask_dpbf16_ps ⚠ Experimental avx512bf16,avx512vl
Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
_mm256_maskz_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
_mm256_maskz_cvtneps_pbh ⚠ Experimental avx512bf16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
_mm256_maskz_dpbf16_ps ⚠ Experimental avx512bf16,avx512vl
Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
_mm512_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512f
Convert packed single-precision (32-bit) floating-point elements in two 512-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 512-bit wide vector. Intel’s documentation
_mm512_cvtneps_pbh ⚠ Experimental avx512bf16,avx512f
Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst. Intel’s documentation
_mm512_dpbf16_ps ⚠ Experimental avx512bf16,avx512f
Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst. Intel’s documentation
_mm512_mask_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512f
Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
_mm512_mask_cvtneps_pbh ⚠ Experimental avx512bf16,avx512f
Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
_mm512_mask_dpbf16_ps ⚠ Experimental avx512bf16,avx512f
Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
_mm512_maskz_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512f
Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
_mm512_maskz_cvtneps_pbh ⚠ Experimental avx512bf16,avx512f
Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
_mm512_maskz_dpbf16_ps ⚠ Experimental avx512bf16,avx512f
Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
_mm_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in two 128-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 128-bit wide vector. Intel’s documentation
_mm_dpbf16_ps ⚠ Experimental avx512bf16,avx512vl
Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst. Intel’s documentation
_mm_mask_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
_mm_mask_dpbf16_ps ⚠ Experimental avx512bf16,avx512vl
Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
_mm_maskz_cvtne2ps_pbh ⚠ Experimental avx512bf16,avx512vl
Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
_mm_maskz_dpbf16_ps ⚠ Experimental avx512bf16,avx512vl
Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
cvtne2ps2bf16 🔒 ⚠ Experimental
cvtne2ps2bf16_256 🔒 ⚠ Experimental
cvtne2ps2bf16_512 🔒 ⚠ Experimental
cvtneps2bf16_256 🔒 ⚠ Experimental
cvtneps2bf16_512 🔒 ⚠ Experimental
dpbf16ps 🔒 ⚠ Experimental
dpbf16ps_256 🔒 ⚠ Experimental
dpbf16ps_512 🔒 ⚠ Experimental
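
The dot-product and mask descriptions in the function list can be illustrated with a scalar model. This is a sketch in stable Rust under stated assumptions, not the intrinsics' implementation: the names `dpbf16_model`, `mask_select`, and `maskz_select` are hypothetical, the vectors are modeled as plain arrays at the 128-bit width (8 BF16 inputs, 4 f32 lanes), and the hardware's exact rounding and denormal behaviour during accumulation is not reproduced. Each f32 lane j accumulates the products of BF16 elements 2j and 2j+1; the mask helpers then pick, per lane, either the computed value or a fallback (src for a writemask, zero for a zeromask).

```rust
/// Widen a BF16 bit pattern to f32 by appending 16 zero bits.
fn bf16_to_f32(h: u16) -> f32 {
    f32::from_bits((h as u32) << 16)
}

/// Hypothetical scalar model of a 128-bit BF16 dot product:
/// lane j of the result is src[j] plus the products of the BF16
/// pairs at positions 2j+1 and 2j in `a` and `b`.
fn dpbf16_model(src: [f32; 4], a: [u16; 8], b: [u16; 8]) -> [f32; 4] {
    let mut dst = src;
    for j in 0..4 {
        dst[j] += bf16_to_f32(a[2 * j + 1]) * bf16_to_f32(b[2 * j + 1]);
        dst[j] += bf16_to_f32(a[2 * j]) * bf16_to_f32(b[2 * j]);
    }
    dst
}

/// Writemask model: lane j takes `computed[j]` when bit j of `k` is set,
/// otherwise it is copied from `src`.
fn mask_select(k: u8, src: [f32; 4], computed: [f32; 4]) -> [f32; 4] {
    let mut dst = src;
    for j in 0..4 {
        if (k >> j) & 1 == 1 {
            dst[j] = computed[j];
        }
    }
    dst
}

/// Zeromask model: lane j takes `computed[j]` when bit j of `k` is set,
/// otherwise it is zeroed.
fn maskz_select(k: u8, computed: [f32; 4]) -> [f32; 4] {
    mask_select(k, [0.0; 4], computed)
}
```

For example, with src = [10, 20, 30, 40], a holding the BF16 encodings of 1.0 through 8.0, and every element of b equal to 2.0, the model produces [16, 34, 52, 70]. Applying the mask 0b0101 gives [16, 20, 52, 40] with writemask semantics and [16, 0, 52, 0] with zeromask semantics.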