Module core::core_arch::x86::avx512bf16

🔬This is a nightly-only experimental API. (stdsimd #48556)
Available on x86 or x86-64 only.

Functions

  • _mm256_cvtne2ps_pbh Experimental avx512bf16,avx512vl
    Convert packed single-precision (32-bit) floating-point elements in two 256-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 256-bit wide vector. Intel’s documentation
  • _mm256_cvtneps_pbh Experimental avx512bf16,avx512vl
    Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst. Intel’s documentation
  • _mm256_dpbf16_ps Experimental avx512bf16,avx512vl
    Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst. Intel’s documentation
  • _mm256_mask_cvtne2ps_pbh Experimental avx512bf16,avx512vl
    Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
  • _mm256_mask_cvtneps_pbh Experimental avx512bf16,avx512vl
    Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
  • _mm256_mask_dpbf16_ps Experimental avx512bf16,avx512vl
    Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
  • _mm256_maskz_cvtne2ps_pbh Experimental avx512bf16,avx512vl
    Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
  • _mm256_maskz_cvtneps_pbh Experimental avx512bf16,avx512vl
    Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
  • _mm256_maskz_dpbf16_ps Experimental avx512bf16,avx512vl
    Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
  • _mm512_cvtne2ps_pbh Experimental avx512bf16,avx512f
    Convert packed single-precision (32-bit) floating-point elements in two 512-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 512-bit wide vector. Intel’s documentation
  • _mm512_cvtneps_pbh Experimental avx512bf16,avx512f
    Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst. Intel’s documentation
  • _mm512_dpbf16_ps Experimental avx512bf16,avx512f
    Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst. Intel’s documentation
  • _mm512_mask_cvtne2ps_pbh Experimental avx512bf16,avx512f
    Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
  • _mm512_mask_cvtneps_pbh Experimental avx512bf16,avx512f
    Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
  • _mm512_mask_dpbf16_ps Experimental avx512bf16,avx512f
    Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
  • _mm512_maskz_cvtne2ps_pbh Experimental avx512bf16,avx512f
    Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
  • _mm512_maskz_cvtneps_pbh Experimental avx512bf16,avx512f
    Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
  • _mm512_maskz_dpbf16_ps Experimental avx512bf16,avx512f
    Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
  • _mm_cvtne2ps_pbh Experimental avx512bf16,avx512vl
    Convert packed single-precision (32-bit) floating-point elements in two 128-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 128-bit wide vector. Intel’s documentation
  • _mm_dpbf16_ps Experimental avx512bf16,avx512vl
    Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst. Intel’s documentation
  • _mm_mask_cvtne2ps_pbh Experimental avx512bf16,avx512vl
    Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
  • _mm_mask_dpbf16_ps Experimental avx512bf16,avx512vl
    Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
  • _mm_maskz_cvtne2ps_pbh Experimental avx512bf16,avx512vl
    Convert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
  • _mm_maskz_dpbf16_ps Experimental avx512bf16,avx512vl
    Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
  • cvtne2ps2bf16 🔒 Experimental
  • cvtne2ps2bf16_256 🔒 Experimental
  • cvtne2ps2bf16_512 🔒 Experimental
  • cvtneps2bf16_256 🔒 Experimental
  • cvtneps2bf16_512 🔒 Experimental
  • dpbf16ps 🔒 Experimental
  • dpbf16ps_256 🔒 Experimental
  • dpbf16ps_512 🔒 Experimental
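
Example

The descriptions above rely on three ideas: narrowing f32 to BF16, a per-lane dot product over BF16 pairs, and writemask versus zeromask selection. The following is a minimal scalar sketch of those semantics in stable Rust. It is not the implementation behind these intrinsics. All function names (f32_to_bf16, bf16_to_f32, dpbf16_lane, mask_select, maskz_select) are hypothetical helpers introduced for illustration. The rounding mode (round-to-nearest-even), the flushing of denormal inputs to zero, the quieting of NaNs, and the accumulation order are assumptions modeled on Intel's published pseudocode and may differ from hardware in edge cases.

```rust
/// Narrow an f32 to BF16 bits (the upper 16 bits of the f32 encoding),
/// rounding to nearest, ties to even.
/// Assumption: zero and denormal inputs become signed zero; NaNs are quieted.
fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exponent = (bits >> 23) & 0xFF;
    if exponent == 0 {
        // Zero or denormal input: keep only the sign.
        return sign;
    }
    if x.is_nan() {
        // Truncate and force the quiet bit so the result stays a NaN.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Round to nearest even: add 0x7FFF plus the lowest bit that survives.
    // Infinities pass through unchanged because their low 16 bits are zero.
    let lsb = (bits >> 16) & 1;
    ((bits + 0x7FFF + lsb) >> 16) as u16
}

/// Widen BF16 bits back to f32. This direction is exact.
fn bf16_to_f32(h: u16) -> f32 {
    f32::from_bits((h as u32) << 16)
}

/// One output lane of a BF16 dot product: the pair (a[0], a[1]) is multiplied
/// element-wise with (b[0], b[1]) and both products are added to `src`.
/// Assumption: the high pair is accumulated first; rounding uses plain f32 math.
fn dpbf16_lane(src: f32, a: [u16; 2], b: [u16; 2]) -> f32 {
    let hi = bf16_to_f32(a[1]) * bf16_to_f32(b[1]);
    let lo = bf16_to_f32(a[0]) * bf16_to_f32(b[0]);
    src + hi + lo
}

/// Writemask: lane i takes `computed[i]` when bit i of `k` is set,
/// otherwise it is copied from `src[i]`. Assumes N is less than 32.
fn mask_select<const N: usize>(k: u32, computed: [f32; N], src: [f32; N]) -> [f32; N] {
    std::array::from_fn(|i| if (k >> i) & 1 == 1 { computed[i] } else { src[i] })
}

/// Zeromask: lane i takes `computed[i]` when bit i of `k` is set,
/// otherwise it is zeroed. Assumes N is less than 32.
fn maskz_select<const N: usize>(k: u32, computed: [f32; N]) -> [f32; N] {
    std::array::from_fn(|i| if (k >> i) & 1 == 1 { computed[i] } else { 0.0 })
}
```

With this model, f32_to_bf16(1.0) yields 0x3F80, and dpbf16_lane(0.5, [0x3F80, 0x4000], [0x4040, 0x4080]) computes 0.5 + 2.0 * 4.0 + 1.0 * 3.0 = 11.5. A mask of 0b0101 keeps lanes 0 and 2 of the computed vector; the remaining lanes come from src under mask_select and are 0.0 under maskz_select.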