dotnet/runtime

Suboptimal codegen for `Vector128.NarrowWithSaturation`

Open

#116526, created on June 11, 2025

Labels: area-CodeGen-coreclr, help wanted

Description

I expect `PackSources0` and `PackSources1` to have exactly the same codegen in the following code snippet:

static Vector128<byte> PackSources1(Vector128<ushort> lower, Vector128<ushort> upper)
    => Vector128.NarrowWithSaturation(lower, upper);

static Vector128<byte> PackSources0(Vector128<ushort> lower, Vector128<ushort> upper)
    => Sse2.PackUnsignedSaturate(
        Vector128.Min(lower, Vector128.Create((ushort)255)).AsInt16(),
        Vector128.Min(upper, Vector128.Create((ushort)255)).AsInt16());
// coreclr trunk-20250611+5415b7342d44af9c974905760539f198fad13682

C:PackSources1(System.Runtime.Intrinsics.Vector128`1[ushort],System.Runtime.Intrinsics.Vector128`1[ushort]):System.Runtime.Intrinsics.Vector128`1[byte] (FullOpts):
       vbroadcastss xmm0, dword ptr [reloc @RWD00]
       vpminuw  xmm1, xmm0, xmmword ptr [rsp+0x08]
       vpand    xmm1, xmm1, xmm0
       vpminuw  xmm2, xmm0, xmmword ptr [rsp+0x18]
       vpand    xmm0, xmm2, xmm0
       vpackuswb xmm0, xmm1, xmm0
       vmovups  xmmword ptr [rdi], xmm0
       mov      rax, rdi
       ret      
RWD00  	dd	00FF00FFh		; 2.34184e-38

C:PackSources0(System.Runtime.Intrinsics.Vector128`1[ushort],System.Runtime.Intrinsics.Vector128`1[ushort]):System.Runtime.Intrinsics.Vector128`1[byte] (FullOpts):
       vbroadcastss xmm0, dword ptr [reloc @RWD00]
       vpminuw  xmm1, xmm0, xmmword ptr [rsp+0x08]
       vpminuw  xmm0, xmm0, xmmword ptr [rsp+0x18]
       vpackuswb xmm0, xmm1, xmm0
       vmovups  xmmword ptr [rdi], xmm0
       mov      rax, rdi
       ret      
RWD00  	dd	00FF00FFh		; 2.34184e-38

https://csharp.godbolt.org/z/o39G8GP9T
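
Comparing the two listings above, the only difference is the pair of `vpand` instructions in `PackSources1`, which mask each `vpminuw` result with the same `00FF00FFh` constant. The following is a minimal scalar sketch, not taken from the JIT or from the linked PR, that models a single 16-bit lane and checks exhaustively that such a mask cannot change a value already clamped to 255. The class and method names (`RedundantMaskCheck`, `MinWith255`, `MaskIsRedundant`) are hypothetical and exist only for illustration.

```csharp
using System;

// Hypothetical helper, for illustration only: models one 16-bit lane of the
// vpminuw + vpand sequence seen in the PackSources1 listing.
public static class RedundantMaskCheck
{
    // Scalar model of vpminuw against the 0x00FF constant: unsigned min with 255.
    private static ushort MinWith255(ushort x) => Math.Min(x, (ushort)255);

    // Returns true if, for every possible ushort input, masking the clamped
    // value with 0x00FF (the vpand step) leaves it unchanged.
    public static bool MaskIsRedundant()
    {
        for (int i = 0; i <= ushort.MaxValue; i++)
        {
            ushort clamped = MinWith255((ushort)i);
            ushort masked = (ushort)(clamped & 0x00FF);
            if (masked != clamped)
            {
                return false;
            }
        }
        return true;
    }
}
```

Calling `RedundantMaskCheck.MaskIsRedundant()` returns `true`, since every clamped value is at most 255 and therefore has no bits set above the low byte. This only models the per-lane arithmetic; it says nothing about where in the JIT the extra mask is introduced.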

Related: https://github.com/dotnet/runtime/pull/115525

cc: @tannergooding
