dotnet/runtime

Suboptimal codegen for `Vector128.NarrowWithSaturation`

Open

#116526 opened on Jun 11, 2025

Labels: area-CodeGen-coreclr, help wanted

Description

I would expect `PackSources0` and `PackSources1` to produce exactly the same codegen in the following snippet:

static Vector128<byte> PackSources1(Vector128<ushort> lower, Vector128<ushort> upper)
    => Vector128.NarrowWithSaturation(lower, upper);

static Vector128<byte> PackSources0(Vector128<ushort> lower, Vector128<ushort> upper)
    => Sse2.PackUnsignedSaturate(
        Vector128.Min(lower, Vector128.Create((ushort)255)).AsInt16(),
        Vector128.Min(upper, Vector128.Create((ushort)255)).AsInt16());

// coreclr trunk-20250611+5415b7342d44af9c974905760539f198fad13682

C:PackSources1(System.Runtime.Intrinsics.Vector128`1[ushort],System.Runtime.Intrinsics.Vector128`1[ushort]):System.Runtime.Intrinsics.Vector128`1[byte] (FullOpts):
       vbroadcastss xmm0, dword ptr [reloc @RWD00]
       vpminuw  xmm1, xmm0, xmmword ptr [rsp+0x08]
       vpand    xmm1, xmm1, xmm0
       vpminuw  xmm2, xmm0, xmmword ptr [rsp+0x18]
       vpand    xmm0, xmm2, xmm0
       vpackuswb xmm0, xmm1, xmm0
       vmovups  xmmword ptr [rdi], xmm0
       mov      rax, rdi
       ret      
RWD00  	dd	00FF00FFh		; 2.34184e-38

C:PackSources0(System.Runtime.Intrinsics.Vector128`1[ushort],System.Runtime.Intrinsics.Vector128`1[ushort]):System.Runtime.Intrinsics.Vector128`1[byte] (FullOpts):
       vbroadcastss xmm0, dword ptr [reloc @RWD00]
       vpminuw  xmm1, xmm0, xmmword ptr [rsp+0x08]
       vpminuw  xmm0, xmm0, xmmword ptr [rsp+0x18]
       vpackuswb xmm0, xmm1, xmm0
       vmovups  xmmword ptr [rdi], xmm0
       mov      rax, rdi
       ret      
RWD00  	dd	00FF00FFh		; 2.34184e-38
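
The only difference between the two listings is the pair of `vpand` instructions in `PackSources1`. They appear to be redundant: each one masks with the same `0x00FF` constant that the preceding `vpminuw` already clamped against, so the AND cannot change any lane. The following is a minimal sketch that checks this exhaustively for every `ushort` value. The `MaskCheck` class and `MaskIsRedundant` helper are hypothetical names used only for illustration and are not part of the runtime.

```csharp
using System.Runtime.Intrinsics;

public static class MaskCheck
{
    // Hypothetical helper: returns true if masking with 0x00FF after
    // clamping to 255 never changes the value, for every ushort input.
    public static bool MaskIsRedundant()
    {
        Vector128<ushort> max = Vector128.Create((ushort)255);

        for (int i = 0; i <= ushort.MaxValue; i++)
        {
            Vector128<ushort> value = Vector128.Create((ushort)i);

            // Corresponds to vpminuw: unsigned clamp of each lane to 255.
            Vector128<ushort> clamped = Vector128.Min(value, max);

            // Corresponds to vpand: mask each lane with 0x00FF.
            Vector128<ushort> masked = clamped & max;

            // operator != is true if any lane differs.
            if (masked != clamped)
            {
                return false;
            }
        }

        return true;
    }
}
```

If the helper returns `true`, dropping the mask after the clamp is safe for the unsigned 16-bit to 8-bit case shown here. Other element types would need their own check.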

https://csharp.godbolt.org/z/o39G8GP9T

Related: https://github.com/dotnet/runtime/pull/115525

cc: @tannergooding
