dotnet/runtime
Subtract-variant of `VectorXXX.MultiplyAddEstimate` does not light up for negated constant local
Open
#121301 opened on Nov 3, 2025
area-CodeGen-coreclr, help wanted, tenet-performance
Description
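using System.Runtime.Intrinsics;

public static class C
{
    // `half` is a constant local: the negation is folded into constant data.
    public static Vector128<double> BadCode(Vector128<double> a)
    {
        Vector128<double> half = Vector128.Create(0.5);
        return Vector128.MultiplyAddEstimate(a, Vector128.Create(2.0), -half) * half;
    }

    // `half` is a parameter: the negation is folded into a subtract-variant FMA.
    public static Vector128<double> GoodCode(Vector128<double> a, Vector128<double> half)
    {
        // Fma.MultiplyAdd lights up too!
        return Vector128.MultiplyAddEstimate(a, Vector128.Create(2.0), -half) * half;
    }
}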
Regression?
No
Data
// coreclr trunk-20251102+e8812e7419db9137f20b990786a53ed71e27e11e
C:BadCode(System.Runtime.Intrinsics.Vector128`1[double]):System.Runtime.Intrinsics.Vector128`1[double] (FullOpts):
       vmovddup xmm0, qword ptr [reloc @RWD00]
       vmovddup xmm1, qword ptr [reloc @RWD08]
       vmovaps  xmm2, xmmword ptr [rsp+0x08]
       vfmadd213pd xmm2, xmm1, xmmword ptr [reloc @RWD16]
       vmulpd   xmm0, xmm2, xmm0
       vmovups  xmmword ptr [rdi], xmm0
       mov      rax, rdi
       ret
RWD00  dq 3FE0000000000000h
RWD08  dq 4000000000000000h
RWD16  dq BFE0000000000000h, BFE0000000000000h

C:GoodCode(System.Runtime.Intrinsics.Vector128`1[double],System.Runtime.Intrinsics.Vector128`1[double]):System.Runtime.Intrinsics.Vector128`1[double] (FullOpts):
       vmovups  xmm0, xmmword ptr [rsp+0x18]
       vmovaps  xmm1, xmmword ptr [rsp+0x08]
       vfmsub132pd xmm1, xmm0, xmmword ptr [reloc @RWD00]
       vmulpd   xmm0, xmm1, xmm0
       vmovups  xmmword ptr [rdi], xmm0
       mov      rax, rdi
       ret
RWD00  dq 4000000000000000h, 4000000000000000h
Analysis
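Folding the constant local into constant data seems to take priority over recognizing the negated addend. In `BadCode`, `-half` is materialized as the constant -0.5 (`RWD16`, `BFE0000000000000h`) and fed to `vfmadd213pd`, and `half` is loaded a second time from `RWD00` for the final multiply. In `GoodCode`, where `half` is a parameter, the negation is folded into the instruction itself and `vfmsub132pd` is emitted.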
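For reference, here is a minimal sketch of the identity the subtract variant relies on: a multiply-add with a negated addend, `a * b + (-c)`, computes the same value as `a * b - c`. The class and method names (`FmaEquivalence`, `ViaNegatedAddend`, `ViaSubtract`, `BothFormsAgree`) are hypothetical and not part of the report. The inputs are chosen so that every intermediate result is exactly representable, so the comparison should hold whether or not the estimate is fused. The sketch checks values only and says nothing about which instruction the JIT selects.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper class, for illustration only.
public static class FmaEquivalence
{
    // Same shape as the report: multiply-add with a negated constant local.
    public static Vector128<double> ViaNegatedAddend(Vector128<double> a)
    {
        Vector128<double> half = Vector128.Create(0.5);
        return Vector128.MultiplyAddEstimate(a, Vector128.Create(2.0), -half);
    }

    // The same computation written as an explicit multiply followed by a subtract.
    public static Vector128<double> ViaSubtract(Vector128<double> a)
    {
        return a * Vector128.Create(2.0) - Vector128.Create(0.5);
    }

    // Compares both forms on inputs whose products and differences are exact.
    public static bool BothFormsAgree()
    {
        Vector128<double> a = Vector128.Create(1.5, -3.0);
        return ViaNegatedAddend(a) == ViaSubtract(a);
    }

    public static void Main()
    {
        Console.WriteLine(BothFormsAgree());
    }
}
```

To compare codegen rather than values, `ViaNegatedAddend` can be inspected with the same disassembly setup used for `BadCode` above.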