llvm/llvm-project
[X86] Missed Optimization: Vector 8-bit `rotl(x, 1)` should be lowered as `(x + x) - (x < 0)`
Open
#198059 opened on May 16, 2026
backend:X86, good first issue, missed-optimization
Description
x86 vector ISAs lack native 8-bit shift instructions, so most 8-bit shifts are implemented using a 16-bit shift + AND. For a vector 8-bit `rotl(x, 1)`, this currently produces:
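rotl1_src:
    movdqa xmm1, xmm0                          # copy x
    paddb xmm1, xmm0                           # xmm1 = x + x (per-byte x << 1)
    psrlw xmm0, 7                              # 16-bit logical shift right by 7
    pand xmm0, xmmword ptr [rip + .LCPI2_0]    # keep only bit 0 of each byte (the old sign bit)
    por xmm0, xmm1                             # merge the wrapped-around bit into x + x
    ret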
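The right shift, AND, and OR can be replaced with a subtraction of a less-than-zero mask. The mask is all-ones (-1) in every lane whose sign bit is set, so subtracting it adds 1, and because the low bit of `x + x` is always clear, that add is disjoint and therefore equivalent to the OR. This shortens the dependency chain, drops the constant-pool load, and avoids the shift, which has worse throughput on some architectures.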
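rotl1_tgt:
    pxor xmm1, xmm1       # xmm1 = 0
    pcmpgtb xmm1, xmm0    # xmm1 = (0 > x) ? 0xFF : 0x00 per byte, i.e. the x < 0 mask
    paddb xmm0, xmm0      # xmm0 = x + x
    psubb xmm0, xmm1      # subtracting the mask (-1) adds 1 where x < 0
    ret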
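
As a sanity check on the identity, here is a minimal scalar model in Python of both instruction sequences. It is only a sketch, not part of any LLVM patch: the function names are hypothetical, and the `.LCPI2_0` constant is assumed to be `0x01` in every byte, which is what the surrounding shift-and-mask sequence requires.

```python
def rotl8(x):
    """Reference: rotate an 8-bit value left by one bit."""
    return ((x << 1) | (x >> 7)) & 0xFF


def rotl8_current(lo, hi):
    """Model the current lowering on one 16-bit lane (two adjacent bytes)."""
    doubled_lo = (lo + lo) & 0xFF          # paddb, low byte
    doubled_hi = (hi + hi) & 0xFF          # paddb, high byte
    word = (hi << 8) | lo
    shifted = word >> 7                    # psrlw 7 on the whole 16-bit lane
    masked = shifted & 0x0101              # pand, assuming 0x01 in each byte
    return (doubled_lo | (masked & 0xFF),  # por, low byte
            doubled_hi | (masked >> 8))    # por, high byte


def rotl8_proposed(x):
    """Model the proposed lowering on one byte lane."""
    mask = 0xFF if x >= 0x80 else 0x00     # pcmpgtb 0, x: all-ones where signed x < 0
    doubled = (x + x) & 0xFF               # paddb
    return (doubled - mask) & 0xFF         # psubb: subtracting 0xFF (-1) adds 1


# Exhaustive check of the proposed sequence over every byte value.
for x in range(256):
    assert rotl8_proposed(x) == rotl8(x)
```

The model of the current lowering takes two bytes at a time because `psrlw` shifts bits across the byte boundary inside each 16-bit lane, which is why the AND is needed there and not in the proposed sequence.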