llvm/llvm-project
[X86] Missed Optimization: Vector 8-bit `rotr(x, 1)` should be lowered as `pavgb(x, -(x & 1))`
Closed
#198060 opened on May 16, 2026
backend:X86, good first issue, missed-optimization
Description
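Because x86 SIMD has no native 8-bit vector shift instructions, most 8-bit shifts are implemented using a 16-bit shift + AND, so an 8-bit rotate right by 1 currently needs two shifts, two masks, and an OR: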
rotr1_src:
movdqa xmm1, xmm0
psrlw xmm1, 1
pand xmm1, xmmword ptr [rip + .LCPI1_0]
psllw xmm0, 7
pand xmm0, xmmword ptr [rip + .LCPI1_1]
por xmm0, xmm1
ret
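The right shift and the propagation of the least significant bit into the MSB can both be done with a single pavgb instruction, which computes the ceiling average `(a + b + 1) >> 1` of unsigned bytes. With the second operand set to the mask `-(x & 1)` (0xFF when the LSB of x is set, 0x00 otherwise), pavgb shifts x right by 1 and sets the MSB only when the mask is all ones. When the mask is zero, x has a zero LSB, so the rounding increment does not change the result: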
rotr1_tgt:
movdqa xmm1, xmmword ptr [rip + .LCPI1_0]
pand xmm1, xmm0
pxor xmm2, xmm2
psubb xmm2, xmm1
pavgb xmm0, xmm2
ret
https://godbolt.org/z/scsce9YTE
This uses fewer operations and avoids the shifts, which have worse throughput than pavgb on some architectures.
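
For anyone wanting to sanity-check the identity without an x86 machine, here is a minimal scalar sketch in C. It models a single byte lane of pavgb as `(a + b + 1) >> 1` and compares `pavgb(x, -(x & 1))` against a plain 8-bit rotate for all 256 inputs. The helper names (`pavgb_lane`, `rotr1_ref`, `rotr1_pavgb`, `count_mismatches`, `check_rotr1_identity`) are made up for illustration and are not LLVM code or intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one PAVGB byte lane: unsigned rounding average.
   The sum is formed in a wider type so the carry out of bit 7 is kept. */
static uint8_t pavgb_lane(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Reference: rotate an 8-bit value right by one bit. */
static uint8_t rotr1_ref(uint8_t x) {
    return (uint8_t)((x >> 1) | (x << 7));
}

/* Proposed form from this issue: pavgb(x, -(x & 1)). */
static uint8_t rotr1_pavgb(uint8_t x) {
    /* 0x00 when the LSB is clear, 0xFF when it is set. */
    uint8_t mask = (uint8_t)(0u - (x & 1u));
    return pavgb_lane(x, mask);
}

/* Counts the byte values on which the two forms disagree. */
static int count_mismatches(void) {
    int bad = 0;
    for (unsigned v = 0; v < 256; ++v) {
        if (rotr1_ref((uint8_t)v) != rotr1_pavgb((uint8_t)v)) {
            ++bad;
        }
    }
    return bad;
}

/* Hypothetical convenience wrapper: aborts if any input disagrees. */
static void check_rotr1_identity(void) {
    assert(count_mismatches() == 0);
}
```

Calling `check_rotr1_identity()` covers every possible byte, which is enough for the vector case because pavgb operates on each byte lane independently. This only checks the arithmetic identity; it says nothing about instruction selection or throughput.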