llvm/llvm-project

[X86] Missed Optimization: Vector 8-bit `rotr(x, 1)` should be lowered as `pavgb(x, -(x & 1))`

Closed

#198060 opened on May 16, 2026

Labels: backend:X86, good first issue, missed-optimization

Description

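x86 has no native 8-bit vector shift instructions, so most 8-bit shifts and rotates are implemented using a 16-bit shift + AND. For example, `rotr(x, 1)` on a vector of bytes is currently lowered as: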

rotr1_src:
        movdqa  xmm1, xmm0
        psrlw   xmm1, 1
        pand    xmm1, xmmword ptr [rip + .LCPI1_0]
        psllw   xmm0, 7
        pand    xmm0, xmmword ptr [rip + .LCPI1_1]
        por     xmm0, xmm1
        ret
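
To illustrate why the AND is needed after each 16-bit shift, here is a minimal scalar sketch of one 16-bit lane holding two packed bytes. The function names are hypothetical, and the 0x7F7F mask is an assumption about what the constant pool entry .LCPI1_0 holds (a splat of 0x7F); the issue does not show the constants.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 16-bit lane holding two packed bytes.
// A 16-bit logical right shift (psrlw) lets bit 0 of the high byte leak into
// bit 7 of the low byte; the AND clears that leaked bit.
// Assumption: the mask is a splat of 0x7F, i.e. 0x7F7F per 16-bit lane.
inline uint16_t lshr8_by1_via_16bit(uint16_t lane) {
    uint16_t shifted = static_cast<uint16_t>(lane >> 1);  // psrlw lane, 1
    return static_cast<uint16_t>(shifted & 0x7F7F);       // pand lane, splat(0x7F)
}

// Reference: shift each byte independently.
inline uint16_t lshr8_by1_reference(uint16_t lane) {
    uint8_t lo = static_cast<uint8_t>(lane & 0xFF);
    uint8_t hi = static_cast<uint8_t>(lane >> 8);
    return static_cast<uint16_t>((lo >> 1) | ((hi >> 1) << 8));
}

// Exhaustively compare both forms over every possible 16-bit lane value.
inline void check_lshr8_all_lanes() {
    for (uint32_t v = 0; v <= 0xFFFF; ++v) {
        uint16_t lane = static_cast<uint16_t>(v);
        assert(lshr8_by1_via_16bit(lane) == lshr8_by1_reference(lane));
    }
}
```

The left-shift half of the rotate (psllw by 7 followed by pand) needs a mask for the symmetric reason: bits of the low byte would otherwise spill into the high byte.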

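The right shift and the propagation of the least significant bit into the MSB can both be done with the pavgb instruction, which computes the rounded-up (ceiling) average `(a + b + 1) >> 1` of unsigned bytes. With the second operand set to the mask `-(x & 1)`, which is 0x00 when the LSB of x is clear and 0xFF when it is set, pavgb shifts x right by 1 and sets the MSB only when the LSB was set. In the 0x00 case x has a zero LSB, so the +1 rounding has no effect; in the 0xFF case the sum is x + 256, which becomes the MSB after the shift: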

rotr1_tgt:
        movdqa  xmm1, xmmword ptr [rip + .LCPI1_0]
        pand    xmm1, xmm0
        pxor    xmm2, xmm2
        psubb   xmm2, xmm1
        pavgb   xmm0, xmm2
        ret

https://godbolt.org/z/scsce9YTE

This uses fewer operations (5 instructions instead of 6, not counting the ret) and avoids the vector shifts, which have worse throughput than pavgb on some architectures.
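
The equivalence can be checked exhaustively with a scalar model, since each byte lane is independent and there are only 256 inputs. The sketch below is illustrative only: the function names are hypothetical, `pavgb_lane` models a single byte lane of pavgb, and the mask computation assumes .LCPI1_0 in rotr1_tgt is a splat of 1.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one pavgb lane: unsigned average rounded up,
// (a + b + 1) >> 1, computed in a wider type so the 9-bit sum is kept.
inline uint8_t pavgb_lane(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>((static_cast<unsigned>(a) + b + 1u) >> 1);
}

// Reference 8-bit rotate right by 1.
inline uint8_t rotr1_reference(uint8_t x) {
    return static_cast<uint8_t>((x >> 1) | (x << 7));
}

// Proposed lowering: mask = 0 - (x & 1) is 0x00 or 0xFF
// (pand with splat(1), then pxor + psubb), followed by pavgb(x, mask).
inline uint8_t rotr1_via_pavgb(uint8_t x) {
    uint8_t mask = static_cast<uint8_t>(0u - (x & 1u));
    return pavgb_lane(x, mask);
}

// Exhaustively compare both forms over all 256 byte values.
inline void check_rotr1_all_bytes() {
    for (unsigned v = 0; v < 256; ++v) {
        uint8_t x = static_cast<uint8_t>(v);
        assert(rotr1_via_pavgb(x) == rotr1_reference(x));
    }
}
```

For example, x = 0x81 gives mask 0xFF, and (0x81 + 0xFF + 1) >> 1 = 0xC0, which matches rotating 0x81 right by one bit.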
