llvm/llvm-project

[X86] Missed Optimization: Vector 8-bit `rotr(x, 1)` should be lowered as `pavgb(x, -(x & 1))`

Closed

#198,060 created on May 16, 2026

backend:X86, good first issue, missed-optimization

Description

Because x86 has no vector shift instructions that operate on 8-bit elements, most 8-bit shifts are implemented using a 16-bit shift + AND:

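rotr1_src:
        movdqa  xmm1, xmm0                          # copy x
        psrlw   xmm1, 1                             # 16-bit shift right by 1
        pand    xmm1, xmmword ptr [rip + .LCPI1_0]  # clear the bit that crossed in from the neighboring byte
        psllw   xmm0, 7                             # 16-bit shift left by 7
        pand    xmm0, xmmword ptr [rip + .LCPI1_1]  # keep only the MSB of each byte
        por     xmm0, xmm1                          # combine both halves of the rotate
        ret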
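The 16-bit shift + AND emulation above can be modeled in scalar C++ on a single 16-bit lane holding two bytes. This is only a sketch: the function names are hypothetical, and the masks 0x7F7F and 0x8080 are assumptions about what the constant-pool entries .LCPI1_0 and .LCPI1_1 contain, inferred from what the sequence has to compute.

```cpp
#include <cstdint>

// Per-byte logical shift right by 1 on two bytes packed in a 16-bit lane.
// The 16-bit shift moves the high byte's LSB into bit 7 of the low byte,
// so the AND clears bit 7 of every byte (assumed mask: 0x7F per byte).
constexpr std::uint16_t srl1_bytes(std::uint16_t lane) {
    return static_cast<std::uint16_t>((lane >> 1) & 0x7F7F);
}

// Per-byte shift left by 7: only each byte's original LSB survives, in the
// MSB position (assumed mask: 0x80 per byte).
constexpr std::uint16_t shl7_bytes(std::uint16_t lane) {
    return static_cast<std::uint16_t>((lane << 7) & 0x8080);
}

// Per-byte rotate right by 1, mirroring the psrlw/pand/psllw/pand/por sequence.
constexpr std::uint16_t rotr1_bytes(std::uint16_t lane) {
    return static_cast<std::uint16_t>(srl1_bytes(lane) | shl7_bytes(lane));
}

// High byte 0x01 rotates to 0x80, low byte 0x80 rotates to 0x40.
static_assert(rotr1_bytes(0x0180) == 0x8040, "per-byte rotr by 1");
```

The two masks are what make a 16-bit shift behave like two independent 8-bit shifts, which is why the emulated rotate needs two shifts, two ANDs, and an OR.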

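The right shift and the propagation of the least significant bit can both be done with the pavgb instruction, which computes a rounding-up (ceiling) average, (a + b + 1) >> 1, on each unsigned byte. Averaging x with the mask -(x & 1) shifts x right by 1 and sets the MSB only when the LSB of x was set: a 0xFF mask contributes the MSB, while a zero mask yields plain x >> 1, because x then has a zero LSB and the rounding +1 is shifted out: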

rotr1_tgt:
        movdqa  xmm1, xmmword ptr [rip + .LCPI1_0]  # load the LSB mask constant
        pand    xmm1, xmm0                          # x & 1
        pxor    xmm2, xmm2                          # zero
        psubb   xmm2, xmm1                          # -(x & 1): 0x00 or 0xFF per byte
        pavgb   xmm0, xmm2                          # pavgb(x, -(x & 1))
        ret
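The identity behind rotr1_tgt can be checked exhaustively with a scalar model of one byte lane. The sketch below is not taken from the issue: the function names are hypothetical, and pavgb_lane is a plain C++ model of the instruction's per-byte semantics rather than an intrinsic.

```cpp
#include <cstdint>

// Scalar model of one PAVGB lane: unsigned rounding-up average,
// (a + b + 1) >> 1, computed in a wider type so the sum cannot overflow.
constexpr std::uint8_t pavgb_lane(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>((static_cast<unsigned>(a) + b + 1u) >> 1);
}

// Reference: rotate an 8-bit value right by one bit.
constexpr std::uint8_t rotr1_reference(std::uint8_t x) {
    return static_cast<std::uint8_t>((x >> 1) | (x << 7));
}

// Proposed form: pavgb(x, -(x & 1)).
// The mask is 0x00 when the LSB of x is clear and 0xFF when it is set.
constexpr std::uint8_t rotr1_via_pavgb(std::uint8_t x) {
    const std::uint8_t mask = static_cast<std::uint8_t>(0u - (x & 1u));
    return pavgb_lane(x, mask);
}

// Compare both forms for all 256 byte values (needs C++14 or later).
constexpr bool rotr1_identity_holds() {
    for (unsigned v = 0; v < 256; ++v) {
        const auto x = static_cast<std::uint8_t>(v);
        if (rotr1_via_pavgb(x) != rotr1_reference(x)) return false;
    }
    return true;
}

static_assert(rotr1_identity_holds(), "pavgb(x, -(x & 1)) must equal rotr(x, 1)");
```

For an odd x the sum is x + 255 + 1 = x + 256, so the result is (x >> 1) + 128, which is x >> 1 with the MSB set. For an even x the sum is x + 1, and the rounding bit is shifted out, leaving x >> 1.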

https://godbolt.org/z/scsce9YTE

This uses fewer operations and avoids the shifts, which have worse throughput than pavgb on some architectures.
