[X86] Missed Optimization: Vector 8-bit `rotr(x, 1)` should be lowered as `pavgb(x, -(x & 1))` · llvm/llvm-project#198060


Labels: backend:X86, good first issue, missed-optimization

Description

Because x86 has no native 8-bit vector shift instructions, most 8-bit shifts (and the rotates built from them) are implemented using 16-bit shifts plus AND masks:

rotr1_src:
        movdqa  xmm1, xmm0
        psrlw   xmm1, 1
        pand    xmm1, xmmword ptr [rip + .LCPI1_0]
        psllw   xmm0, 7
        pand    xmm0, xmmword ptr [rip + .LCPI1_1]
        por     xmm0, xmm1
        ret
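
To make the role of the two AND masks concrete, here is a minimal scalar sketch of one 16-bit lane of the sequence above. It assumes that .LCPI1_0 is a splat of 0x7F bytes and .LCPI1_1 a splat of 0x80 bytes; the constants are not shown in the listing, so treat those values as an assumption. The function names are made up for illustration.

```cpp
#include <cstdint>

// Per-byte reference: rotate an 8-bit value right by 1.
constexpr std::uint8_t rotr1_byte(std::uint8_t x) {
    return static_cast<std::uint8_t>((x >> 1) | (x << 7));
}

// One 16-bit lane holding two packed bytes, following the current lowering.
// ASSUMPTION: the masks 0x7F7F and 0x8080 stand in for .LCPI1_0 and .LCPI1_1.
constexpr std::uint16_t rotr1_word_lane(std::uint16_t w) {
    // psrlw + pand: shift both bytes right, clear the bit leaked from the high byte.
    const std::uint16_t lo = static_cast<std::uint16_t>((w >> 1) & 0x7F7F);
    // psllw + pand: move each byte's LSB to its MSB, drop everything else.
    const std::uint16_t hi = static_cast<std::uint16_t>((w << 7) & 0x8080);
    // por: combine the two halves of the rotate.
    return static_cast<std::uint16_t>(lo | hi);
}

// Exhaustively compare the lane model against the per-byte reference.
inline bool word_lowering_matches_bytes() {
    for (std::uint32_t w = 0; w <= 0xFFFF; ++w) {
        const std::uint8_t b0 = static_cast<std::uint8_t>(w & 0xFF);
        const std::uint8_t b1 = static_cast<std::uint8_t>(w >> 8);
        const std::uint16_t expected =
            static_cast<std::uint16_t>((rotr1_byte(b1) << 8) | rotr1_byte(b0));
        if (rotr1_word_lane(static_cast<std::uint16_t>(w)) != expected) return false;
    }
    return true;
}
```

Without the masks, bits would cross the byte boundary inside each 16-bit lane, which is why every 16-bit shift needs its own pand.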

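Both the right shift and the wrap-around of the least significant bit can be done with the pavgb instruction, which computes a rounding-up (ceiling) average, (a + b + 1) >> 1, per byte. With the second operand set to the mask -(x & 1), it shifts x right by 1 and sets the MSB only when the LSB of x was set: if the LSB is 1, the mask is 0xFF and the result is (x + 256) >> 1 = (x >> 1) + 0x80; if the LSB is 0, the mask is 0 and, because x is even, the rounding +1 has no effect: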

rotr1_tgt:
        movdqa  xmm1, xmmword ptr [rip + .LCPI1_0]
        pand    xmm1, xmm0
        pxor    xmm2, xmm2
        psubb   xmm2, xmm1
        pavgb   xmm0, xmm2
        ret
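
The identity pavgb(x, -(x & 1)) == rotr(x, 1) can be checked exhaustively with a scalar model of a single byte lane. This is only a sketch: pavgb_lane models the instruction's documented unsigned rounding average, the other function names are hypothetical, and the mask step assumes .LCPI1_0 in the listing above is a splat of 0x01 bytes.

```cpp
#include <cstdint>

// Reference: rotate an 8-bit value right by 1.
constexpr std::uint8_t rotr1_ref(std::uint8_t x) {
    return static_cast<std::uint8_t>((x >> 1) | (x << 7));
}

// Scalar model of one pavgb lane: (a + b + 1) >> 1 with a 9-bit intermediate.
constexpr std::uint8_t pavgb_lane(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>((a + b + 1) >> 1);
}

// Proposed lowering for one lane.
constexpr std::uint8_t rotr1_pavgb(std::uint8_t x) {
    // pand + pxor + psubb: 0x00 when the LSB is clear, 0xFF when it is set.
    // ASSUMPTION: the AND constant is 1 in every byte.
    const std::uint8_t mask = static_cast<std::uint8_t>(0 - (x & 1));
    return pavgb_lane(x, mask);
}

// Compare both forms over all 256 byte values.
inline bool rotr1_pavgb_matches() {
    for (int x = 0; x < 256; ++x) {
        const std::uint8_t b = static_cast<std::uint8_t>(x);
        if (rotr1_pavgb(b) != rotr1_ref(b)) return false;
    }
    return true;
}
```

Since every byte lane is independent, agreement on all 256 inputs covers the vector case as well.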

https://godbolt.org/z/scsce9YTE

This uses fewer operations and avoids the vector shifts, which have worse throughput than pavgb on some architectures.
