llvm/llvm-project

[X86] Missed Optimization: Vector 8-bit `rotl(x, 1)` should be lowered as `(x + x) - (x < 0)`

Open

#198059 opened on May 16, 2026

Labels: backend:X86, good first issue, missed-optimization

Description

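x86 has no native 8-bit vector shift instructions, so most 8-bit shifts are lowered as a 16-bit shift followed by an AND that masks off the bits shifted in from the neighboring byte. For `rotl(x, 1)`, this currently produces: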

rotl1_src:
        movdqa  xmm1, xmm0
        paddb   xmm1, xmm0
        psrlw   xmm0, 7
        pand    xmm0, xmmword ptr [rip + .LCPI2_0]
        por     xmm0, xmm1
        ret

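The right shift, AND, and OR can be replaced with a subtraction of a less-than-zero mask. The mask is -1 (all ones) in every byte whose sign bit is set, so subtracting it acts like a conditional add of 1, and that add is disjoint because the low bit of `x + x` is always 0. This shortens the dependency chain and avoids the shift, which has worse throughput on some architectures.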

rotl1_tgt:
        pxor    xmm1, xmm1
        pcmpgtb xmm1, xmm0
        paddb   xmm0, xmm0
        psubb   xmm0, xmm1
        ret

https://godbolt.org/z/199KoWhs8
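
The identity behind the proposed lowering can be checked per byte lane with a scalar model. The following is a minimal sketch, not LLVM code: the function names are hypothetical, and each step is only annotated with the SSE2 instruction it is meant to mirror.

```cpp
#include <cstdint>

// Reference: rotate an 8-bit value left by one bit.
constexpr std::uint8_t rotl8_by_1(std::uint8_t x) {
    return static_cast<std::uint8_t>((x << 1) | (x >> 7));
}

// Proposed form: (x + x) - (x < 0), modeled on a single byte lane.
// The comparison yields an all-ones byte (0xFF, i.e. -1) when the sign bit is set.
constexpr std::uint8_t rotl8_by_1_addsub(std::uint8_t x) {
    const std::uint8_t doubled = static_cast<std::uint8_t>(x + x);  // mirrors paddb
    const std::uint8_t mask = (x & 0x80) ? 0xFF : 0x00;             // mirrors pcmpgtb against zero
    return static_cast<std::uint8_t>(doubled - mask);               // mirrors psubb: subtracting -1 adds 1
}

// Exhaustively compare both forms over every possible byte value.
constexpr bool forms_agree_for_all_bytes() {
    for (int v = 0; v < 256; ++v) {
        const std::uint8_t x = static_cast<std::uint8_t>(v);
        if (rotl8_by_1(x) != rotl8_by_1_addsub(x)) return false;
    }
    return true;
}

// Compile-time check (requires C++14 or later for the constexpr loop).
static_assert(forms_agree_for_all_bytes(),
              "rotl(x, 1) must equal (x + x) - (x < 0) for every byte");
```

Because each vector lane is independent, agreement on all 256 byte values is enough to show that the two sequences compute the same result; it says nothing about their relative performance.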
