llvm/llvm-project

[CIR] Upstream handling of AArch64 (Arm) Neon builtins

Open

#185382 opened on Mar 9, 2026

Labels: ClangIR, good first issue

Description

Overview

This is an umbrella issue for upstreaming all AArch64 builtins to ClangIR.

There are enough AArch64-specific builtins that creating separate issues for each logical group is not practical. Multiple contributors can work on this in parallel, provided we coordinate to minimize overlap. To help with that, a partial list of builtins that need to be upstreamed is included below.

If you would like to contribute, please comment indicating which builtin(s) you plan to work on.

Builtins will be removed from this list as they are upstreamed.

Before beginning work, please check the table below to ensure no one else is already working on the same intrinsics. If you have any questions, please comment here and tag @banach-space.

Needed builtins

| Intrinsic Group | Common prefix | Assignee | PR(s) | Status |
|---|---|---|---|---|
| Set vector lane | vset_lane_*, vsetq_lane_* | @abhijeetsharma200 | #186623 | (*) vset_lane_i8 ... |
| Extract one element from vector | vget_lane_*, vgetq_lane_* | @Ayush3941 | #186119 | (*) vget_lane_i8 ... |
| Vector saturating shift left | vqshl_* | @Ko496-glitch | https://github.com/llvm/llvm-project/pull/190728, https://github.com/llvm/llvm-project/pull/199153 | WIP |
| Vector arithmetic (FP16) | v*_f16 | @rafe-murray | https://github.com/llvm/llvm-project/pull/190310, https://github.com/llvm/llvm-project/pull/194865 | WIP |
| Fused multiply-accumulate | vfma_* | @yairbenavraham | #188190, https://github.com/llvm/llvm-project/pull/195602, https://github.com/llvm/llvm-project/pull/197084 | (*) *vfmaq_laneq_v, |
| Pairwise maximum | vpmax_*, vpmaxq_* | @Isocyy | | (*) vpmax_s8 ... |
| Vector saturating shift right and narrow | vqshrn_*, vqshrn_high_*, vqshrun_* | @IAmCheese1231 | https://github.com/llvm/llvm-project/pull/195085, https://github.com/llvm/llvm-project/pull/195080, https://github.com/llvm/llvm-project/pull/195040 | (*) vqshrun_n_s16 ... |
| Vector saturating rounding shift right and narrow | vqrshrn_*, vqrshrn_high_*, vqrshrun_* | @Ko496-glitch | https://github.com/llvm/llvm-project/pull/198216 | (*) vqrshrun_n_s16 ... |
| Rounding | vrnd*_*, vrndq*_*, ... | @AbdallahRashed | | (*) vrndaq_f32 ... |
| Conversions | vcvt_*, vcvtq_* | @banach-space | https://github.com/llvm/llvm-project/pull/190961, https://github.com/llvm/llvm-project/pull/193273 | WIP |
| Vector shift right and insert | vsri_*, vsriq_* | @iamvickynguyen | #196776 | (*) vsri_n_s8 ... |
| Vector shift left and insert | vsli_*, vsliq_* | @E00N777 | https://github.com/llvm/llvm-project/pull/199415, https://github.com/llvm/llvm-project/pull/198309 | (*) vsli_n_s8 ... |
| Maximum across vector (IEEE754) | vmaxnmv_f32 | @Ko496-glitch | | |

(*) Special cases that are implemented in CIRGenFunction::emitAArch64BuiltinExpr in the ClangIR incubator repository - these are usually the easiest starting point.
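
For contributors new to Neon, it may help to see what a few of the groups in the table compute. The following is a minimal scalar sketch in plain C, not ClangIR code and not `arm_neon.h`. The type `model_int8x8` and the `model_*` functions are hypothetical stand-ins, introduced here only to illustrate the lane-wise behaviour of `vget_lane_s8`, `vset_lane_s8`, and `vpmax_s8`. The ACLE reference remains the authority on the exact semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the ACLE type int8x8_t: 8 lanes of int8_t. */
typedef struct {
  int8_t lane[8];
} model_int8x8;

/* Models vget_lane_s8: read a single lane.
 * The real intrinsic requires `lane` to be a constant in [0, 7];
 * this model only checks the range at run time. */
static int8_t model_vget_lane_s8(model_int8x8 v, int lane) {
  assert(lane >= 0 && lane < 8);
  return v.lane[lane];
}

/* Models vset_lane_s8: return a copy of `v` with one lane replaced by `x`. */
static model_int8x8 model_vset_lane_s8(int8_t x, model_int8x8 v, int lane) {
  assert(lane >= 0 && lane < 8);
  v.lane[lane] = x;
  return v;
}

static int8_t model_max_s8(int8_t x, int8_t y) { return x > y ? x : y; }

/* Models vpmax_s8: the low half of the result holds the maxima of adjacent
 * pairs of `a`, the high half holds the maxima of adjacent pairs of `b`. */
static model_int8x8 model_vpmax_s8(model_int8x8 a, model_int8x8 b) {
  model_int8x8 r;
  for (int i = 0; i < 4; ++i) {
    r.lane[i] = model_max_s8(a.lane[2 * i], a.lane[2 * i + 1]);
    r.lane[4 + i] = model_max_s8(b.lane[2 * i], b.lane[2 * i + 1]);
  }
  return r;
}
```

A model like this can be handy for sanity-checking expectations before reading the generated LLVM IR, but it is not a substitute for the existing tests.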

Completed builtins

| Intrinsic Group | Common prefix | Assignee | PR(s) | Status |
|---|---|---|---|---|
| Split vectors (BFloat16) | vget{q}_lane_bf16 | @fileho | #186866 | |
| Set all lanes to the same value (BFloat16) | vduph_lane{q}_bf16 | @E00N777 | #185852, #187460 | |
| Set all lanes to the same value (FP16) | vdup*_f16 | @neonetizen | #186955 | |
| Widening multiplication + Polynomial Multiply | vmull_*, vmull_high_* | @pau-sum | #188371 | |
| Maximum | vmax_*, vmaxq_* | @Xinlong-Chen | #188503 | |
| Bitwise select | vbsl_* | @E00N777 | #188449 | |
| Vector shift right | vshr_* | @alowqie | https://github.com/llvm/llvm-project/pull/186693 | |
| Minimum | vmin_*, vminq_* | @YGGkk | #187935 | |
| Square root | vsqrt_*, vsqrtq_* | @Kouunnn | https://github.com/llvm/llvm-project/pull/192282 | |
| Pairwise minimum | vpmin_*, vpminq_* | @iamvickynguyen | https://github.com/llvm/llvm-project/pull/191759 | |
| Minimum across vector | vminv_*, vminvq_* | @E00N777 | https://github.com/llvm/llvm-project/pull/192901 | |
| Vector rounding shift right and accumulate | vrsra_* | @E00N777 | https://github.com/llvm/llvm-project/pull/191129 | |
| Pairwise addition and widen | vpaddl_*, vpaddlq_* | @xiongzile | https://github.com/llvm/llvm-project/pull/191845 | |
| Vector rounding shift right | vrshr_* | @ArfiH | #185992, https://github.com/llvm/llvm-project/pull/194229 | |
| Absolute difference | vabd_*, vabdq_* | @banach-space | #183595 | |
| Addition across vector + Addition across vector widening | vaddv_*, vaddvq_* | @iamvickynguyen | https://github.com/llvm/llvm-project/pull/193396 | |
| Zip elements | vzip_*, vzipq_* | @E00N777 | #193658, https://github.com/llvm/llvm-project/pull/194311 | |
| Unzip elements | vuzp_*, vuzpq_* | @E00N777 | https://github.com/llvm/llvm-project/pull/195591/, https://github.com/llvm/llvm-project/pull/195527 | |
| Transpose elements | vtrn_*, vtrnq_* | @E00N777 | https://github.com/llvm/llvm-project/pull/197112, https://github.com/llvm/llvm-project/pull/197651 | |
| Vector shift left | vshl_* | @albertbolt1 | https://github.com/llvm/llvm-project/pull/186406, https://github.com/llvm/llvm-project/pull/187516, https://github.com/llvm/llvm-project/pull/191655 | |
| Maximum across vector | vmaxv_*, vmaxvq_* | @Ko496-glitch | https://github.com/llvm/llvm-project/pull/194401, https://github.com/llvm/llvm-project/pull/197095 | |

Implementation requirements

  1. For each intrinsic group listed in the Arm Neon Intrinsics Reference, ensure that all variants are supported and tested (*). If some variants are missing in the ClangIR incubator repository, please implement them.

  2. Reuse the existing AArch64 builtin tests located in clang/test/CodeGen/AArch64. These tests will need to be moved to the neon subdirectory, which enables ClangIR testing. For more context, see: https://github.com/llvm/llvm-project/issues/179952.

  3. Prefer to preserve the high-level structure of CIRGenBuiltinAArch64.cpp so that switch cases and handling remain visibly consistent with ARM.cpp; limited refactors are allowed when they improve maintainability but must be explained in PR descriptions.

  4. Format new tests in the same style as the existing ones; see intrinsics.c for reference.

References

CC @andykaylor


(*) Every variant listed in the ACLE Neon reference for each intrinsic group, including all element types and vector widths (e.g., i8/16/32/64, q/duph/lanes, immediate and non-immediate forms).
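
When checking that a group is fully covered, it can help to start from a checklist of candidate names and then compare it against the ACLE reference. Below is a small, hedged C sketch of that idea. The helper names (`format_variant`, `print_candidate_variants`) and the suffix list are assumptions made for illustration. Not every generated name is a real intrinsic, and some groups have extra forms (lane, immediate, `_high`) that this sketch does not enumerate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative element-type suffixes only. Which combinations actually
 * exist for a given group must be checked against the ACLE reference. */
static const char *const kElemSuffixes[] = {"s8",  "s16", "s32", "u8",
                                            "u16", "u32", "f32"};
enum { kNumElemSuffixes = sizeof(kElemSuffixes) / sizeof(kElemSuffixes[0]) };

/* Writes "<base>_<suffix>" (64-bit form) or "<base>q_<suffix>" (128-bit
 * form) into `buf` and returns the length reported by snprintf. */
static int format_variant(char *buf, size_t size, const char *base, int quad,
                          const char *suffix) {
  assert(buf != NULL && size > 0);
  return snprintf(buf, size, "%s%s_%s", base, quad ? "q" : "", suffix);
}

/* Prints one candidate name per line and returns how many were printed. */
static int print_candidate_variants(FILE *out, const char *base) {
  char name[64];
  int count = 0;
  for (int quad = 0; quad <= 1; ++quad) {
    for (int i = 0; i < kNumElemSuffixes; ++i) {
      format_variant(name, sizeof name, base, quad, kElemSuffixes[i]);
      fprintf(out, "%s\n", name);
      ++count;
    }
  }
  return count;
}
```

For example, `print_candidate_variants(stdout, "vpmax")` lists names such as `vpmax_s8` and `vpmaxq_s8`; each candidate still needs to be confirmed against the reference before it is counted as a required variant.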
