Overview
This is an umbrella issue for upstreaming all AArch64 builtins to ClangIR.
There are too many AArch64-specific builtins to open a separate issue for each logical group. Multiple contributors can work on this in parallel, provided we coordinate to minimize overlap. To help with that, a partial list of builtins that still need to be upstreamed is included below.
Before starting work, please check the table below to make sure no one else is already working on the same intrinsics.
If you would like to contribute, please comment here with the builtin(s) you plan to work on.
Builtins are moved to the "Completed builtins" table once they have been upstreamed.
If you have any questions, please comment here and tag @banach-space.
Needed builtins
| Intrinsic group | Common prefix | Assignee | PR(s) | Status |
|---|---|---|---|---|
| Set vector lane | vset_lane_*, vsetq_lane_* | @abhijeetsharma200 | #186623 | (*) vset_lane_i8 ... |
| Extract one element from vector | vget_lane_*, vgetq_lane_* | @Ayush3941 | #186119 | (*) vget_lane_i8 ... |
| Vector saturating shift left | vqshl_* | @Ko496-glitch | #190728, #199153 | WIP |
| Vector arithmetic (FP16) | v*_f16 | @rafe-murray | #190310, #194865 | WIP |
| Fused multiply-accumulate | vfma_* | @yairbenavraham | #188190, #195602, #197084 | (*) *vfmaq_laneq_v, |
| Pairwise maximum | vpmax_*, vpmaxq_* | @Isocyy | | (*) vpmax_s8 ... |
| Vector saturating shift right and narrow | vqshrn_*, vqshrn_high_*, vqshrun_* | @IAmCheese1231 | #195085, #195080, #195040 | (*) vqshrun_n_s16 ... |
| Vector saturating rounding shift right and narrow | vqrshrn_*, vqrshrn_high_*, vqrshrun_* | @Ko496-glitch | #198216 | (*) vqrshrun_n_s16 ... |
| Rounding | vrnd*_*, vrndq*_*, ... | @AbdallahRashed | | (*) vrndaq_f32 ... |
| Conversions | vcvt_*, vcvtq_* | @banach-space | #190961, #193273 | WIP |
| Vector shift right and insert | vsri_*, vsriq_* | @iamvickynguyen | #196776 | (*) vsri_n_s8 ... |
| Vector shift left and insert | vsli_*, vsliq_* | @E00N777 | #199415, #198309 | (*) vsli_n_s8 ... |
| Maximum across vector (IEEE754) | vmaxnmv_f32 | @Ko496-glitch | | |
(*) Intrinsics with special-case handling in CIRGenFunction::emitAArch64BuiltinExpr in the ClangIR incubator repository. These are usually the easiest starting point.
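For contributors new to the shift-and-insert and saturating-narrow groups above, a scalar model of a single lane can make the expected semantics concrete when reviewing the IR a builtin lowers to. The functions below (ref_vsri_n_u8, ref_vsli_n_u8, ref_vqshrn_n_s16, ref_vqshrun_n_s16) are hypothetical names, not arm_neon.h intrinsics and not part of ClangIR. They are a sketch of the per-lane behaviour as described in the ACLE Neon reference, assuming the documented immediate ranges (1..8 for vsri_n_u8, vqshrn_n_s16 and vqshrun_n_s16; 0..7 for vsli_n_u8).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-lane reference models (NOT arm_neon.h functions).
   They restate the ACLE semantics of one lane so expected values can be
   worked out by hand. */

/* Arithmetic shift right rounding toward minus infinity, written portably
   because >> on negative values is implementation-defined in C. */
static int asr(int x, int n) {
    return x >= 0 ? (x >> n) : -((-x - 1) >> n) - 1;
}

static int clamp(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* vsri_n_u8 lane: shift b right by n (1..8) and insert it into a,
   preserving the top n bits of a. */
uint8_t ref_vsri_n_u8(uint8_t a, uint8_t b, int n) {
    assert(n >= 1 && n <= 8);
    unsigned mask = 0xFFu >> n; /* bit positions written from b */
    return (uint8_t)((a & ~mask) | (b >> n));
}

/* vsli_n_u8 lane: shift b left by n (0..7) and insert it into a,
   preserving the low n bits of a. */
uint8_t ref_vsli_n_u8(uint8_t a, uint8_t b, int n) {
    assert(n >= 0 && n <= 7);
    unsigned keep = (1u << n) - 1u; /* low bits kept from a */
    return (uint8_t)((a & keep) | ((unsigned)b << n));
}

/* vqshrn_n_s16 lane: signed shift right by n (1..8), saturate to int8. */
int8_t ref_vqshrn_n_s16(int16_t a, int n) {
    assert(n >= 1 && n <= 8);
    return (int8_t)clamp(asr(a, n), INT8_MIN, INT8_MAX);
}

/* vqshrun_n_s16 lane: signed input, shift right by n (1..8), saturate to
   the unsigned range [0, 255]. */
uint8_t ref_vqshrun_n_s16(int16_t a, int n) {
    assert(n >= 1 && n <= 8);
    return (uint8_t)clamp(asr(a, n), 0, UINT8_MAX);
}
```

For example, ref_vsri_n_u8(0xAB, 0xCD, 4) keeps the top nibble of a and inserts the top nibble of b, giving 0xAC, while ref_vqshrn_n_s16(1000, 2) saturates the shifted value 250 to 127.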
Completed builtins
| Intrinsic group | Common prefix | Assignee | PR(s) | Status |
|---|---|---|---|---|
| Split vectors (BFloat16) | vget{q}_lane_bf16 | @fileho | #186866 | ✓ |
| Set all lanes to the same value (BFloat16) | vduph_lane{q}_bf16 | @E00N777 | #185852, #187460 | ✓ |
| Set all lanes to the same value (FP16) | vdup*_f16 | @neonetizen | #186955 | ✓ |
| Widening multiplication + polynomial multiply | vmull_*, vmull_high_* | @pau-sum | #188371 | ✓ |
| Maximum | vmax_*, vmaxq_* | @Xinlong-Chen | #188503 | ✓ |
| Bitwise select | vbsl_* | @E00N777 | #188449 | ✓ |
| Vector shift right | vshr_* | @alowqie | #186693 | ✓ |
| Minimum | vmin_*, vminq_* | @YGGkk | #187935 | ✓ |
| Square root | vsqrt_*, vsqrtq_* | @Kouunnn | #192282 | ✓ |
| Pairwise minimum | vpmin_*, vpminq_* | @iamvickynguyen | #191759 | ✓ |
| Minimum across vector | vminv_*, vminvq_* | @E00N777 | #192901 | ✓ |
| Vector rounding shift right and accumulate | vrsra_* | @E00N777 | #191129 | ✓ |
| Pairwise addition and widen | vpaddl_*, vpaddlq_* | @xiongzile | #191845 | ✓ |
| Vector rounding shift right | vrshr_* | @ArfiH | #185992, #194229 | ✓ |
| Absolute difference | vabd_*, vabdq_* | @banach-space | #183595 | ✓ |
| Addition across vector + addition across vector widening | vaddv_*, vaddvq_* | @iamvickynguyen | #193396 | ✓ |
| Zip elements | vzip_*, vzipq_* | @E00N777 | #193658, #194311 | ✓ |
| Unzip elements | vuzp_*, vuzpq_* | @E00N777 | #195591, #195527 | ✓ |
| Transpose elements | vtrn_*, vtrnq_* | @E00N777 | #197112, #197651 | ✓ |
| Vector shift left | vshl_* | @albertbolt1 | #186406, #187516, #191655 | ✓ |
| Maximum across vector | vmaxv_*, vmaxvq_* | @Ko496-glitch | #194401, #197095 | ✓ |
Implementation requirements
- For each intrinsic group listed in the Arm Neon Intrinsics Reference, ensure that all variants are supported and tested (*). If some variants are missing from the ClangIR incubator repository, please implement them.
- Reuse the existing AArch64 builtin tests in clang/test/CodeGen/AArch64. These tests need to be moved to the clang/test/CodeGen/AArch64/neon subdirectory, where ClangIR testing is enabled. For more context, see https://github.com/llvm/llvm-project/issues/179952.
- Where possible, preserve the high-level structure of CIRGenBuiltinAArch64.cpp so that its switch cases and handling stay visibly consistent with ARM.cpp in the classic Clang codegen. Limited refactors are allowed when they improve maintainability, but they must be explained in the PR description.
- Format tests following the existing style; see intrinsics.c for reference.
References
CC @andykaylor
(*) In the requirements above, "all variants" means every variant listed in the ACLE Neon reference for the intrinsic group: all element types (e.g. i8/16/32/64), all vector widths and forms (e.g. q, duph, lane), and both immediate and non-immediate forms.
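The "all variants" definition can be turned into a per-group checklist. The sketch below is a hypothetical C helper (expand_group and has_variant are illustrative names, not part of Clang, ClangIR or arm_neon.h) that expands a base name, an optional q form and a list of element types into intrinsic names. It assumes the type list is copied by hand from the ACLE Neon reference; not every element type exists in both vector widths, so call it separately per width when the lists differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32

/* Hypothetical checklist helper (not part of Clang or the ACLE).
   Expands a group into <base>[q]<suffix>_<type> names, for example
   "vsri" + "q" + "_n" + "_s8" -> "vsriq_n_s8". Stops at `cap` names. */
size_t expand_group(const char *base, const char *suffix,
                    const char *const *types, size_t ntypes, int with_q,
                    char names[][NAME_LEN], size_t cap) {
    assert(with_q == 0 || with_q == 1);
    size_t count = 0;
    /* First the 64-bit (non-q) forms, then the 128-bit (q) forms. */
    for (int q = 0; q <= with_q; ++q)
        for (size_t i = 0; i < ntypes && count < cap; ++i)
            snprintf(names[count++], NAME_LEN, "%s%s%s_%s",
                     base, q ? "q" : "", suffix, types[i]);
    return count;
}

/* Returns 1 if `name` appears among the first `count` generated names. */
int has_variant(char names[][NAME_LEN], size_t count, const char *name) {
    for (size_t i = 0; i < count; ++i)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}
```

For example, expanding "vsri" with suffix "_n", types {"s8", "u8"} and with_q set to 1 yields vsri_n_s8, vsri_n_u8, vsriq_n_s8 and vsriq_n_u8, which can then be ticked off one by one against the tests added in a PR.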