[CIR] Upstream handling of AArch64 (Arm) Neon builtins · llvm/llvm-project#185382


Labels: ClangIR, good first issue


Description

Overview

This is an umbrella issue for upstreaming all AArch64 builtins to ClangIR.

There are enough AArch64-specific builtins that creating separate issues for each logical group is not practical. Multiple contributors can work on this in parallel, provided we coordinate to minimize overlap. To help with that, a partial list of builtins that need to be upstreamed is included below.

If you would like to contribute, please comment indicating which builtin(s) you plan to work on.

Builtins will be removed from this list as they are upstreamed.

Before beginning work, please check the table below to ensure no one else is already working on the same intrinsics. If you have any questions, please comment here and tag @banach-space.

Needed builtins

Intrinsic Group	Common prefix	Assignee	PR(s)	Status
Set vector lane	`vset_lane_`, `vsetq_lane_`	@abhijeetsharma200	#186623	(*) `vset_lane_i8` ...
Extract one element from vector	`vget_lane_`, `vgetq_lane_`	@Ayush3941	#186119	(*) `vget_lane_i8` ...
Vector saturating shift left	`vqshl_*`	@Ko496-glitch	https://github.com/llvm/llvm-project/pull/190728 https://github.com/llvm/llvm-project/pull/199153	WIP
Vector arithmetic (FP16)	`v*_f16`	@rafe-murray	https://github.com/llvm/llvm-project/pull/190310 + https://github.com/llvm/llvm-project/pull/194865	WIP
Fused multiply-accumulate	`vfma_*`	@yairbenavraham	#188190 https://github.com/llvm/llvm-project/pull/195602 https://github.com/llvm/llvm-project/pull/197084	(*) `vfmaq_laneq_v` ...
Pairwise maximum	`vpmax_`, `vpmaxq_`	@E00N777	https://github.com/llvm/llvm-project/pull/201495	(*) `vpmax_s8` ...
Vector saturating shift right and narrow	`vqshrn_`, `vqshrn_high_`, `vqshrun_*`	@IAmCheese1231	https://github.com/llvm/llvm-project/pull/195085 + https://github.com/llvm/llvm-project/pull/195080 + https://github.com/llvm/llvm-project/pull/195040	(*) `vqshrun_n_s16` ...
Vector saturating rounding shift right and narrow	`vqrshrn_`, `vqrshrn_high_`, `vqrshrun_*`	@Ko496-glitch	https://github.com/llvm/llvm-project/pull/198216 https://github.com/llvm/llvm-project/pull/200113 https://github.com/llvm/llvm-project/pull/198947	✓
Rounding	`vrnd_`, `vrndq_`, ...	@AbdallahRashed	https://github.com/llvm/llvm-project/pull/195021	(*) `vrndaq_f32` ...
Conversions	`vcvt_`, `vcvtq_`	@banach-space	https://github.com/llvm/llvm-project/pull/190961 https://github.com/llvm/llvm-project/pull/193273 https://github.com/llvm/llvm-project/pull/199990	WIP
Addition		@iamvickynguyen
Widening addition		@iamvickynguyen

(*) Special cases that are already implemented in CIRGenFunction::emitAArch64BuiltinExpr in the ClangIR incubator repository. These are usually the easiest starting point.
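For contributors new to Neon, it may help to see what the first two groups in the table compute. The sketch below is a portable scalar model, not the Clang or ClangIR implementation: `Vec`, `getLane`, and `setLane` are hypothetical names made up for illustration. It mirrors the ACLE argument order (`vset_lane_*` takes the value, then the vector, then the lane; `vget_lane_*` takes the vector, then the lane). In the real intrinsics the lane index must be a compile-time constant, which this model does not enforce.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a Neon vector: N lanes of element type T.
template <typename T, std::size_t N>
using Vec = std::array<T, N>;

// Models the vget_lane_* / vgetq_lane_* group: read one lane.
template <typename T, std::size_t N>
T getLane(const Vec<T, N> &v, std::size_t lane) {
  assert(lane < N && "lane index out of range");
  return v[lane];
}

// Models the vset_lane_* / vsetq_lane_* group: return a copy of the
// vector with a single lane replaced; the input vector is unchanged.
template <typename T, std::size_t N>
Vec<T, N> setLane(T value, Vec<T, N> v, std::size_t lane) {
  assert(lane < N && "lane index out of range");
  v[lane] = value;
  return v;
}
```

For example, with `Vec<std::int8_t, 8> v{0, 1, 2, 3, 4, 5, 6, 7}`, calling `setLane<std::int8_t, 8>(42, v, 3)` yields a vector whose lane 3 is 42 and whose other lanes match `v`. The 64-bit and 128-bit (`q`) forms differ only in the number of lanes.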

Completed builtins

Intrinsic Group	Common prefix	Assignee	PR(s)	Status
Split vectors (BFloat16)	`vget{q}_lane_bf16`	@fileho	#186866	✓
Set all lanes to the same value (BFloat16)	`vduph_lane{q}_bf16`	@E00N777	#185852 #187460	✓
Set all lanes to the same value (FP16)	`vdup*_f16`	@neonetizen	#186955	✓
Widening multiplication + Polynomial Multiply	`vmull_`, `vmull_high_`	@pau-sum	#188371	✓
Maximum	`vmax_`, `vmaxq_`	@Xinlong-Chen	#188503	✓
Bitwise select	`vbsl_*`	@E00N777	#188449	✓
Vector shift right	`vshr_*`	@alowqie	https://github.com/llvm/llvm-project/pull/186693	✓
Minimum	`vmin_`, `vminq_`	@YGGkk	#187935	✓
Square root	`vsqrt_`, `vsqrtq_`	@Kouunnn	https://github.com/llvm/llvm-project/pull/192282	✓
Pairwise minimum	`vpmin_`, `vpminq_`	@iamvickynguyen	https://github.com/llvm/llvm-project/pull/191759	✓
Minimum across vector	`vminv_`, `vminvq_`	@E00N777	https://github.com/llvm/llvm-project/pull/192901	✓
Vector rounding shift right and accumulate	`vrsra_*`	@E00N777	https://github.com/llvm/llvm-project/pull/191129	✓
Pairwise addition and widen	`vpaddl_`, `vpaddlq_`	@xiongzile	https://github.com/llvm/llvm-project/pull/191845	✓
Vector rounding shift right	`vrshr_*`	@ArfiH	#185992 + https://github.com/llvm/llvm-project/pull/194229	✓
Absolute difference	`vabd_`, `vabdq_`	@banach-space	#183595	✓
Addition across vector + Addition across vector widening	`vaddv_`, `vaddvq_`	@iamvickynguyen	https://github.com/llvm/llvm-project/pull/193396	✓
Zip elements	`vzip_`, `vzipq_`	@E00N777	#193658 https://github.com/llvm/llvm-project/pull/194311	✓
Unzip elements	`vuzp_`, `vuzpq_`	@E00N777	https://github.com/llvm/llvm-project/pull/195591/ https://github.com/llvm/llvm-project/pull/195527	✓
Transpose elements	`vtrn_`, `vtrnq_`	@E00N777	https://github.com/llvm/llvm-project/pull/197112 https://github.com/llvm/llvm-project/pull/197651	✓
Vector shift left	`vshl_*`	@albertbolt1	https://github.com/llvm/llvm-project/pull/186406 https://github.com/llvm/llvm-project/pull/187516 https://github.com/llvm/llvm-project/pull/191655	✓
Maximum across vector	`vmaxv_`, `vmaxvq_`	@Ko496-glitch	https://github.com/llvm/llvm-project/pull/194401 https://github.com/llvm/llvm-project/pull/197095	✓
Vector shift right and insert	`vsri_`, `vsriq_`	@iamvickynguyen	#196776	✓
Vector shift right and accumulate	`vsra_`, `vsraq_`	@iamvickynguyen	https://github.com/llvm/llvm-project/pull/200630	✓
Vector shift left and insert	`vsli_`, `vsliq_`	@E00N777	https://github.com/llvm/llvm-project/pull/199415 https://github.com/llvm/llvm-project/pull/198309	✓
Maximum across vector (IEEE754)	`vmaxnmv_f32`	@Ko496-glitch	https://github.com/llvm/llvm-project/pull/199779	✓

Implementation requirements

For each intrinsic group listed in the Arm Neon Intrinsics Reference, ensure that all variants are supported and tested (*). If some variants are missing in the ClangIR incubator repository, please implement them.
Reuse the existing AArch64 builtin tests located in clang/test/CodeGen/AArch64. These tests will need to be moved to the neon subdirectory, which enables ClangIR testing. For more context, see: https://github.com/llvm/llvm-project/issues/179952.
Prefer to preserve the high-level structure of CIRGenBuiltinAArch64.cpp so that switch cases and handling remain visibly consistent with ARM.cpp; limited refactors are allowed when they improve maintainability but must be explained in PR descriptions.
Format tests using the pre-existing style; see intrinsics.c for reference.
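One way to keep track of the "all variants" requirement above is a small checklist helper. The sketch below is only an illustration, with hypothetical function names (`expandVariants`, `missingVariants`). It assumes variant names are formed as prefix plus type suffix, which is a simplification: not every combination exists, and the ACLE Neon reference remains the authoritative list.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: build candidate names as prefix + suffix.
// The result is a checklist to compare against the ACLE reference,
// not a list of intrinsics that are guaranteed to exist.
std::vector<std::string>
expandVariants(const std::vector<std::string> &prefixes,
               const std::vector<std::string> &suffixes) {
  std::vector<std::string> names;
  names.reserve(prefixes.size() * suffixes.size());
  for (const auto &prefix : prefixes)
    for (const auto &suffix : suffixes)
      names.push_back(prefix + suffix);
  return names;
}

// Hypothetical helper: return the required names that are not yet
// covered (for example, names that do not appear in any test file).
std::vector<std::string>
missingVariants(const std::vector<std::string> &required,
                const std::set<std::string> &covered) {
  std::vector<std::string> missing;
  for (const auto &name : required)
    if (covered.count(name) == 0)
      missing.push_back(name);
  return missing;
}
```

For instance, expanding the prefixes `vset_lane_` and `vsetq_lane_` with the suffixes `s8` and `s16` gives four candidate names; passing the set of names already exercised by the tests to `missingVariants` then lists what is left to implement or test.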

References

Previous umbrella tickets for X86: https://github.com/llvm/llvm-project/issues/167752 + https://github.com/llvm/llvm-project/issues/167765
Neon overviews: https://www.arm.com/technologies/neon + https://developer.arm.com/Architectures/Neon
Neon ACLE: https://arm-software.github.io/acle/neon_intrinsics/

CC @andykaylor

(*) Every variant listed in the ACLE Neon reference for each intrinsic group, including all element types and vector widths (e.g., i8/16/32/64, q/duph/lanes, immediate and non-immediate forms).

Contributor guide

Research direction: Pick a simple builtin group (e.g., vset lane) from the table, study the existing incubator code in CIRGenBuiltinAArch64.cpp, ensure all ACLE variants are implemented, and add test coverage by moving existing tests to the neon subdirectory.
Tech stack: None
Domain: backend
Issue type: Feature
Difficulty: 2
Estimated time: 1-3 hours
Activity status: Active
Clarity: Clear
Prerequisites: Git, C++, LLVM/Clang internals
Newbie friendliness: 75