Professional Embedded ARM Development (2014)
Part II. Reference
Appendix D. NEON Intrinsics and Instructions
This appendix contains information on, and a list of instructions used with, the NEON engine. Data types, lane types, and intrinsics are listed.
DATA TYPES
Table D-1 lists the different data types supported on the NEON engine, and the corresponding C data types.
TABLE D-1: NEON Data Types
|
DATA TYPE |
D-REGISTER (64 BITS) |
Q-REGISTER (128 BITS) |
|
Signed integers |
int8x8_t |
int8x16_t |
|
int16x4_t |
int16x8_t |
|
|
int32x2_t |
int32x4_t |
|
|
int64x1_t |
int64x2_t |
|
|
Unsigned integers |
uint8x8_t |
uint8x16_t |
|
uint16x4_t |
uint16x8_t |
|
|
uint32x2_t |
uint32x4_t |
|
|
uint64x1_t |
uint64x2_t |
|
|
Floating-point |
float16x4_t |
float16x8_t |
|
float32x2_t |
float32x4_t |
|
|
Polynomial |
poly8x8_t |
poly8x16_t |
|
poly16x4_t |
poly16x8_t |
LANE TYPES
Table D-2 lists the different lane types per class, and the amount of possible types for each class.
TABLE D-2: Data Lane Types
|
CLASS |
COUNT |
TYPES |
|
int |
6 |
int8, int16, int32, uint8, uint16, uint32 |
|
int/64 |
8 |
int8, int16, int32, int64, uint8, uint16, uint32, uint64 |
|
sint |
3 |
int8, int16, int32 |
|
sint16/32 |
2 |
int16, int32 |
|
int32 |
2 |
int32, uint32 |
|
8-bit |
3 |
int8, uint8, poly8 |
|
int/poly8 |
7 |
int8, int16, int32, uint8, uint16, uint32, poly8 |
|
int/64/poly |
10 |
int8, int16, int32, int64, uint8, uint16, uint32, uint64, poly8, poly16 |
|
arith |
7 |
int8, int16, int32, uint8, uint16, uint32, float32 |
|
arith/64 |
9 |
int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32 |
|
arith/poly8 |
8 |
int8, int16, int32, uint8, uint16, uint32, poly8, float32 |
|
floating |
1 |
float32 |
|
any |
11 |
int8, int16, int32, int64, uint8, uint16, uint32, uint64, poly8, poly16, float32 |
ASSEMBLY INSTRUCTIONS
Table D-3 contains a list of NEON instructions, as well as a brief description of each instruction.
TABLE D-3: NEON Instructions
|
INSTRUCTION |
DESCRIPTION |
|
VABA |
Absolute difference and Accumulate |
|
VABD |
Absolute difference |
|
VABS |
Absolute Value |
|
VACGE |
Absolute Compare Greater Than or Equal |
|
VACGT |
Absolute Compare Greater Than |
|
VACLE |
Absolute Compare Less Than or Equal |
|
VACLT |
Absolute Compare Less Than |
|
VADD |
Add |
|
VADDHN |
Add, Select High Half |
|
VAND |
Logical AND |
|
VBIC |
Bitwise Bit Clear |
|
VBIF |
Bitwise Insert if False |
|
VBIT |
Bitwise Insert if True |
|
VBSL |
Bitwise Select |
|
VCEQ |
Compare Equal |
|
VCGE |
Compare Greater Than or Equal |
|
VCGT |
Compare Greater Than |
|
VCLE |
Compare Less Than or Equal |
|
VCLS |
Count Leading Sign bits |
|
VCLT |
Compare Less Than |
|
VCLZ |
Count Leading Zeroes |
|
VCNT |
Count set bits |
|
VCVT |
Convert between different number formats |
|
VDUP |
Duplicate scalar to all lanes of vector |
|
VEOR |
Bitwise Exclusive OR |
|
VEXT |
Extract |
|
VFMA |
Fused Multiply and Accumulate |
|
VFMS |
Fused Multiply and Subtract |
|
VHADD |
Halving Add |
|
VHSUB |
Halving Subtract |
|
VLD |
Vector Load |
|
VMAX |
Maximum |
|
VMIN |
Minimum |
|
VMLA |
Multiply and Accumulate |
|
VMLS |
Multiply and Subtract |
|
VMOV |
Move |
|
VMOVL |
Move Long |
|
VMOVN |
Move Narrow |
|
VMUL |
Multiply |
|
VMVN |
Move Negative |
|
VNEG |
Negate |
|
VORN |
Bitwise OR NOT |
|
VORR |
Bitwise OR |
|
VPADAL |
Pairwise Add and Accumulate |
|
VPADD |
Pairwise Add |
|
VPMAX |
Pairwise Maximum |
|
VPMIN |
Pairwise Minimum |
|
VQABS |
Absolute Value, Saturate |
|
VQADD |
Add, Saturate |
|
VQDMLAL |
Saturating Double Multiply Accumulate |
|
VQDMLSL |
Saturating Double Multiply and Subtract |
|
VQDMUL |
Saturating Double Multiply |
|
VQDMULH |
Saturating Double Multiply returning High half |
|
VQMOVN |
Saturating Move |
|
VQNEG |
Negate, Saturate |
|
VQRDMULH |
Saturating Double Multiply returning High half |
|
VQRSHL |
Shift left, Round, Saturate |
|
VQRSHR |
Shift Right, Round, Saturate |
|
VQSHL |
Shift Left, Saturate |
|
VQSHR |
Shift Right, Saturate |
|
VQSUB |
Subtract, Saturate |
|
VRADDH |
Add, Select High Half, Round |
|
VRECPE |
Reciprocal Estimate |
|
VRECPS |
Reciprocal Step |
|
VREV |
Reverse Elements |
|
VRHADD |
Halving Add, Round |
|
VRSHR |
Shift Right and Round |
|
VRSQRTE |
Reciprocal Square Root Estimate |
|
VRSQRTS |
Reciprocal Square Root Step |
|
VRSRA |
Shift Right, Round and Accumulate |
|
VRSUBH |
Subtract, select High half, Round |
|
VSHL |
Shift Left |
|
VSHR |
Shift Right |
|
VSLI |
Shift Left and Insert |
|
VSRA |
Shift Right, Accumulate |
|
VSRI |
Shift Right and Insert |
|
VST |
Vector Store |
|
VSUB |
Subtract |
|
VSUBH |
Subtract, Select High half |
|
VSWP |
Swap Vectors |
|
VTBL |
Vector Table Lookup |
|
VTBX |
Vector Table Extension |
|
VTRN |
Vector Transpose |
|
VTST |
Test Bits |
|
VUZP |
Vector Unzip |
|
VZIP |
Vector Zip |
INTRINSIC NAMING CONVENTIONS
Intrinsics provide an elegant way to write NEON instructions using C. NEON intrinsics are created using the following structure:
v[q][r]name[u][n][q][_lane][_n][_result]_type
where:
q indicates a saturating operation.
r indicates a rounding operation.
name is the descriptive name of the operation.
u indicates signed-to-unsigned saturation.
n indicates a narrowing operation.
q indicates an operation on 128-bit vectors.
_n indicates a scalar operand supplied as an argument.
_lane indicates a scalar operand taken from the lane of a vector.
result is the result type in short form.
For example, vmul_s16 multiplies two vectors of signed 16-bit values and is equivalent to VMUL.I16. Some examples in C include:
uint32x4_t vec128 = vld1q_u32(i); // Load 4 32-bit values
uint8x8_t vadd_u8 (uint8x8_t,
uint8x8_t); //Add two lanes
int8x16_t vaddq_s8 (int8x16_t, int8x16_t); //Saturating add two lanes
All materials on the site are licensed Creative Commons Attribution-Sharealike 3.0 Unported CC BY-SA 3.0 & GNU Free Documentation License (GFDL)
If you are the copyright holder of any material contained on our site and intend to remove it, please contact our site administrator for approval.
© 2016-2025 All site design rights belong to S.Y.A.