NEON Intrinsics and Instructions - Reference - Professional Embedded ARM Development (2014)

Professional Embedded ARM Development (2014)

Part II. Reference

Appendix D. NEON Intrinsics and Instructions

This appendix contains information on, and a list of instructions used with, the NEON engine. Data types, lane types, and intrinsics are listed.

DATA TYPES

Table D-1 lists the different data types supported on the NEON engine, and the corresponding C data types.

TABLE D-1: NEON Data Types

!

DATA TYPE

D-REGISTER (64 BITS)

Q-REGISTER (128 BITS)

Signed integers

int8x8_t

int8x16_t

int16x4_t

int16x8_t

int32x2_t

int32x4_t

int64x1_t

int64x2_t

Unsigned integers

uint8x8_t

uint8x16_t

uint16x4_t

uint16x8_t

uint32x2_t

uint32x4_t

uint64x1_t

uint64x2_t

Floating-point

float16x4_t

float16x8_t

float32x2_t

float32x4_t

Polynomial

poly8x8_t

poly8x16_t

poly16x4_t

poly16x8_t

LANE TYPES

Table D-2 lists the different lane types per class, and the amount of possible types for each class.

TABLE D-2: Data Lane Types

!

CLASS

COUNT

TYPES

int

6

int8, int16, int32, uint8, uint16, uint32

int/64

8

int8, int16, int32, int64, uint8, uint16, uint32, uint64

sint

3

int8, int16, int32

sint16/32

2

int16, int32

int32

2

int32, uint32

8-bit

3

int8, uint8, poly8

int/poly8

7

int8, int16, int32, uint8, uint16, uint32, poly8

int/64/poly

10

int8, int16, int32, int64, uint8, uint16, uint32, uint64, poly8, poly16

arith

7

int8, int16, int32, uint8, uint16, uint32, float32

arith/64

9

int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32

arith/poly8

8

int8, int16, int32, uint8, uint16, uint32, poly8, float32

floating

1

float32

any

11

int8, int16, int32, int64, uint8, uint16, uint32, uint64, poly8, poly16, float32

ASSEMBLY INSTRUCTIONS

Table D-3 contains a list of NEON instructions, as well as a brief description of each instruction.

TABLE D-3: NEON Instructions

!

INSTRUCTION

DESCRIPTION

VABA

Absolute difference and Accumulate

VABD

Absolute difference

VABS

Absolute Value

VACGE

Absolute Compare Greater Than or Equal

VACGT

Absolute Compare Greater Than

VACLE

Absolute Compare Less Than or Equal

VACLT

Absolute Compare Less Than

VADD

Add

VADDHN

Add, Select High Half

VAND

Logical AND

VBIC

Bitwise Bit Clear

VBIF

Bitwise Insert if False

VBIT

Bitwise Insert if True

VBSL

Bitwise Select

VCEQ

Compare Equal

VCGE

Compare Greater Than or Equal

VCGT

Compare Greater Than

VCLE

Compare Less Than or Equal

VCLS

Count Leading Sign bits

VCLT

Compare Less Than

VCLZ

Count Leading Zeroes

VCNT

Count set bits

VCVT

Convert between different number formats

VDUP

Duplicate scalar to all lanes of vector

VEOR

Bitwise Exclusive OR

VEXT

Extract

VFMA

Fused Multiply and Accumulate

VFMS

Fused Multiply and Subtract

VHADD

Halving Add

VHSUB

Halving Subtract

VLD

Vector Load

VMAX

Maximum

VMIN

Minimum

VMLA

Multiply and Accumulate

VMLS

Multiply and Subtract

VMOV

Move

VMOVL

Move Long

VMOVN

Move Narrow

VMUL

Multiply

VMVN

Move Negative

VNEG

Negate

VORN

Bitwise OR NOT

VORR

Bitwise OR

VPADAL

Pairwise Add and Accumulate

VPADD

Pairwise Add

VPMAX

Pairwise Maximum

VPMIN

Pairwise Minimum

VQABS

Absolute Value, Saturate

VQADD

Add, Saturate

VQDMLAL

Saturating Double Multiply Accumulate

VQDMLSL

Saturating Double Multiply and Subtract

VQDMUL

Saturating Double Multiply

VQDMULH

Saturating Double Multiply returning High half

VQMOVN

Saturating Move

VQNEG

Negate, Saturate

VQRDMULH

Saturating Double Multiply returning High half

VQRSHL

Shift left, Round, Saturate

VQRSHR

Shift Right, Round, Saturate

VQSHL

Shift Left, Saturate

VQSHR

Shift Right, Saturate

VQSUB

Subtract, Saturate

VRADDH

Add, Select High Half, Round

VRECPE

Reciprocal Estimate

VRECPS

Reciprocal Step

VREV

Reverse Elements

VRHADD

Halving Add, Round

VRSHR

Shift Right and Round

VRSQRTE

Reciprocal Square Root Estimate

VRSQRTS

Reciprocal Square Root Step

VRSRA

Shift Right, Round and Accumulate

VRSUBH

Subtract, select High half, Round

VSHL

Shift Left

VSHR

Shift Right

VSLI

Shift Left and Insert

VSRA

Shift Right, Accumulate

VSRI

Shift Right and Insert

VST

Vector Store

VSUB

Subtract

VSUBH

Subtract, Select High half

VSWP

Swap Vectors

VTBL

Vector Table Lookup

VTBX

Vector Table Extension

VTRN

Vector Transpose

VTST

Test Bits

VUZP

Vector Unzip

VZIP

Vector Zip

INTRINSIC NAMING CONVENTIONS

Intrinsics provide an elegant way to write NEON instructions using C. NEON intrinsics are created using the following structure:

v[q][r]name[u][n][q][_lane][_n][_result]_type

where:

q indicates a saturating operation.

r indicates a rounding operation.

name is the descriptive name of the operation.

u indicates signed-to-unsigned saturation.

n indicates a narrowing operation.

q indicates an operation on 128-bit vectors.

_n indicates a scalar operand supplied as an argument.

_lane indicates a scalar operand taken from the lane of a vector.

result is the result type in short form.

For example, vmul_s16 multiplies two vectors of signed 16-bit values and is equivalent to VMUL.I16. Some examples in C include:

uint32x4_t vec128 = vld1q_u32(i); // Load 4 32-bit values

uint8x8_t vadd_u8 (uint8x8_t,

uint8x8_t); //Add two lanes

int8x16_t vaddq_s8 (int8x16_t, int8x16_t); //Saturating add two lanes