SSSE3

Supplemental Streaming SIMD Extensions 3 (SSSE3 or SSE3S) is a SIMD instruction set created by Intel and is the fourth iteration of the SSE technology.

Close

Functionality

SSSE3 contains 16 new discrete instructions. Each instruction can act on 64-bit MMX or 128-bit XMM registers. Therefore, Intel's materials refer to 32 new instructions. They include:[1]

  • Twelve instructions that perform horizontal addition or subtraction operations.
  • Six instructions that evaluate absolute values.
  • Two instructions that perform multiply and add operations and speed up the evaluation of dot products.
  • Two instructions that accelerate packed-integer multiply operations and produce integer values with scaling.
  • Two instructions that perform a byte-wise, in-place shuffle according to the second shuffle control operand.
  • Six instructions that negate packed integers in the destination operand if the signs of the corresponding element in the source operand is less than zero.
  • Two instructions that align data from the composite of two operands.

CPUs with SSSE3

New[1] instructions

In the table below, satsw(X) (read as 'saturate to signed word') takes a signed integer X, and converts it to −32768 if it is less than −32768, to +32767 if it is greater than 32767, and leaves it unchanged otherwise. As normal for the Intel architecture, bytes are 8 bits, words 16 bits, and dwords 32 bits; 'register' refers to an MMX or XMM vector register.

PSIGNB, PSIGNW, PSIGND Packed Sign Negate the elements of a register of bytes, words or dwords if the sign of the corresponding elements of another register is negative.
PABSB, PABSW, PABSD Packed Absolute Value Fill the elements of a register of bytes, words or dwords with the absolute values of the elements of another register
PALIGNR Packed Align Right take two registers, concatenate their values, and pull out a register-length section from an offset given by an immediate value encoded in the instruction.
PSHUFB Packed Shuffle Bytes takes registers of bytes A = [a0 a1 a2 ...] and B = [b0 b1 b2 ...] and replaces A with [ab0 ab1 ab2 ...]; except that it replaces the ith entry with 0 if the top bit of bi is set.
PMULHRSW Packed Multiply High with Round and Scale treat the 16-bit words in registers A and B as signed 16-bit fixed-point numbers between −1.00000000 and +0.99996948... (e.g. 0x4000 is treated as +0.5 and 0xA000 as −0.75), and multiply them together with correct rounding.
PMADDUBSW Multiply and Add Packed Signed and Unsigned Bytes Take the bytes in registers A and B, multiply them together, add pairs, signed-saturate and store. I.e. [a0 a1 a2 …] pmaddubsw [b0 b1 b2 …] = [satsw(a0b0+a1b1) satsw(a2b2+a3b3) …]
PHSUBW, PHSUBD Packed Horizontal Subtract (Words or Doublewords) takes registers A = [a0 a1 a2 …] and B = [b0 b1 b2 …] and outputs [a0−a1 a2−a3 … b0−b1 b2−b3 …]
PHSUBSW Packed Horizontal Subtract and Saturate Words like PHSUBW, but outputs [satsw(a0−a1) satsw(a2−a3) … satsw(b0−b1) satsw(b2−b3) …]
PHADDW, PHADDD Packed Horizontal Add (Words or Doublewords) takes registers A = [a0 a1 a2 …] and B = [b0 b1 b2 …] and outputs [a0+a1 a2+a3 … b0+b1 b2+b3 …]
PHADDSW Packed Horizontal Add and Saturate Words like PHADDW, but outputs [satsw(a0+a1) satsw(a2+a3) … satsw(b0+b1) satsw(b2+b3) …]

See also

References

  1. "2.9.5". Intel 64 and IA-32 Architectures Optimization Reference Manual (PDF) (Technical report). Intel.com. 2016. pp. 92–93. Retrieved June 22, 2018.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.