SSSE3
Supplemental Streaming SIMD Extensions 3 (SSSE3 or SSE3S) is a SIMD instruction set created by Intel and is the fourth iteration of the SSE technology.
Functionality
SSSE3 contains 16 new discrete instructions. Each instruction can act on either 64-bit MMX or 128-bit XMM registers, so Intel's documentation counts 32 new instructions. They include:[1]
- Twelve instructions that perform horizontal addition or subtraction operations.
- Six instructions that evaluate absolute values.
- Two instructions that perform multiply and add operations and speed up the evaluation of dot products.
- Two instructions that accelerate packed-integer multiply operations and produce integer values with scaling.
- Two instructions that shuffle the bytes of the destination operand in place, according to the shuffle control mask in the second operand.
- Six instructions that negate packed integers in the destination operand if the corresponding element in the source operand is less than zero (and set them to zero if that element is zero).
- Two instructions that align data from the concatenation of two operands.
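The sign and absolute-value operations can be modelled in a few lines of portable C. This is a sketch, not Intel's pseudocode: `psignb_ref` and `pabsb_ref` are hypothetical names for scalar models of the 128-bit XMM byte forms (PSIGNB and PABSB). It assumes that converting 128 back to `int8_t` wraps to −128, as GCC, Clang and MSVC do.

```c
#include <stdint.h>

/* Hypothetical scalar model of PSIGNB (XMM form, 16 bytes).
   dst[i] is negated if src[i] < 0, cleared if src[i] == 0,
   and left unchanged if src[i] > 0. */
void psignb_ref(int8_t dst[16], const int8_t src[16])
{
    for (int i = 0; i < 16; i++) {
        if (src[i] < 0)
            dst[i] = (int8_t)-dst[i]; /* -(-128) wraps back to -128 */
        else if (src[i] == 0)
            dst[i] = 0;
    }
}

/* Hypothetical scalar model of PABSB (XMM form, 16 bytes).
   The result is unsigned, so |-128| is stored as 128 (0x80). */
void pabsb_ref(uint8_t dst[16], const int8_t src[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (uint8_t)(src[i] < 0 ? -src[i] : src[i]);
}
```

The zero case is what distinguishes PSIGN from a plain conditional negate: a zero in the source clears the destination element.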
CPUs with SSSE3
- AMD:
  - "Cat" low-power processors
    - Bobcat-based processors
    - Jaguar-based processors and newer
    - Puma-based processors and newer
  - "Heavy Equipment" processors
    - Bulldozer-based processors
    - Piledriver-based processors
    - Steamroller-based processors
    - Excavator-based processors and newer
  - Zen-based processors
  - Zen+-based processors
  - Zen 2-based processors
- Intel:
  - Xeon 5100 Series
  - Xeon 5300 Series
  - Xeon 5400 Series
  - Xeon 3000 Series
  - Core 2 Duo
  - Core 2 Extreme
  - Core 2 Quad
  - Core i7
  - Core i5
  - Core i3
  - Pentium Dual Core (if 64-bit capable; Allendale onwards)
  - Celeron 4xx Sequence (Conroe-L)
  - Celeron Dual Core E1200
  - Celeron M 500 series
  - Atom
- VIA:
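Because not every x86 processor supports SSSE3, programs that use it usually check for it at run time. SSSE3 support is reported in bit 9 of ECX for CPUID leaf 1. The sketch below assumes a GCC- or Clang-compatible compiler and its `<cpuid.h>` header; the function name `cpu_has_ssse3` is illustrative, and on other targets the function conservatively returns false.

```c
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

/* Returns true if the CPU reports SSSE3 (CPUID leaf 1, ECX bit 9).
   Assumes GCC/Clang on x86; any other compiler or target gets false. */
bool cpu_has_ssse3(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;              /* leaf 1 not available */
    return (ecx & (1u << 9)) != 0; /* SSSE3 feature flag */
#else
    return false;
#endif
}
```

GCC and Clang also offer `__builtin_cpu_supports("ssse3")`, which performs the same check without reading the bit by hand.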
New instructions[1]
In the table below, satsw(X) (read as 'saturate to signed word') takes a signed integer X, and converts it to −32768 if it is less than −32768, to +32767 if it is greater than 32767, and leaves it unchanged otherwise. As usual for the Intel architecture, bytes are 8 bits, words 16 bits, and dwords 32 bits; 'register' refers to an MMX or XMM vector register.
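A minimal C sketch of satsw, together with two of the saturating instructions from the table that rely on it, may make the rule concrete. The names `satsw`, `pmaddubsw_ref` and `phaddsw_ref` are hypothetical scalar models of the 128-bit XMM forms, not Intel intrinsics. Following Intel's definition of PMADDUBSW, the bytes of the first operand are treated as unsigned and those of the second as signed.

```c
#include <stdint.h>

/* satsw: clamp a signed integer to the signed 16-bit range. */
int16_t satsw(int32_t x)
{
    if (x < -32768) return -32768;
    if (x > 32767)  return 32767;
    return (int16_t)x;
}

/* Hypothetical scalar model of PMADDUBSW (XMM form): a holds 16 unsigned
   bytes, b holds 16 signed bytes; each output word is
   satsw(a[2i]*b[2i] + a[2i+1]*b[2i+1]). */
void pmaddubsw_ref(int16_t out[8], const uint8_t a[16], const int8_t b[16])
{
    for (int i = 0; i < 8; i++) {
        int32_t sum = (int32_t)a[2 * i] * b[2 * i]
                    + (int32_t)a[2 * i + 1] * b[2 * i + 1];
        out[i] = satsw(sum);
    }
}

/* Hypothetical scalar model of PHADDSW (XMM form): saturated sums of
   adjacent word pairs of a fill out[0..3], those of b fill out[4..7]. */
void phaddsw_ref(int16_t out[8], const int16_t a[8], const int16_t b[8])
{
    int16_t tmp[8]; /* lets out alias a or b, like the real destination */
    for (int i = 0; i < 4; i++) {
        tmp[i]     = satsw((int32_t)a[2 * i] + a[2 * i + 1]);
        tmp[i + 4] = satsw((int32_t)b[2 * i] + b[2 * i + 1]);
    }
    for (int i = 0; i < 8; i++)
        out[i] = tmp[i];
}
```

For example, two products of 255·127 add up to 64770, which exceeds the signed word range, so PMADDUBSW stores 32767 instead.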
Instruction | Name | Description |
---|---|---|
PSIGNB, PSIGNW, PSIGND | Packed Sign | Negate each element of a register of bytes, words or dwords if the corresponding element of another register is negative, set it to zero if that element is zero, and leave it unchanged otherwise. |
PABSB, PABSW, PABSD | Packed Absolute Value | Fill the elements of a register of bytes, words or dwords with the absolute values of the elements of another register. |
PALIGNR | Packed Align Right | Take two registers, concatenate their values (the first operand forming the upper half), shift the result right by a number of bytes given by an immediate value encoded in the instruction, and keep the low register-length section. |
PSHUFB | Packed Shuffle Bytes | Take registers of bytes A = [a0 a1 a2 …] and B = [b0 b1 b2 …] and replace A with [a[b0] a[b1] a[b2] …], using only the low 4 bits (XMM) or low 3 bits (MMX) of each bi as the index; the ith entry is instead set to 0 if the top bit of bi is set. |
PMULHRSW | Packed Multiply High with Round and Scale | Treat the 16-bit words in registers A and B as signed 16-bit fixed-point numbers between −1.00000000 and +0.99996948… (e.g. 0x4000 is treated as +0.5 and 0xA000 as −0.75), and multiply them together, rounding the result to the nearest representable value (ties round up). |
PMADDUBSW | Multiply and Add Packed Signed and Unsigned Bytes | Take the unsigned bytes in register A and the signed bytes in register B, multiply corresponding bytes, add adjacent pairs of products, signed-saturate and store the results as words, i.e. [a0 a1 a2 …] pmaddubsw [b0 b1 b2 …] = [satsw(a0·b0+a1·b1) satsw(a2·b2+a3·b3) …]. |
PHSUBW, PHSUBD | Packed Horizontal Subtract (Words or Doublewords) | Take registers A = [a0 a1 a2 …] and B = [b0 b1 b2 …] and output [a0−a1 a2−a3 … b0−b1 b2−b3 …]. |
PHSUBSW | Packed Horizontal Subtract and Saturate Words | Like PHSUBW, but output [satsw(a0−a1) satsw(a2−a3) … satsw(b0−b1) satsw(b2−b3) …]. |
PHADDW, PHADDD | Packed Horizontal Add (Words or Doublewords) | Take registers A = [a0 a1 a2 …] and B = [b0 b1 b2 …] and output [a0+a1 a2+a3 … b0+b1 b2+b3 …]. |
PHADDSW | Packed Horizontal Add and Saturate Words | Like PHADDW, but output [satsw(a0+a1) satsw(a2+a3) … satsw(b0+b1) satsw(b2+b3) …]. |
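The data-movement and fixed-point rows of the table are easier to follow with scalar reference code. The sketch below uses hypothetical names (`pshufb_ref`, `palignr_ref`, `pmulhrsw_ref`) for scalar models of the 128-bit XMM forms. It is not Intel's pseudocode, and it assumes arithmetic right shifts of negative values and wrap-around on narrowing conversions, as on GCC, Clang and MSVC.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of PSHUFB (XMM form). Result byte i is
   a[b[i] & 0x0F] (low 4 bits as the index), or 0 if bit 7 of b[i] is set.
   The result replaces a. */
void pshufb_ref(uint8_t a[16], const uint8_t b[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (b[i] & 0x80) ? 0 : a[b[i] & 0x0F];
    memcpy(a, tmp, sizeof tmp);
}

/* Hypothetical scalar model of PALIGNR (XMM form). hi:lo is treated as a
   32-byte value with lo in the low bytes; it is shifted right by imm bytes
   and the low 16 bytes are kept. Bytes from beyond the 32-byte value are 0. */
void palignr_ref(uint8_t out[16], const uint8_t hi[16],
                 const uint8_t lo[16], unsigned imm)
{
    uint8_t cat[32];
    memcpy(cat, lo, 16);
    memcpy(cat + 16, hi, 16);
    for (unsigned i = 0; i < 16; i++)
        out[i] = (imm + i < 32) ? cat[imm + i] : 0;
}

/* Hypothetical scalar model of one PMULHRSW lane. a and b are Q15 values;
   the Q30 product is rounded to nearest (ties up) and rescaled to Q15.
   -1.0 * -1.0 (0x8000 * 0x8000) does not fit and wraps to 0x8000. */
int16_t pmulhrsw_ref(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;       /* Q30 product */
    int32_t r = ((p >> 14) + 1) >> 1; /* add half an LSB, drop one more bit */
    return (int16_t)r;                /* keep the low 16 bits */
}
```

With these models, 0x4000 × 0x4000 (0.5 × 0.5) gives 0x2000 (0.25), and 0x4000 × 0xA000 (0.5 × −0.75) gives −12288 (−0.375). Production code would normally use the compiler intrinsics instead, such as `_mm_shuffle_epi8`, `_mm_alignr_epi8` and `_mm_mulhrs_epi16` from `<tmmintrin.h>`. Scalar versions like these are mainly useful for checking results.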
References
- "2.9.5". Intel 64 and IA-32 Architectures Optimization Reference Manual (PDF) (Technical report). Intel.com. 2016. pp. 92–93. Retrieved June 22, 2018.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.