SSSE3

Supplemental Streaming SIMD Extensions 3 (SSSE3 or SSE3S) is a SIMD instruction set created by Intel and is the fourth iteration of the SSE technology.

Close

Functionality

SSSE3 contains 16 new discrete instructions. Each instruction can act on 64-bit MMX or 128-bit XMM registers. Therefore, Intel's materials refer to 32 new instructions. They include:[1]

Twelve instructions that perform horizontal addition or subtraction operations.
Six instructions that evaluate absolute values.
Two instructions that perform multiply and add operations and speed up the evaluation of dot products.
Two instructions that accelerate packed-integer multiply operations and produce integer values with scaling.
Two instructions that perform a byte-wise, in-place shuffle according to the second shuffle control operand.
Six instructions that negate packed integers in the destination operand if the signs of the corresponding element in the source operand is less than zero.
Two instructions that align data from the composite of two operands.

CPUs with SSSE3

AMD:
- "Cat" low-power processors
  - Bobcat-based processors
  - Jaguar-based processors and newer
  - Puma-based processors and newer
- "Heavy Equipment" processors
  - Bulldozer-based processors
  - Piledriver-based processors
  - Steamroller-based processors
  - Excavator-based processors and newer
- Zen-based processors
- Zen+-based processors
- Zen2-based processors
Intel:
- Xeon 5100 Series
- Xeon 5300 Series
- Xeon 5400 Series
- Xeon 3000 Series
- Core 2 Duo
- Core 2 Extreme
- Core 2 Quad
- Core i7
- Core i5
- Core i3
- Pentium Dual Core (if 64-bit capable; Allendale onwards)
- Celeron 4xx Sequence Conroe-L
- Celeron Dual Core E1200
- Celeron M 500 series
- Atom
VIA:
- Nano

New[1] instructions

In the table below, satsw(X) (read as 'saturate to signed word') takes a signed integer X, and converts it to −32768 if it is less than −32768, to +32767 if it is greater than 32767, and leaves it unchanged otherwise. As normal for the Intel architecture, bytes are 8 bits, words 16 bits, and dwords 32 bits; 'register' refers to an MMX or XMM vector register.

PSIGNB, PSIGNW, PSIGND	Packed Sign	Negate the elements of a register of bytes, words or dwords if the sign of the corresponding elements of another register is negative.
PABSB, PABSW, PABSD	Packed Absolute Value	Fill the elements of a register of bytes, words or dwords with the absolute values of the elements of another register
PALIGNR	Packed Align Right	take two registers, concatenate their values, and pull out a register-length section from an offset given by an immediate value encoded in the instruction.
PSHUFB	Packed Shuffle Bytes	takes registers of bytes A = [a₀ a₁ a₂ ...] and B = [b₀ b₁ b₂ ...] and replaces A with [a_b0 a_b1 a_b2 ...]; except that it replaces the ith entry with 0 if the top bit of b_i is set.
PMULHRSW	Packed Multiply High with Round and Scale	treat the 16-bit words in registers A and B as signed 16-bit fixed-point numbers between −1.00000000 and +0.99996948... (e.g. 0x4000 is treated as +0.5 and 0xA000 as −0.75), and multiply them together with correct rounding.
PMADDUBSW	Multiply and Add Packed Signed and Unsigned Bytes	Take the bytes in registers A and B, multiply them together, add pairs, signed-saturate and store. I.e. [a0 a1 a2 …] pmaddubsw [b0 b1 b2 …] = [satsw(a0b0+a1b1) satsw(a2b2+a3b3) …]
PHSUBW, PHSUBD	Packed Horizontal Subtract (Words or Doublewords)	takes registers A = [a0 a1 a2 …] and B = [b0 b1 b2 …] and outputs [a0−a1 a2−a3 … b0−b1 b2−b3 …]
PHSUBSW	Packed Horizontal Subtract and Saturate Words	like PHSUBW, but outputs [satsw(a0−a1) satsw(a2−a3) … satsw(b0−b1) satsw(b2−b3) …]
PHADDW, PHADDD	Packed Horizontal Add (Words or Doublewords)	takes registers A = [a0 a1 a2 …] and B = [b0 b1 b2 …] and outputs [a0+a1 a2+a3 … b0+b1 b2+b3 …]
PHADDSW	Packed Horizontal Add and Saturate Words	like PHADDW, but outputs [satsw(a0+a1) satsw(a2+a3) … satsw(b0+b1) satsw(b2+b3) …]

References

"2.9.5". Intel 64 and IA-32 Architectures Optimization Reference Manual (PDF) (Technical report). Intel.com. 2016. pp. 92–93. Retrieved June 22, 2018.

External links

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[:0-1] "2.9.5". Intel 64 and IA-32 Architectures Optimization Reference Manual (PDF) (Technical report). Intel.com. 2016. pp. 92–93. Retrieved June 22, 2018.

Instruction set extensions
SIMD (RISC)	Alpha MVI ARM NEON SVE MIPS MDMX MIPS-3D MXU MIPS SIMD PA-RISC MAX Power ISA VMX SPARC VIS
SIMD (x86)	MMX (1996) 3DNow! (1998) SSE (1999) SSE2 (2001) SSE3 (2004) SSSE3 (2006) SSE4 (2006) SSE5 ~~(2007)~~ AVX (2008) F16C (2009) XOP (2009) FMA (FMA4: 2011, FMA3: 2012) AVX2 (2013) AVX-512 (2015)
Bit manipulation	BMI (ABM: 2007, BMI1: 2012, BMI2: 2013, TBM: 2012) ADX (2014)
Compressed instructions	SuperH Thumb MIPS16e ASE RVC
Security and cryptography	PadLock (2003) AES-NI (2008); ARMv8 also has AES instructions CLMUL (2010) RDRAND (2012) SHA (2013) MPX (2015) SGX (2015)
Transactional memory	TSX (2013) ASF
Virtualization	VT-x (2005) AMD-V (2006) VT-d (AMD-Vi)
Suspended extensions' dates are ~~struck through~~.