CLMUL instruction set

Carry-less Multiplication (CLMUL) is an extension to the x86 instruction set, used by microprocessors from Intel and AMD, that was proposed by Intel in March 2008[1] and made available in the Intel Westmere processors announced in early 2010. Mathematically, the instruction implements multiplication of polynomials over the finite field GF(2), where the bitstring $a_{0}a_{1}\ldots a_{63}$ represents the polynomial $a_{0}+a_{1}X+a_{2}X^{2}+\cdots +a_{63}X^{63}$. The CLMUL instruction also allows a more efficient implementation of multiplication in the closely related larger finite fields GF(2^k) than the traditional instruction set does.[2]
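As a sketch of what the instruction computes, the portable C function below forms the 128-bit carry-less product of two 64-bit operands by shift-and-XOR. This is ordinary long multiplication with every addition replaced by XOR, because coefficients in GF(2) add modulo 2. It assumes bit i of each integer holds the coefficient of X^i (the least-significant bit is a_0); the names `u128_t` and `clmul64` are illustrative and not part of any library.

```c
#include <stdint.h>

/* Illustrative 128-bit result: lo holds the coefficients of X^0..X^63,
   hi holds the coefficients of X^64..X^127. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_t;

/* Carry-less (GF(2) polynomial) product of a and b, computed bit by bit:
   for every set bit i of b, XOR a shifted left by i into the result. */
static u128_t clmul64(uint64_t a, uint64_t b)
{
    u128_t r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            /* Bits of a pushed past position 63 land in the high half.
               For i == 0 nothing spills (and a >> 64 would be undefined). */
            if (i > 0)
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}
```

For example, `clmul64(3, 3)` yields lo = 5 and hi = 0: (X + 1)^2 = X^2 + 1 because the two X terms cancel, whereas ordinary integer multiplication gives 9.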

One use of these instructions is to improve the speed of applications doing block cipher encryption in Galois/Counter Mode, which depends on finite field GF(2^k) multiplication. Another application is the fast calculation of CRC values,[3] including those used to implement the LZ77 sliding window DEFLATE algorithm in zlib and pngcrush.[4]
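To illustrate how a GF(2^k) multiplication builds on the carry-less product, the sketch below multiplies two elements of GF(2^8) in two steps: a carry-less product (the part CLMUL accelerates, done here in software), followed by reduction modulo an irreducible polynomial. It uses the AES field polynomial X^8 + X^4 + X^3 + X + 1 (0x11B) only as a small example; Galois/Counter Mode works in GF(2^128) with a different polynomial and a bit-reflected convention, which this sketch does not attempt. The names `clmul8` and `gf256_mul` are hypothetical.

```c
#include <stdint.h>

/* Carry-less product of two 8-bit polynomials; the result has degree <= 14. */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; i++)
        if ((b >> i) & 1)
            r ^= (uint16_t)((uint16_t)a << i);
    return r;
}

/* Multiply in GF(2^8) = GF(2)[X] / (X^8 + X^4 + X^3 + X + 1). */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    const uint16_t poly = 0x11B;   /* X^8 + X^4 + X^3 + X + 1 */
    uint16_t p = clmul8(a, b);
    /* Polynomial long division over GF(2): clear each coefficient of
       degree 14 down to 8 by XORing in a shifted copy of the modulus. */
    for (int deg = 14; deg >= 8; deg--)
        if ((p >> deg) & 1)
            p ^= (uint16_t)(poly << (deg - 8));
    return (uint8_t)p;
}
```

With this modulus, `gf256_mul(0x57, 0x83)` should return 0xC1, matching the worked example in the AES specification (FIPS 197).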

ARMv8 also has a version of CLMUL (the PMULL and PMULL2 instructions). SPARC calls its version XMULX, for "XOR multiplication".

New instructions

The instruction computes the 128-bit carry-less product of two 64-bit values. The destination is a 128-bit XMM register. The source may be another XMM register or memory. An immediate operand specifies which halves of the 128-bit operands are multiplied. Mnemonics specifying specific values of the immediate operand are also defined:

Instruction	Opcode	Description
`PCLMULQDQ xmmreg,xmmrm,imm`	`[rmi: 66 0f 3a 44 /r ib]`	Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2).
`PCLMULLQLQDQ xmmreg,xmmrm`	`[rm: 66 0f 3a 44 /r 00]`	Multiply the low halves of the two registers.
`PCLMULHQLQDQ xmmreg,xmmrm`	`[rm: 66 0f 3a 44 /r 01]`	Multiply the high half of the destination register by the low half of the source register.
`PCLMULLQHQDQ xmmreg,xmmrm`	`[rm: 66 0f 3a 44 /r 10]`	Multiply the low half of the destination register by the high half of the source register.
`PCLMULHQHQDQ xmmreg,xmmrm`	`[rm: 66 0f 3a 44 /r 11]`	Multiply the high halves of the two registers.
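The immediate values in the table follow a simple rule: bit 0 of the immediate selects the quadword of the first (destination) operand, bit 4 selects the quadword of the second (source) operand, and the remaining bits are ignored. A minimal software model of that selection might look like this; `xmm_t`, `clmul64`, and `pclmulqdq_model` are hypothetical names, not an existing API.

```c
#include <stdint.h>

/* Hypothetical model of an XMM register as two 64-bit quadwords. */
typedef struct {
    uint64_t lo;   /* bits 63:0   */
    uint64_t hi;   /* bits 127:64 */
} xmm_t;

/* Shift-and-XOR carry-less product, 64 x 64 -> 128 bits. */
static xmm_t clmul64(uint64_t a, uint64_t b)
{
    xmm_t r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i > 0)                 /* avoid the undefined shift by 64 */
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}

/* Model of PCLMULQDQ dst, src, imm8: imm8 bit 0 picks the destination
   quadword, bit 4 picks the source quadword, and the 128-bit product
   becomes the new destination value. */
static xmm_t pclmulqdq_model(xmm_t dst, xmm_t src, uint8_t imm8)
{
    uint64_t a = (imm8 & 0x01) ? dst.hi : dst.lo;
    uint64_t b = (imm8 & 0x10) ? src.hi : src.lo;
    return clmul64(a, b);
}
```

With dst = {lo = 3, hi = 2} and src = {lo = 1, hi = 4}, the immediates 0x00, 0x01, 0x10 and 0x11 give low-quadword products 3, 2, 12 and 8 respectively, one for each row of the table.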

An EVEX-encoded vectorized version (VPCLMULQDQ) appears in AVX-512.

CPUs with CLMUL instruction set

Intel:
- Westmere processor (March 2010)
- Sandy Bridge processor
- Ivy Bridge processor
- Haswell processor
- Broadwell processor (with increased throughput and lower latency[5])
- Skylake (and later) processor
- Goldmont processor
AMD:
- Jaguar-based processors and newer [6]
- Puma-based processors and newer
- "Heavy Equipment" processors
  - Bulldozer-based processors [7]
  - Piledriver-based processors
  - Steamroller-based processors
  - Excavator-based processors and newer
- Zen processors
- Zen+ processors
- Zen 2 (and later) processors

The presence of the CLMUL instruction set can be checked at run time by testing the PCLMULQDQ feature flag, bit 1 of the ECX register returned by the CPUID instruction with EAX = 1.
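On x86 with GCC or Clang, such a check might look like the sketch below. CPUID leaf 1 reports PCLMULQDQ support in bit 1 of ECX, and the compiler intrinsic `_mm_clmulepi64_si128` (declared in `<wmmintrin.h>`) maps to the instruction. The `target("pclmul")` attribute lets a single function use the instruction without building the whole file with `-mpclmul`. The helper names `have_pclmulqdq` and `clmul_low` and the non-x86 fallback are illustrative assumptions, and the code is not written for MSVC.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* GCC/Clang: __get_cpuid */
#include <wmmintrin.h>  /* _mm_clmulepi64_si128 */

/* CPUID.(EAX=1):ECX bit 1 is the PCLMULQDQ feature flag. */
static int have_pclmulqdq(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 1 not available */
    return (int)((ecx >> 1) & 1);
}

/* Multiply the low quadwords of a and b (immediate 0x00) and store the
   128-bit product as out[0] (low half) and out[1] (high half).
   Only call this after have_pclmulqdq() has returned 1. */
__attribute__((target("pclmul")))
static void clmul_low(uint64_t a, uint64_t b, uint64_t out[2])
{
    __m128i va = _mm_set_epi64x(0, (long long)a);
    __m128i vb = _mm_set_epi64x(0, (long long)b);
    __m128i p  = _mm_clmulepi64_si128(va, vb, 0x00);
    _mm_storeu_si128((__m128i *)out, p);
}
#else
/* Not an x86 target: report the feature as absent. */
static int have_pclmulqdq(void) { return 0; }
#endif
```

On a processor from the lists above, `have_pclmulqdq()` should return 1 and `clmul_low(3, 3, out)` should leave 5 in `out[0]` and 0 in `out[1]`, the same answer a portable shift-and-XOR routine gives. Checking the flag first and falling back to software otherwise is the usual pattern for code that must also run on older CPUs.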

References

[1] "Intel Software Network". Intel. Archived from the original on 2008-04-07. Retrieved 2008-04-05.
[2] Shay Gueron (2011-04-13). "Intel Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode – Rev 2". Intel.
[3] "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ" (PDF).
[4] Vlad Krasnov (2015-07-08). "Fighting Cancer: The Unexpected Benefit Of Open Sourcing Our Code". CloudFlare. Retrieved 2016-09-04.
[5] Johan De Gelas (2017-03-31). "The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads". Anandtech. p. 3.
[6] "Slide detailing improvements of Jaguar over Bobcat". AMD. Retrieved 2013-08-03.
[7] Dave Christie (2009-05-06). "Striking a balance". AMD Developer blogs. Archived from the original on 2013-11-09. Retrieved 2011-03-11.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
