Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction

Executive Summary

Cyclic Redundancy Check (CRC) codes are widely used for integrity checking of data in fields such as storage and networking. There is an ever-increasing need for very high-speed CRC computations on processors for end-to-end integrity checks. We present fast and efficient methods of computing CRC on Intel processors for the fixed (degree-32) iSCSI polynomial, using the CRC32 instruction present in the Intel® Core™ i5 processor 650.

Instead of computing the CRC of the entire message with the traditional linear method, we split an arbitrary-length buffer into a number of smaller fixed-size segments, compute the CRCs of these segments in parallel, and then recombine the partial CRCs into the CRC of the whole buffer.
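The split-and-recombine idea works because the CRC register update is linear over GF(2). The following Python sketch is a bit-at-a-time software model, not the paper's code: `raw_update`, `shift_zeros`, and `crc32c_segmented` are hypothetical names, and the recombination is done the slow way, by feeding zero bytes, purely to show that independently computed partial CRCs give the same result as the linear method.

```python
POLY = 0x82F63B78  # bit-reflected form of the iSCSI (CRC-32C) polynomial 0x1EDC6F41


def raw_update(state, data):
    """Advance a 32-bit CRC register over data, with no initial or final inversion."""
    for byte in data:
        state ^= byte
        for _ in range(8):
            state = (state >> 1) ^ (POLY if state & 1 else 0)
    return state


def crc32c(data):
    """Traditional linear method: one pass over the whole buffer."""
    return raw_update(0xFFFFFFFF, data) ^ 0xFFFFFFFF


def shift_zeros(state, nbytes):
    """Advance the register as if nbytes zero bytes followed (naive recombination)."""
    return raw_update(state, bytes(nbytes))


def crc32c_segmented(data, seg_len):
    """Split data into segments, CRC each one independently, then recombine."""
    segments = [data[i:i + seg_len] for i in range(0, len(data), seg_len)]
    # Each partial CRC starts from 0 and depends only on its own segment,
    # so these could be computed in parallel.
    partials = [raw_update(0, seg) for seg in segments]
    state = 0xFFFFFFFF
    for seg, partial in zip(segments, partials):
        # Linearity: raw_update(state, seg) == shift_zeros(state, len(seg)) ^ partial
        state = shift_zeros(state, len(seg)) ^ partial
    return state ^ 0xFFFFFFFF


message = bytes(range(256)) * 4 + b"tail"
assert crc32c(b"123456789") == 0xE3069283  # standard CRC-32C check value
assert crc32c_segmented(message, 100) == crc32c(message)
```

In this model the zero-feeding shift costs as much as the work it replaces; a practical implementation needs a cheap way to apply that shift to each partial CRC.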

Parallelized CRC computation is used to maximize the throughput of the CRC32 instruction. We show an efficient method for data buffers of arbitrary length.
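As an illustration of how independent streams and arbitrary lengths might fit together, the sketch below interleaves three CRC accumulators over each full block and finishes any remainder linearly. It is a software model under stated assumptions: `crc_step` stands in for a byte-wide CRC32 instruction, the choice of three streams and a 64-byte segment is arbitrary for this example, and all function names are hypothetical.

```python
POLY = 0x82F63B78  # bit-reflected iSCSI (CRC-32C) polynomial


def crc_step(state, byte):
    """Software stand-in for one byte-wide CRC32 instruction (no inversions)."""
    state ^= byte
    for _ in range(8):
        state = (state >> 1) ^ (POLY if state & 1 else 0)
    return state


def shift_zeros(state, nbytes):
    """Advance the register over nbytes zero bytes (naive recombination)."""
    for _ in range(nbytes):
        state = crc_step(state, 0)
    return state


def crc32c_linear(data):
    """Reference: a single dependency chain over the whole buffer."""
    state = 0xFFFFFFFF
    for byte in data:
        state = crc_step(state, byte)
    return state ^ 0xFFFFFFFF


def crc32c_three_way(data, seg_len=64):
    """Process full 3*seg_len blocks with three independent accumulators."""
    state = 0xFFFFFFFF
    block = 3 * seg_len
    pos = 0
    while len(data) - pos >= block:
        # Stream a carries the running CRC; streams b and c start from zero.
        a, b, c = state, 0, 0
        for i in range(seg_len):
            # The three updates do not depend on each other, so hardware
            # could overlap them instead of waiting on a single chain.
            a = crc_step(a, data[pos + i])
            b = crc_step(b, data[pos + seg_len + i])
            c = crc_step(c, data[pos + 2 * seg_len + i])
        # Recombine: a is followed by 2 segments, b by 1, c by none.
        state = shift_zeros(a, 2 * seg_len) ^ shift_zeros(b, seg_len) ^ c
        pos += block
    # Remainder shorter than one block: fall back to the linear method.
    for byte in data[pos:]:
        state = crc_step(state, byte)
    return state ^ 0xFFFFFFFF


buffer = bytes((i * 31 + 7) & 0xFF for i in range(1000))
assert crc32c_three_way(buffer) == crc32c_linear(buffer)
```

Because the segment length is fixed, the two shift distances used in the recombination are constants, which is what makes a precomputed recombination step possible.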

The final recombination of CRCs adds overhead. On the Nehalem microarchitecture it can be implemented with lookup tables; we show how to do this with as few tables as possible while still giving excellent overall performance across the range of buffer sizes. The PCLMULQDQ instruction in the Westmere microarchitecture allows efficient recombination of CRCs without lookup tables. The various methods are thoroughly explained in this paper with real code examples.
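To make the recombination step concrete: appending n zero bytes to a CRC register is the same as multiplying it by the constant x^(8n) modulo the CRC polynomial. The Python sketch below models that multiplication in two ways, with a bit-serial carry-less multiply (the operation PCLMULQDQ provides in hardware, here fused with the reduction) and with per-byte lookup tables. The four-table layout, the 64-byte segment length, and all names are assumptions of this sketch; the paper's actual table layout and reduction method are not reproduced here.

```python
POLY = 0x82F63B78  # bit-reflected iSCSI (CRC-32C) polynomial


def times_x(p):
    """Multiply a reflected 32-bit polynomial by x modulo the CRC polynomial."""
    return (p >> 1) ^ (POLY if p & 1 else 0)


def x_pow(nbits):
    """Return x**nbits mod P; 0x80000000 is the polynomial 1 in reflected order."""
    result = 0x80000000
    for _ in range(nbits):
        result = times_x(result)
    return result


def mulmod(a, b):
    """Carry-less multiply of a and b, reduced modulo P (bit-serial model)."""
    product = 0
    mask = 0x80000000  # coefficient of x**0 in reflected order
    while mask:
        if a & mask:
            product ^= b
        mask >>= 1
        b = times_x(b)  # b now holds the original b times the next power of x
    return product


def build_tables(constant):
    """Four 256-entry tables, one per CRC byte, for multiplying by constant."""
    return [[mulmod(v << (8 * j), constant) for v in range(256)] for j in range(4)]


def shift_with_tables(crc, tables):
    """Multiply crc by the tables' constant using four lookups and three XORs."""
    return (tables[0][crc & 0xFF] ^ tables[1][(crc >> 8) & 0xFF]
            ^ tables[2][(crc >> 16) & 0xFF] ^ tables[3][crc >> 24])


def shift_zeros(crc, nbytes):
    """Reference: feed nbytes zero bytes through the register one bit at a time."""
    for _ in range(8 * nbytes):
        crc = times_x(crc)
    return crc


SEG_LEN = 64                  # hypothetical fixed segment length in bytes
K = x_pow(8 * SEG_LEN)        # constant for shifting past one segment
TABLES = build_tables(K)

for crc in (0x00000001, 0x80000000, 0xDEADBEEF, 0xFFFFFFFF):
    assert mulmod(crc, K) == shift_zeros(crc, SEG_LEN)
    assert shift_with_tables(crc, TABLES) == shift_zeros(crc, SEG_LEN)
```

In this layout every distinct shift distance needs its own constant and its own set of tables, which suggests why keeping the number of tables small matters for the table-based approach, while the multiply-based approach needs only the constant.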

The paper will enable customers to code and optimize their iSCSI CRC applications for maximum performance on Intel® processors.

Read the full Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction White Paper.

